Wednesday, April 13, 2011

Decomposing a song into chords, Part 1

This is the first of a multi-part series on how to write a program for decomposing a song that analyzes the content of a song into notes and chords.  In this part, an algorthim is developed to measure the frequency and amplitude components in a wideband audio input and determine the musical notes associated with all detected frequencies.  The algorithm is executed against a simulated input of four notes, constructed as two 2-note chords.  In part 2, the algorithm will be tweaked to execute against real audio input from an electric guitar.

This part utilizes SciLab, an excellent open-source (free) alternative to Matlab with similar syntax and funtionality.

We begin by defining the four input signals as sine waves at the specified frequencies, amplitudes, and sampling rates.  Simulated noise is added to give it a bit of realism.  Equation 1 defines a single input signal.
(1) x = 40*sin(2*pi*f1/Fs*n) + noise1;

where f1 is the frequency, Fs is the sampling rate, n is a vector of 1:length(n), and noise1 is random noise about a normal distribution.

The variable x now contains the full "song".  This song is processed by taking N samples at a time  (from the time domain) and computing the fast fourier transform (FFT) of length N on the samples to produce the variable y in the frequency domain.  A hamming window is applied prior to the FFT to widen the main lobe of the filter.  The samples are then shifted by Fs/Fso where Fso is the desired output rate of the FFT.
(2) for ii=1:(Fs/Fso):(length(x)-N*(ppf*M+1))
xx = x(ii:(ii+N));
nn = window('hm',length(xx)) .* xx;
y_temp = fft(nn);

Next the output of the FFT (y_temp) is cut in half at the Nyquist frequency to remove the upper image.  The result is then decimated by a factor of M (if desired) to improve processing speed.  The result is stored in the matrix y for this instant in time.
(3) y_temp = y_temp(1:N/2);
y_temp = y_temp(1:M:length(y_temp));
y(idx,:) = y_temp;

The frequency data (y_temp) at this time point is examined to determine the peak amplitude and record any frequency bins that exceed a specified threshold below the peak amplitude.  This stored away and used later to determine the notes present in the song at this instant in time.
(4) nidx = find(dbphi(abs(y_temp)) > thresh);
c(idx,1:length(nidx)) = nidx;

where thresh is an arbitrary threshold and c is the output matrix used to store the frequency bins containing notes.

After the completion of the loop, the frequency bins containing notes (the matrix c) are analyzed to determine what notes are present at each instant in time.   First, the frequency of each detection bin must be determined.  Then, the note corresponding to that frequency is computed by dividing the bin frequency by the base frequencies (a vector) of A through G (i.e. the lowest frequency of each note).  The result is a vector of multiples, corresponding to how well each base note fits with the detected frequency.  The correct note will be the one closest to a power of two.
(5) for ii=1:size(c,1)
for jj=1:size(c,2)
note_freq = f(c(ii,jj));
note_base = ones(12,1)*note_freq ./ notes;
note_base = abs(log2(note_base) - round(log2(note_base)));
note_base_idx = find(note_base == min(note_base));
note_intervals(ii,jj) = note_base_idx;

where ii is the index to the time domain, jj is the index to the frequency domain at each instant in time, f is a vector converting bin number to frequency, and notes is a vector of the base note frequencies (A through G).  The result (note_intervals) is an index into the notes vector corresponding to the correct note.

All of the notes at a given instant in time are collected and compared with the previous instant in time to detect chord changes.  Each time a change is detected, the detected notes are printed to the screen.  The following is a screen shot using C G A and D as input signals, taken two at time, with a switch in the middle of the song.
 G  G  C  C
G  G  A  A  C  C  C
G  G  A  A  A  A# B  C  C  C
D  G  G  G  A  A  A  A# A# B  C  C  C
D  D  G  A  A  A  A# A# B  C  C  C
D  D  A  A  A  A# B  C  C
D  D  A  A  A  A#
D  D  A  A

Note the confusion in the middle during the chord change.  The following is a screenshot of the FFT output, depicting the four notes, two at time, with the confusion during the chord change. In the next step of this program development, we will determine what chords are present at each instant in time by looking at the note intervals on the scale.  We will also remove the confusion caused by chord changes.  The program will then be executed against a real guitar audio input file.

No comments:

Post a Comment