Thursday, April 14, 2011

Decomposing a song into chords, Part 2

In the first part of this series, we showed how to mathematically decompose the contents of a song into frequency and amplitude components, and how to identify the notes corresponding to the detected frequencies above a certain amplitude threshold.  The algorithm was run against a simulated input for four notes, taken as two 2-note chords.

Now we run the algorithm against a real audio file containing an electric guitar playing the following chord progression: G - C - D - C  - G with three bridges containing a different chord progression of F - C - F - C.  The following is a SciLab output using the mapsound function illustrating the audio input file.

The bridge chords (F - C - F - C) can be seen at times of roughly 35, 65, and 90 seconds.  The audio is was recorded at 44.1 kHz and down-sampled to 6 kHz.  The audio input file can be found here: audio_input.  Once loaded into SciLab, the audio is further down-sampled to 1 kHz to produce a spectrum with a bandwidth of 500 Hz.
`(1) [x Fs bits] = auread("test_audio");    dec_factor = Fs/1000;    x = x(1:round(dec_factor):length(x));    Fs = Fs/dec_factor;`

The input signal is then run through the algorithm as was done in previous step.  The note computation at the end has been modified to address the confusion introduced during chord changes.  The FFT sampling rate is presumed to be fast enough that at least several consecutive data sets will contain the same notes of the chord at each point in the chord progression.  The changes in chords then appear as transients and can be filtered out.  The data set can be further reduced by filtering out duplicates that appear at consecutive time samples during longer duration notes and chords.
`(2) for ii=2:(size(note_intervals,1)-1)        if ~and(unique(note_intervals(ii,:)) == unique(note_intervals(ii-1,:))) & ...           ~and(unique(note_intervals(ii,:)) == unique(note_intervals(ii+1,:))) then              note_intervals(ii,:) = 0;              continue;        end        if ~and(unique(note_intervals(ii-1,:)) == unique(note_intervals(ii,:))) then              idx(length(idx)+1) = ii;        end`

where note_intervals was the output variable computed in part 1, and ii is the index over the time domain.  Finally, adjacent frequency bin detections have been observed to cause half step errors in the note computation.  They must be merged with the main detection bin as follows:
`(3) bin_diff = diff(c(ii,:));    for jj=1:length(bin_diff)        if bin_diff(jj) == 1 then            c(ii,jj+1) = mean([c(ii,jj+1),c(ii,jj)]);        end    end    c(ii,:) = round(c(ii,:));`

Executing the algorithm with the preceding changes results in a fairly accurate reconstruction of the notes in the audio file input.  The following is an excerpt from the SciLab output.
`Time =  18.9, G  G  B  B  G  G Time =  22.3, E  E  E  G Time =  23.0, D  D  G  G  G Time =  24.8, E  E  G  G  G Time =  25.6, E  E  D  D  D  E  E  A  A  A  A  A Time =  26.6, E  E  A  A  D  D  D  D  D  E  E  A  A  A  D  D  D  G# G# A  A Time =  27.4, E  E  E  G  G  G Time =  28.4, G  G  D  D  G  G  G Time =  29.2, G  B  G  G  G Time =  32.0, A  D  D  D  A  A  A Time =  32.8, E  E  E  E  E  G  G  G  G Time =  34.0, A  A  E  F  F  F  A  A  A  A Time =  35.3, E  E  E  G  G Time =  36.9, A  A  F  F  F  A  A  A  A Time =  37.9, E  C  E  E  E  G  G  G Time =  38.9, E  F  F  F  A  A Time =  39.4, F  F  F  F  A  A  A  A Time =  40.4, E  E  C  E  E  E  G  G  G Time =  41.7, G  G  B  G  G Time =  42.2, G  G  G  B  B  D  G  G  G Time =  43.0, E  E  E  E  G  G  G Time =  44.0, E  E  A  D  D  E  E  A  A  A  D  D Time =  46.6, G  G  G  D  D  G  G  G Time =  47.4, G  G  G  B  G  G`

Note the detection of the new progression at 34 seconds, corresponding to the F - C - F - C chord progression.  This output represents the highest amplitude notes detected at each point in time.  The next step is to determine the intervals of the notes at each instant in time to reconstruct the parent chord that these notes belong to.