For starters, Worf listens to the device's microphone and plots a real-time frequency spectrum. This alone is not particularly impressive given the number of free spectrum analyzers already available for Android, but it was a necessary first step to get familiar with audio capture. There are a number of methods and pre-built libraries floating around for plotting data, but I chose to write my own plot routines using the raw Android SDK. It really wasn't that hard, and was probably easier than learning someone else's library. I went through a number of different FFT algorithms in both Java and C before settling on a C++ version I really like. Most modern hardware seems powerful enough to run any of them in real time, so it's largely a matter of personal preference. I wanted to get familiar with running native C and C++ through JNI anyway, so this was a good excuse. After the FFT is performed and the frequency domain plotted, the data is handed off to the LAME MP3 encoder to render an audio output file.
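To make the spectrum step concrete, here is a minimal sketch of an FFT-plus-magnitude pass in plain Java. The class and method names are mine for illustration; the app itself runs a C++ FFT through JNI, as mentioned above.

```java
// Minimal radix-2 Cooley-Tukey FFT plus magnitude computation, similar in
// spirit to the spectrum step described above. Illustrative only; the app
// actually runs a native C++ FFT through JNI.
public class SpectrumSketch {

    // In-place iterative FFT. Array lengths must be a power of two.
    public static void fft(double[] re, double[] im) {
        int n = re.length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly stages.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            double wr = Math.cos(ang), wi = Math.sin(ang);
            for (int i = 0; i < n; i += len) {
                double cr = 1, ci = 0;
                for (int k = 0; k < len / 2; k++) {
                    double ur = re[i + k], ui = im[i + k];
                    double vr = re[i + k + len / 2] * cr - im[i + k + len / 2] * ci;
                    double vi = re[i + k + len / 2] * ci + im[i + k + len / 2] * cr;
                    re[i + k] = ur + vr;           im[i + k] = ui + vi;
                    re[i + k + len / 2] = ur - vr; im[i + k + len / 2] = ui - vi;
                    double t = cr * wr - ci * wi;
                    ci = cr * wi + ci * wr;
                    cr = t;
                }
            }
        }
    }

    // Magnitude spectrum of one real buffer (e.g. a block of mic samples).
    public static double[] magnitudes(double[] samples) {
        double[] re = samples.clone();
        double[] im = new double[samples.length];
        fft(re, im);
        double[] mag = new double[samples.length / 2];
        for (int k = 0; k < mag.length; k++)
            mag[k] = Math.hypot(re[k], im[k]);
        return mag;
    }
}
```

The returned half-spectrum is what gets drawn; a pure tone shows up as a single sharp peak at its bin.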
Next I added a threshold that the user can draw with their finger on top of the audio spectrum. Amplitude spikes that rise above the threshold anywhere in the spectrum are flagged, and only the FFT buffers that contain flagged data are actually written to file; the rest are discarded. After playing around a bit, I decided the optimal approach was to capture four sets of data at a time for FFT processing, perform the threshold check on the first and last sets, and then write all four to file if either of the two crossed the threshold. This provides a little bit of padding around the captured audio and helps keep things from getting cut off.
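The four-buffer gating scheme is simple enough to sketch. Assuming magnitude frames coming out of the FFT and a per-bin threshold curve (the names here are hypothetical, not from the app):

```java
// Sketch of the four-buffer threshold gate described above: collect four FFT
// frames, test the first and last against the user-drawn per-bin threshold,
// and keep the whole group if either frame crosses it. Names are illustrative.
public class ThresholdGate {

    // True if any bin in the frame rises above the drawn threshold curve.
    public static boolean crosses(double[] frame, double[] threshold) {
        for (int k = 0; k < frame.length; k++)
            if (frame[k] > threshold[k]) return true;
        return false;
    }

    // frames is a group of four magnitude frames; returns whether the group
    // should be written to file. Checking only the first and last frames
    // leaves a little padding around the captured audio.
    public static boolean shouldWrite(double[][] frames, double[] threshold) {
        return crosses(frames[0], threshold)
            || crosses(frames[frames.length - 1], threshold);
    }
}
```

Note that a spike confined to a middle frame is dropped under this scheme; in practice a sound loud enough to matter tends to span more than two consecutive frames anyway.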
Finally, I added a filter that is drawn in a manner similar to the threshold, but is much, much more complicated under the hood. Initially I made the naive mistake of thinking I could filter in the frequency domain by attenuating the data according to the shape of the curve the user had drawn; an inverse FFT would then yield the filtered time-domain data for recording to file. Well, that's not really a good idea. It sort of works if you get clever with overlapping FFTs, but the main issue is that the spectrum plot is actually a magnitude, not the full complex representation of the signal. Without the imaginary parts (the phase information), the signal quality after attenuation is poor.
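A toy example makes the phase problem concrete. An impulse at sample 3 has a perfectly flat magnitude spectrum, so any processing that keeps only magnitudes and reconstructs from them loses the timing entirely (a naive DFT is used here for brevity; everything in this sketch is illustrative):

```java
// Demonstrates why magnitude-only frequency-domain processing fails: an
// impulse at sample 3 has a flat magnitude spectrum, so discarding phase
// and inverse-transforming collapses it to an impulse at sample 0.
public class PhaseDemo {

    // Naive forward DFT of a real signal; returns {re, im}.
    public static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n], im = new double[n];
        for (int k = 0; k < n; k++)
            for (int t = 0; t < n; t++) {
                double a = -2 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(a);
                im[k] += x[t] * Math.sin(a);
            }
        return new double[][] { re, im };
    }

    // Inverse DFT assuming zero phase: the spectrum is just the magnitudes.
    public static double[] idftMagnitudeOnly(double[] mag) {
        int n = mag.length;
        double[] y = new double[n];
        for (int t = 0; t < n; t++) {
            for (int k = 0; k < n; k++)
                y[t] += mag[k] * Math.cos(2 * Math.PI * k * t / n);
            y[t] /= n;
        }
        return y;
    }
}
```

The reconstruction piles all the energy at sample 0, nothing like the original signal, which is essentially what happens (less dramatically) when you attenuate a magnitude plot and invert it.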
The better method I eventually settled on was time-domain filtering. I chose a Kaiser-Bessel window function (I suppose a variety of others would work as well) to create a band-pass FIR filter. I then string together a whole set of these FIR filters in parallel, each centered on a point of the FFT and attenuating the signal by the amount drawn by the user in the filter curve. It turns out that parallel FIR filters collapse into a single filter (you just sum their coefficient sets), so it's pretty easy and not CPU intensive. This performs pretty well and doesn't degrade the audio quality (very much).
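The filter-bank trick can be sketched like this. The band layout, tap count, and Kaiser beta below are my own choices for illustration, not necessarily what the app uses:

```java
// Sketch of the parallel band-pass idea: design one Kaiser-windowed sinc
// band-pass FIR per band, scale each by the user-drawn attenuation, and sum
// the coefficient sets into one composite filter. Values are illustrative.
public class FilterBankSketch {

    // Zeroth-order modified Bessel function (series expansion), for Kaiser.
    static double besselI0(double x) {
        double sum = 1.0, term = 1.0;
        for (int k = 1; k <= 25; k++) {
            term *= (x / 2) / k;
            sum += term * term;
        }
        return sum;
    }

    // Kaiser-windowed sinc band-pass FIR; f1, f2 are cutoff frequencies as
    // fractions of the sample rate; taps should be odd.
    public static double[] bandPass(double f1, double f2, int taps, double beta) {
        double[] h = new double[taps];
        int mid = (taps - 1) / 2;
        for (int i = 0; i < taps; i++) {
            int m = i - mid;
            double hi = (m == 0) ? 2 * f2 : Math.sin(2 * Math.PI * f2 * m) / (Math.PI * m);
            double lo = (m == 0) ? 2 * f1 : Math.sin(2 * Math.PI * f1 * m) / (Math.PI * m);
            double r = (double) m / mid;
            double w = besselI0(beta * Math.sqrt(1 - r * r)) / besselI0(beta);
            h[i] = (hi - lo) * w;
        }
        return h;
    }

    // Parallel filters collapse into one: sum the scaled coefficient sets.
    public static double[] composite(double[][] bands, double[] gains) {
        double[] h = new double[bands[0].length];
        for (int b = 0; b < bands.length; b++)
            for (int i = 0; i < h.length; i++)
                h[i] += gains[b] * bands[b][i];
        return h;
    }

    // Magnitude response of an FIR filter at normalized frequency f.
    public static double response(double[] h, double f) {
        double re = 0, im = 0;
        for (int n = 0; n < h.length; n++) {
            re += h[n] * Math.cos(2 * Math.PI * f * n);
            im -= h[n] * Math.sin(2 * Math.PI * f * n);
        }
        return Math.hypot(re, im);
    }
}
```

Once the coefficients are summed, the per-sample cost is the same as running a single FIR filter, which is why the approach stays cheap no matter how elaborate the drawn curve is.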
Depending on the level of interest in this app and these implementation notes, I may post some of the specific algorithms, along with more details, in a series of follow-up posts.
I am releasing Worf as a free app in the Google Play store later this week; I just want to do a little more testing first to shake out any remaining bugs. The link below should work once it is released.
|Download Worf from Google Play|