This week I’ve continued working on the app side of my Android project, but I’ve also started to “dig deeper” (apprenticeship pattern blog post coming soon) and learn a bit more about audio processing.
I started with general concepts of feature extraction a couple of months ago now. Understanding how to use Python libraries to extract features is simple enough, but actually understanding how they work and why they work is another story. This week’s research has revealed an underlying elegance to the concept of signal processing and helped me reach a higher level of understanding and excitement for this project.
The single best explanation has been a video by YouTuber 3Blue1Brown on Fourier Series. I recommend his whole channel because he has an elegant way of describing and visualizing every topic he speaks about. He helped me understand the beauty of calculus, and in my process of digging deeper I wound up watching his video on the uncertainty principle, which was surprisingly relevant to signal processing. The key is understanding the specifics of the math behind signals and waves, and knowing that mathematical equations are a language used to describe straightforward physical phenomena. This knowledge makes daunting concepts easier to break down. Seeing the same concepts used in different contexts also helps solidify them in your mind. And if you’re implementing this in code, it will make it much easier to remember the logical steps required to extract a feature.
This entire tangent (and however useful, it was an unexpected tangent) started with trying to better understand the types of feature extraction that are used in speech recognition. By the way, you know you’re digging deeper when an article with an estimated read time of 11 minutes takes you a few hours to get through with all the additional research.
And universally, as far as I can tell, the first step in signal processing and feature extraction is the Fourier transform, which simply breaks a raw audio signal down into separate sine and cosine waves of different frequencies. I say simply, but as the 3Blue1Brown video states, this seems a bit like figuring out which colors make up a mixed-up can of paint. It turns out, however, that clever math makes it quite obvious which sine and cosine waves sum to a complex signal. I encourage you to watch the video to understand why.
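To make the paint-can idea a bit more concrete, here is a minimal sketch of a naive discrete Fourier transform in plain Python. The test signal, its length, and the names `dft`, `signal`, and `amplitudes` are my own assumptions for illustration; they don't come from the video or from any library, and in practice an FFT routine from an existing library would be the better choice.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a list of samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical test signal: 64 samples containing a sine that completes
# 5 cycles and a weaker cosine that completes 12 cycles.
N = 64
signal = [
    math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.cos(2 * math.pi * 12 * t / N)
    for t in range(N)
]

spectrum = dft(signal)

# For a real signal only the first half of the bins is unique. Scaling by
# 2 / N recovers the amplitude of each non-DC component.
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]

# The two strongest bins are the two "colors" that were mixed together.
dominant = sorted(range(N // 2), key=lambda k: amplitudes[k], reverse=True)[:2]
print(sorted(dominant))  # [5, 12]
```

Each output bin measures how strongly the signal correlates with a wave at that frequency, which is why the two mixed-in frequencies stand out while every other bin stays near zero.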
Describing a signal by how much of each sine and cosine wave it contains puts it in the frequency domain, while the original signal is in the time domain. From the frequency domain, you can compress the range of the individual components by taking the log of their magnitudes, and then perform an inverse Fourier transform on the result.
The result is called a cepstrum (a new concept to me), and it is one of many possible transformations you can make on a signal to begin to analyze the data. Its usefulness comes from making regular patterns in the spectrum, such as evenly spaced harmonics or an echo, show up as distinct peaks. Additional operations can be performed to reveal new insights into patterns in a signal. Determining which of these works best is part of the process.
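As a rough sketch of those steps (Fourier transform, log magnitude, inverse Fourier transform), here is a toy cepstrum in plain Python. Everything in it is an assumption for illustration: the helper names `dft` and `real_cepstrum`, the small `eps` guard against taking the log of zero, and the test signal, which is just a click followed by a quieter echo 8 samples later.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # The inverse transform is scaled by 1 / n.
    return [c / n for c in out] if inverse else out


def real_cepstrum(samples, eps=1e-12):
    """Fourier transform -> log magnitude -> inverse Fourier transform."""
    spectrum = dft(samples)
    log_magnitude = [math.log(abs(c) + eps) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# Hypothetical toy signal: a click at sample 0 and a half-strength echo
# of it 8 samples later.
N = 64
DELAY = 8
signal = [0.0] * N
signal[0] = 1.0
signal[DELAY] = 0.5

cepstrum = real_cepstrum(signal)

# The echo adds a ripple to the log spectrum, and the inverse transform
# turns that ripple into a peak at the echo delay.
peak = max(range(1, N // 2), key=lambda q: cepstrum[q])
print(peak)  # 8
```

The echo delay is hard to spot in the spectrum itself, but in the cepstrum it shows up as a single peak, which is the kind of pattern this transformation is meant to expose.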
These individual transformations would be very interesting to implement in code. I may not get a chance to do so for this project, but the understanding of the underlying operations will help in using existing libraries.