This week as part of my independent study I worked on feature extraction in Python. I have been using Python for Engineers as a reference, and it describes basic digital signal processing. As an exercise, I expanded on the code found in that chapter.
It’s a non-trivial task to create a sine wave in code (although compared to the most complex aspects of DSP, it’s a cakewalk). A sine wave will create the purest tone possible, as it creates a constant oscillation. This oscillation is what you perceive as a pitch. The equation for a sine wave is given as:
y(t) = A sin(2πft + φ)
where A is amplitude, f is the frequency, t is time, and φ is the phase in radians. We can ignore phase because this simply indicates where the wave starts at t=0, and this doesn’t matter to our ear. We hear the same oscillation regardless of where it starts.
Take a look a the code for creating a sine wave. Some of the details aren’t as important, but you can see the book for a description. The line that actually creates the sine wave values is:
sine_wave = [np.sin(2 * np.pi * frequency * x/sampling_rate) for x in range(num_samples)]
If you aren’t familiar with list comprehension in Python, this is just using the sine wave equation above, substituting time, t, as a specified number of samples divided by the sampling rate. The result is a list of values representing a sine wave. In reality, this is all an audio file is, with some additional encoding (and usually more interesting oscillations than a sine wave).
So what if we wanted to do this for a chord of multiple sine waves? Maybe using more list comprehension? Sure.
# Note Frequencies a4 = 440 c5 = 523.25 e5 = 659.25 chord = [a4, c5, e5] sine_waves = [[np.sin(2 * np.pi * freq * x/sampling_rate) for x in range(num_samples)] for freq in chord]
This is doing the same as above, only it’s doing it for each frequency in a list of frequencies. But that’s the easy part. The original code just multiplies the samples by the amplitude, then converts them to hexadecimal values and writes it to the file:
for s in sine_wave: wav_file.writeframes(struct.pack('h', int(s*amplitude)))
But we can’t simply do that for each sine wave in succession, or we’d get different sine waves playing one after another. That’s an arpeggio, not a chord!
So we have to get a little creative. But not too creative. If you think of each sine wave playing from a separate speaker, what you hear is the sum of the air pressure from each speaker. A single speaker is the same story: it’s just playing the sum of the three sine waves. So then, iterating through each index, we can add the amplitudes of each individual sine wave. I also had to reduce it enough to store the value as a short int, by dividing by two.
# Only write samples to the end of the shortest sine wave shortest_sample_len = min([len(j) for j in sine_waves]) for i in range(shortest_sample_len): current_frequencies = [wave[i] for wave in sine_waves] value = sum(current_frequencies) / 2 wav_file.writeframes(struct.pack('h', int(value*amplitude)))
Python for Engineers goes on to describe how to use a Fast Fourier Transform to get the frequencies from the wave file with a single sine wave. But the code works just as well for the sine wave chord! This is because the FFT is an array that treats each index as a frequency, and the value at that index is the frequency’s amplitude. This means that regardless of the number of tones in a sample, the FFT can be plotted and will reveal outliers: those indices with a much higher amplitude. These are the frequencies in the audio file.
One last thing to keep in mind is that since the indices are used to represent the frequencies, they will be whole numbers. For example, c5 = 523.25hz will show up as a spike at indices 523 and 524, which 523 having the larger value of the two.
Full code for creating a chord in Python is posted here.