How to find Pitches in Music

Have you ever wanted to find what chords were in a song? A good first step is to read the notes played during a short time interval. If several notes are played simultaneously, in order to figure out the notes, we need to separate the waveform into the individual frequencies. Fortunately there is a well-established mathematical process for doing this. Music stored in a WAV file is a good starting point, as it is a relatively simple audio format. WAV files store audio as the y-value, or loudness, of points on a sine wave.

The Fourier transform is a mathematical process designed to separate a mathematical function into its component frequencies. It works on a wide range of functions, including polynomial functions (e.g. parabolas). If you applied the Fourier Transform to a polynomial, the result would be an infinite series of waves added together – a bit of a mind bender. Interestingly, you can go the other direction using the Taylor series, which represents a wave as an infinite series of summed polynomials.

In mathematical theory, these operations are done on an infinitely large plane with continuous lines, which is impractical in real engineering exercises. The Fast Fourier Transform is a variation which operates on a finite list of points along a graph, like that in a WAV file. Since there are potentially infinite frequencies in a sound sample, the output is placed into buckets based on the loudness of each pitch (10-20hz, 20-30hz, 30-40hz, etc). The size and number of these buckets are controlled by the end application and sound quality.

There is an open source project to read wave files, which I discussed previously here. This library will identify the sample rate of the file – this is how many loudness measurements are stored for each second of audio. 44,100 is a common rate.

wavFile.readFrames(frameData, dataSize)

The Apache Commons Math library provides code to execute an FFT. This bit of Scala code calls it with sampled data that has been read from a file.

val buckets : Array[Complex] = 
   fft.transform(frameData, TransformType.FORWARD);

The results are actually an array of vectors. To find the loudness of a pitch, you must compute the length of the vector. For a vector (a, b), this is computed by √(a2 + b2).

var index = 0
for (val c : Complex {
val magnitude : Double =
Math.sqrt(c.getImaginary() * c.getImaginary() +
c.getReal() * c.getReal())
index = index + 1

The position of each bucket combined with the sample rate, determines the range of frequencies this tone represents. To find which notes are included in output, I take the center of the range:

val frq = (.5 + index) * sampleRate / dataSize;

In this application, all octaves will be combined. This tests each notes in the lowest octave (A…G) and finds the closest match:

var numOctaves = Math.log(frq/noteFrq)/Math.log(2)
var normFrq = Math.abs(numOctaves - Math.floor(numOctaves));
if (normFrq < distance)
…found closer match...

I generated WAV files for all combinations of five notes in an octave – for the current version, 80% of generated test audio files are identified correctly. See this post on using Prolog to generate all chord combinations.

Working with audio is challenging, but rewarding. Files are hard to test without perfect pitch, and you must be remember to start with the speaker volume on low 🙂

The full source to this project is available on Github.