In a previous example, I showed how to find the onset of a single drumbeat, as well as the chord at an instant.
This new example extends the method to detect the onset of several notes in a row, and demonstrates some interesting challenges involved in musical transcription. The general process is to read in audio, smooth the input, then find the most significant sudden increases in loudness (other techniques measure energy or pitch).
Each of these steps involves choices. For instance, there are several ways to smooth audio: I used a rolling mean, but one paper I read recommended a low-pass filter.
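As a rough illustration of the two smoothing options, here is a minimal sketch in base R. It uses `stats::filter` rather than `rollmean`, and the synthetic signal, the window length, and the `alpha` value are all assumptions chosen for illustration, not values taken from the paper.

```r
# Synthetic rectified signal: near silence, then a burst of noise (made-up data).
set.seed(1)
ax <- abs(c(rnorm(500, sd = 0.01), rnorm(500, sd = 0.5)))

# Rolling mean over k samples, as a centred moving-average filter.
# The result has the same length as v, with NA at both ends.
rolling_mean <- function(v, k) {
  as.numeric(stats::filter(v, rep(1 / k, k), sides = 2))
}

# One-pole low-pass filter: y[i] = alpha * v[i] + (1 - alpha) * y[i - 1].
# Smaller alpha means heavier smoothing.
one_pole_lowpass <- function(v, alpha) {
  as.numeric(stats::filter(alpha * v, 1 - alpha, method = "recursive"))
}

sm_mean <- rolling_mean(ax, 100)
sm_lp   <- one_pole_lowpass(ax, 0.02)
```

The rolling mean weights every sample in the window equally, while the one-pole filter weights recent samples more heavily and so lags behind a sudden rise in a different way.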
The greatest challenge is keeping false positives and false negatives at an acceptable level. One heuristic improvement to the technique below might filter out sudden sounds when the music volume is low, preventing a cough from registering as a beat in a quiet section of an orchestral piece.
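One way this heuristic might look, as a sketch only: drop any candidate whose surrounding loudness falls below a floor. The function name `gate_quiet`, the `floor_level` and `context` parameters, and the toy envelope are all hypothetical.

```r
# Keep only candidates whose neighbourhood is loud enough overall.
# env: smoothed loudness envelope; candidates: sample indices into env.
gate_quiet <- function(env, candidates, floor_level, context = 200) {
  loud_enough <- vapply(candidates, function(i) {
    lo <- max(1, i - context)
    hi <- min(length(env), i + context)
    # The median is less affected by the brief spike itself than the mean.
    median(env[lo:hi]) >= floor_level
  }, logical(1))
  candidates[loud_enough]
}

# Toy envelope: a quiet passage with a short spike, then a loud passage.
env <- c(rep(0.01, 500), rep(0.4, 500))
env[250:255] <- 0.6   # brief spike in the quiet passage, e.g. a cough
kept <- gate_quiet(env, c(250, 700), floor_level = 0.1)
```

Here the candidate at sample 250 sits in the quiet passage and is discarded, while the one at 700 is kept. A fixed floor would also discard genuine beats in quiet passages, so this trades false positives for false negatives.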
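library(sound)
library(zoo)

# Load the recording; the Sample object carries $sound and $rate
fourbeats <- loadSample("out\\fourbeats.wav")

# Window sizes in samples (rate is in samples per second)
fiftyms <- round(fourbeats$rate / 20)
tenms <- round(fiftyms / 5)

plot(1:length(fourbeats$sound), abs(fourbeats$sound), type="l")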
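x <- fourbeats$sound[1,]   # first channel
ax <- abs(x)               # rectify, so loudness is always positive
sx <- rollmean(ax, 100)    # smooth with a 100-sample rolling mean
plot(1:length(x), ax, type="l")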
plot(1:length(sx), sx, type="l")
# firstOrderDiff() is a helper defined outside this listing
dx <- firstOrderDiff(x, 50)
plot(1:length(dx), dx, type="l")
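Since `firstOrderDiff` is not shown here, the following is only a guess at the idea behind it: a lagged difference, where each element is the change in the signal over a fixed number of samples. The stand-in below is named `lagged_diff` to make clear that it is hypothetical, and it relies on base R's `diff`.

```r
# Hypothetical stand-in for a lagged first-order difference:
# element i is v[i + lag] - v[i], so the result is `lag` samples shorter than v.
lagged_diff <- function(v, lag) {
  diff(v, lag = lag)
}

# Toy loudness envelope: quiet, then a sudden rise at sample 6.
env <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
d <- lagged_diff(env, 2)
peak <- which.max(d)   # position of the first, largest rise in d
```

A larger lag makes the difference less sensitive to sample-to-sample noise, at the cost of blurring exactly where the rise starts.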
Now, we look for local maxima, in overlapping 50ms sections. These overlap to allow the detection of instances where the beat occurs right on the dividing line.
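# Position of the largest rise within each 50 ms window, stepping by half a window
width <- round(fiftyms)
step <- width %/% 2
candidates <- rollapply(dx, width, which.max, align="left", by=step)

# Convert window-relative positions to positions in dx
ix <- (0:(length(candidates)-1)) * step + candidates

# Mark each candidate whose rise exceeds the threshold
for (i in seq_along(ix)) {
  if (dx[ix[i]] >= 0.01) {
    abline(v=ix[i], col="red")
  }
}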
As you can see above, this technique detects several local maxima in the area where the beat occurs, despite stepping through the signal in 25 ms intervals. An improvement would limit the number of candidates in a given time period, based on the knowledge that the frequency of musical beats is typically limited to what a musician can play (although this is not true of electronic music).
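A simple version of this limit might enforce a minimum gap between accepted onsets. The sketch below is one possible approach; the function name `thin_onsets`, the 100 ms gap, the sample rate, and the candidate positions are all assumptions for illustration.

```r
# Keep a candidate only if it is at least `min_gap` samples after the last kept one.
# onsets: sample indices of candidate beats.
thin_onsets <- function(onsets, min_gap) {
  onsets <- sort(onsets)
  kept <- integer(0)
  for (o in onsets) {
    if (length(kept) == 0 || o - kept[length(kept)] >= min_gap) {
      kept <- c(kept, o)
    }
  }
  kept
}

rate <- 44100                  # assumed sample rate
min_gap <- round(0.1 * rate)   # assumed: at most one onset per 100 ms
candidates <- c(1000, 1500, 2200, 9000, 9100, 20000)
thinned <- thin_onsets(candidates, min_gap)
```

This keeps the earliest candidate in each cluster. Keeping the strongest one instead would need the difference values as well as the positions.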
If we cheat a little, we can also set a minimum level for what is considered to be a beat. In a more realistic scenario this would likely be a ratio to the surrounding music. This filtering parameter would also provide a good hook for linking this technique to an AI training algorithm.
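To make the ratio idea concrete, here is a minimal sketch that compares each candidate's rise against the average loudness just before it. The function name `relative_onsets`, the `ratio` and `context` parameters, and the toy envelope are hypothetical, and the values would need tuning on real recordings.

```r
# Keep candidates whose rise is large relative to the recent average loudness.
# env: smoothed loudness envelope; d: rise at each sample; candidates: indices.
relative_onsets <- function(env, d, candidates, ratio = 2, context = 200) {
  keep <- vapply(candidates, function(i) {
    lo <- max(1, i - context)
    background <- mean(env[lo:i])
    d[i] >= ratio * background
  }, logical(1))
  candidates[keep]
}

# Toy envelope with a large jump at sample 300 and a small one at sample 900.
env <- c(rep(0.1, 300), rep(0.5, 300), rep(0.2, 300), rep(0.25, 300))
d <- c(diff(env), 0)   # d[i] = env[i + 1] - env[i], padded to the length of env
beats <- relative_onsets(env, d, c(300, 900))
```

Only the jump at sample 300 survives, because the rise at 900 is small compared with the music around it. In near silence almost any rise passes this test, so a floor on the background level may also be needed.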