Introduction

Inspired from my signal analysis teacher at uni, who is also passionated with music (he is a keen oboe player!), I wanted to code a simple algorithm to analyze the notes in a music piece. The pitch of each note is defined by a frequency $f$. For example, the most common tuning fork gives a A at frequency $440$Hz. Fourier decomposition is a very efficient mathematical tool: it shows the different frequencies present in a signal. We suppose that we record an instrument, like a piano, where different notes are played consecutively.

We suppose that each note are played on a twelve-tone equal temperament keyboard. That means that the octave is divided into 12 parts which are equally spaced on a logarithmic scale.

Figure 1 : 12-tone equal temperament chromatic scale on C

From a first note A defined by the frequency $f_0 = 440$Hz, the other notes are defined by: $$ f_i = f_0 \sqrt[12]{2}^i $$

My method works in three steps:

  • smooth the temporal signal in order to detect the moment where the notes where played
  • apply Fourier transform to each note individually
  • find the fundamental frequency and convert it to a note.

Diving into the temporal signal

Here the music I recorded:

Audio: Piano impro

Let’s analyze the signal!

Figure 1 : Temporal signal

If we zoom into the time axis, we can observe something close to a sinusoidal signal:

Figure 2 : Zoom into the time signal


First step: Detect the moment where the notes where played

Figure 3 : Spectrogram of the song.

From the spectrogram, we can compute a smoothest version of the temporal signal which makes the detection of the note easy by a scipy function detect_peaks.

Figure 4 : Smooth version of the time signal, and the red vertical lines correspond to detected notes.

# Import functions
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import fft, ifft, fftfreq
from scipy.signal import spectrogram, find_peaks

# Compute the spectrogram and sum
f, t, Sxx = spectrogram(y, fs, nperseg=2**9)
enveloppe = Sxx.sum(axis=0)

# Detect the timestamps of the note
ind, descr = find_peaks(enveloppe, width=1, height=5e-5)

# Plot
plt.figure(figsize=(15,4))
plt.plot(t, enveloppe, label='Temporal signal') # temporal signal
for num_i,i in enumerate(ind): # the peaks
    plt.plot([t[i],t[i]], [0,descr['peak_heights'][num_i]], c='red')
plt.show()

Second step: Apply Fourier transform

Then we can apply for each interval apply a fast fourier transform to find which frequencies are present in this temporal signal. Then we can detect again the peaks and choose the highests as being the main harmonic.

Figure 3 : (Top) The extracted temporal signal corresponding to a note detected in the song. (Bottom) The corresponding Fourier transform of the signal. The highest peak is detected at $f=1575$. The other two peaks at higher frequencies correspond to the higher harmonics $2f$ and $3f$.

# Enumerate trough the peaks (the detected notes)
for num_i,i in enumerate(ind):
    t_min, t_max = t[descr['left_bases'][num_i]], t[descr['right_bases'][num_i]]
    if t_max-t_min<2: # check that the one is shorter than 2 secs

        # Subset 
        small_t, small_y = t_axis[(t_axis>=t_min)&(t_axis<=t_max)], \
            y[(t_axis>=t_min)&(t_axis<=t_max)]
        
        # FFT
        N = len(small_t)
        yf = fft(small_y)
        yf = 2.0/N * np.abs(yf[0:N//2])
        xf = fftfreq(N, 1/fs)[:N//2]

        # Detect the peaks in the spectrum
        peaks, _ = find_peaks(yf, distance=10, width=1)

        # Choose the highest peak
        index_maxi = np.argsort(yf[peaks])[-1]
        f = xf[peaks[index_maxi]]
        index_note = quantize_f_to_note(f)
        print(f, index_note, liste_notes_letter[index_note%12])
liste_notes_letter = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

Third step: Find the frequency then the note

We can inverse the formula presented earlier

$$ f_i = f_0 \sqrt[12]{2}^i $$

and find back which note $i$ is defined by frequency $f$:

$$ i = round(12 \log_2(\frac{f}{f_0})) $$

def quantize_f_to_note(f):
  # Input frequency f (Hz)
  # Ouput integer. 0 is A 440Hz, -1 is the Ab under, and 1 is A# above.
  return(np.round(12*np.log(f/440)/np.log(2)).astype(int))

Results

I still need to upload the result: compare the midi file to the output!