Digital archive of theses discussed at the University of Pisa


Thesis etd-03152023-205937

Thesis type
Master's thesis
Thesis title
Automatic Music Transcription using convolutional neural network(CNN) and constant-Q transform(CQT)
Course of study
supervisor Prof. Cucinotta, Tommaso
supervisor Prof. Bacciu, Davide
  • Multi-pitch estimation
  • Deep Learning
  • Constant-Q transform
  • Automatic Music Transcription
Graduation session start date
Automatic music transcription (AMT) is the problem of analyzing an audio recording of a musical piece and detecting notes that are being played. AMT is a challenging problem, particularly when it comes to polyphonic music.

Traditional AMT commonly takes a holistic approach, in which a single model identifies and detects all the musical notes in a given piece of music. These models are often large and complex, which makes them difficult to train and deploy in real-world scenarios. This thesis introduces a new approach for transcribing piano music using a convolutional neural network (CNN) and the constant-Q transform (CQT). The approach trains a separate classifier per octave, each capable of detecting the notes in its octave. Features are extracted from the audio signal using the CQT, and the resulting CQT coefficients are used as input to the CNN model. By dividing the task of music transcription into multiple smaller tasks, this approach is expected to significantly reduce the model complexity and computational overhead of transcription, making it more suitable for real-world applications such as real-time music transcription.
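
As a rough illustration of why CQT features divide naturally into octaves, the sketch below computes the geometrically spaced center frequencies of a CQT, where bin k has frequency f_min * 2^(k / B) for B bins per octave. The function name and the parameter values (lowest bin at A0 = 27.5 Hz, 8 octaves, 12 bins per octave) are assumptions chosen for illustration and are not taken from the thesis.

```python
def cqt_center_frequencies(f_min, n_octaves, bins_per_octave):
    """Return the center frequencies (Hz) of a CQT with geometric bin spacing."""
    n_bins = n_octaves * bins_per_octave
    # Each step of bins_per_octave bins doubles the frequency (one octave).
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# Assumed, illustrative configuration: one bin per semitone starting at A0.
freqs = cqt_center_frequencies(f_min=27.5, n_octaves=8, bins_per_octave=12)
```

With this layout, every block of `bins_per_octave` consecutive bins covers exactly one octave, so a per-octave classifier can be fed a contiguous slice of the CQT coefficients.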

Two ways of subdividing the transcription task have been tested. The first detects the notes/chords of a given octave from that octave itself and the two octaves above it. The second additionally considers the octave below the current one. The proposed approach was compared with holistic models to determine its effectiveness. The experiments aim to evaluate the effectiveness of the proposed approach and its potential benefits in comparison to existing methods for transcribing piano music.
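
The two input schemes can be sketched as a choice of which CQT bins each per-octave classifier sees. This is a minimal sketch, not the thesis implementation: the function `octave_input_bins`, the 0-based octave indexing, and the clipping of the window at the lowest and highest octaves are all assumptions made for illustration.

```python
def octave_input_bins(octave, n_octaves, bins_per_octave, above=2, below=0):
    """Return (start, stop) CQT bin indices for the classifier of `octave`.

    Octaves are indexed from 0 (lowest). The window spans `below` octaves
    under the target, the target itself, and `above` octaves over it.
    Clipping at the edges of the bin range is an assumption of this sketch.
    """
    if not 0 <= octave < n_octaves:
        raise ValueError("octave out of range")
    first = max(0, octave - below)
    last = min(n_octaves, octave + above + 1)  # exclusive upper octave
    return first * bins_per_octave, last * bins_per_octave


# First scheme: target octave plus the two octaves above it.
scheme_one = octave_input_bins(3, n_octaves=8, bins_per_octave=12)
# Second scheme: additionally include the octave below.
scheme_two = octave_input_bins(3, n_octaves=8, bins_per_octave=12, below=1)
```

Under these assumed settings the second scheme widens each classifier's input by one octave of bins, except at the lowest octave, where there is nothing below to add.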

Results show that the proposed approach achieved a frame-based accuracy comparable to that of the holistic approach with significantly fewer parameters and less training time, demonstrating its feasibility.