Thesis etd-03152023-205937
Thesis type
Master's degree thesis
Author
TELILA, YOHANNIS KIFLE
URN
etd-03152023-205937
Title
Automatic Music Transcription using convolutional neural network (CNN) and constant-Q transform (CQT)
Department
INFORMATICA
Course of study
INFORMATICA
Supervisors
Supervisor Prof. Cucinotta, Tommaso
Supervisor Prof. Bacciu, Davide
Keywords
- Automatic Music Transcription
- Constant-Q transform
- Deep Learning
- Multi-pitch estimation
Defense date
14/04/2023
Availability
Thesis not available for consultation
Abstract
Automatic music transcription (AMT) is the problem of analyzing an audio recording of a musical piece and detecting notes that are being played. AMT is a challenging problem, particularly when it comes to polyphonic music.
Traditional AMT commonly takes a holistic approach, using a single model to identify and detect all the musical notes in a given piece of music. These models are often large and complex, making them difficult to train and deploy in real-world scenarios. This thesis introduces a new approach for transcribing a piano piece using a convolutional neural network (CNN) and the constant-Q transform (CQT). The approach involves training a separate classifier per octave, with each classifier detecting the notes of its octave. Features are extracted from the audio signal with the CQT, and the resulting CQT coefficients are used as input to the CNN model. By dividing music transcription into multiple small tasks, this approach is expected to significantly reduce the model complexity and computational overhead of the transcription task, making it more suitable for real-world applications such as real-time music transcription.
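As a rough illustration of this feature-extraction step, the sketch below computes magnitude CQT coefficients for a piano recording with librosa and slices out the bins covering one octave plus its two upper octaves. The file name, sampling rate, hop length and bin layout are assumptions for the example, not the settings used in the thesis.

```python
import numpy as np
import librosa

# Load a piano recording and compute its constant-Q transform.
# 12 bins per octave over the 88 piano keys is one common choice;
# the thesis may use different CQT settings.
y, sr = librosa.load("piano_piece.wav", sr=22050)   # hypothetical file name
cqt = librosa.cqt(
    y, sr=sr, hop_length=512,
    fmin=librosa.note_to_hz("A0"),                  # lowest piano key
    n_bins=88, bins_per_octave=12,
)
features = np.abs(cqt)                              # magnitude coefficients, shape (88, n_frames)
features = librosa.amplitude_to_db(features, ref=np.max)  # optional log scaling

# For a per-octave classifier, keep only the rows belonging to the target
# "octave" (counted here as 12-bin groups from the lowest key) plus the
# two octaves above it.
octave_idx = 2                                      # e.g. the third 12-bin group
lo = octave_idx * 12
hi = min(lo + 36, features.shape[0])                # octave + two octaves above
octave_input = features[lo:hi]
```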
Two ways of subdividing the transcription task were tested. The first detects the notes/chords of a given octave by considering that octave together with the two octaves above it. The second additionally considers the octave below the current one. The proposed approach was compared with holistic models to determine its effectiveness, with experiments aimed at evaluating its potential benefits over existing methods for transcribing piano songs.
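A minimal sketch of what one such per-octave classifier could look like is given below, assuming the first scheme (the target octave plus the two octaves above it, i.e. a 36-bin input). The layer sizes, input window length and class names are hypothetical; the abstract does not specify the architecture used in the thesis.

```python
import torch
import torch.nn as nn

class OctaveNoteClassifier(nn.Module):
    """Small CNN predicting the 12 note activations of one octave from a
    CQT excerpt of 36 bins (octave + two octaves above) over a short
    context of frames. Hypothetical architecture for illustration only."""

    def __init__(self, n_bins: int = 36, n_frames: int = 9, n_notes: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_bins // 4) * (n_frames // 4), 64), nn.ReLU(),
            nn.Linear(64, n_notes),      # one logit per semitone of the octave
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_bins, n_frames) -> multi-label logits (batch, n_notes)
        return self.head(self.features(x))
```

In a setup like this, one such classifier would be trained per piano octave with a multi-label loss (e.g. `nn.BCEWithLogitsLoss`), and the per-octave outputs concatenated into an 88-key piano roll at inference time.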
Results show that the proposed approach achieves frame-based accuracy comparable to the holistic approach with significantly fewer parameters and less training time, demonstrating its feasibility.
File
File name | Size
---|---
Thesis not available for consultation. |