Thesis etd-03152023-205937

Thesis type

Master's degree thesis

Author

TELILA, YOHANNIS KIFLE

URN

etd-03152023-205937

Title

Automatic Music Transcription using convolutional neural network (CNN) and constant-Q transform (CQT)

Department

COMPUTER SCIENCE

Degree programme

COMPUTER SCIENCE

Supervisors

supervisor Prof. Cucinotta, Tommaso
supervisor Prof. Bacciu, Davide

Keywords

Automatic Music Transcription
Constant-Q transform
Deep Learning
Multi-pitch estimation

Examination session start date

14/04/2023

Availability

Thesis not available for consultation

Abstract

Automatic music transcription (AMT) is the problem of analyzing an audio recording of a musical piece and detecting the notes being played. AMT is challenging, particularly for polyphonic music, where several notes sound at the same time.
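To make "detecting the notes being played" concrete: transcription output is commonly represented as a piano roll, a binary matrix with one row per time frame and one column per piano key. The sketch below assumes the standard MIDI numbering of the 88 piano keys (A0 = 21 to C8 = 108) and equal temperament with A4 = 440 Hz; the helper names `midi_to_hz`, `midi_to_name` and `piano_roll` are hypothetical and are not taken from the thesis, which does not describe its label format here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
LOWEST_KEY = 21   # MIDI number of A0, the lowest piano key (assumed numbering)
NUM_KEYS = 88     # A0 (MIDI 21) .. C8 (MIDI 108)


def midi_to_hz(m):
    """Equal-temperament frequency of MIDI note m, tuned so A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2 ** ((m - 69) / 12)


def midi_to_name(m):
    """Scientific pitch name of MIDI note m, e.g. 60 -> 'C4'."""
    return f"{NOTE_NAMES[m % 12]}{m // 12 - 1}"


def piano_roll(notes, n_frames, hop_s):
    """Hypothetical helper: binary n_frames x 88 matrix from (onset_s, offset_s, midi) triples.

    A frame t is marked active for a note if t*hop_s falls in [onset, offset).
    """
    roll = [[0] * NUM_KEYS for _ in range(n_frames)]
    for onset, offset, m in notes:
        start = int(onset / hop_s)
        stop = min(n_frames, math.ceil(offset / hop_s))
        for t in range(start, stop):
            roll[t][m - LOWEST_KEY] = 1
    return roll
```

For example, `piano_roll([(0.0, 0.5, 69), (0.25, 1.0, 60)], n_frames=4, hop_s=0.25)` marks A4 in frames 0 and 1 and C4 in frames 1 to 3, so frame 1 holds a two-note chord: exactly the polyphonic case that makes AMT hard.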

In traditional AMT, a holistic approach is commonly used: a single model identifies and detects all the musical notes in a given piece. These models are often large and complex, which makes them difficult to train and deploy in real-world scenarios. This thesis introduces a new approach for transcribing piano music using a convolutional neural network (CNN) and the constant-Q transform (CQT). The approach trains a separate classifier per octave, each capable of detecting the notes within that octave. Features are extracted from the audio signal with the CQT, and the resulting CQT coefficients are used as input to the CNN model. By dividing music transcription into multiple smaller tasks, this approach is expected to significantly reduce the model complexity and computational overhead of transcription, making it more suitable for real-world applications such as real-time music transcription.
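The CQT suits per-octave processing because its bins are spaced geometrically: with b bins per octave, bin k has centre frequency f_k = f_min * 2^(k/b), so every block of b consecutive bins covers exactly one octave. The abstract does not say which CQT implementation the thesis uses; the sketch below is a naive single-frame CQT in the style of Brown's original definition (a Hamming window of length N_k = Q * fs / f_k per bin, with Q = 1 / (2^(1/b) - 1)). The function name `cqt_frame`, the 16 kHz sample rate and the choice f_min = 27.5 Hz (A0) with 88 bins are assumptions for illustration; a real pipeline would use an optimized library routine such as `librosa.cqt`.

```python
import cmath
import math


def cqt_frame(x, fs, f_min=27.5, n_bins=88, bins_per_octave=12):
    """Naive constant-Q magnitudes of one analysis frame starting at x[0].

    Bin k is centred on f_k = f_min * 2**(k / bins_per_octave) and uses a
    Hamming window of N_k = Q * fs / f_k samples, so every bin spans the same
    number (Q) of periods of its own centre frequency.
    """
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    coeffs = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)
        n_k = int(round(q * fs / f_k))  # low bins need long windows
        if n_k > len(x):
            raise ValueError(f"bin {k} needs {n_k} samples, frame has {len(x)}")
        acc = 0j
        for n in range(n_k):
            w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))  # Hamming
            acc += w * x[n] * cmath.exp(-2j * math.pi * f_k * n / fs)
        coeffs.append(abs(acc) / n_k)  # normalise by window length
    return coeffs
```

With these defaults bin 48 is centred on 27.5 * 2^4 = 440 Hz, so a pure A4 sine at 16 kHz peaks there, and bins 12*i to 12*i + 11 form the block for octave i. The long windows of the lowest bins (close to 10,000 samples at 16 kHz) illustrate the usual CQT trade-off: fine frequency resolution in the bass at the cost of time resolution.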

Two approaches to subdividing the transcription task have been tested. The first detects the notes/chords of a given octave from that octave and the two octaves above it. The second, in addition to the two octaves above, also considers the octave below the current one. To determine its effectiveness, the proposed approach was compared with holistic models; the experiments evaluate its potential benefits in comparison with existing methods for transcribing piano pieces.

Results show that the proposed approach achieved a frame-based accuracy comparable to that of the holistic approach while requiring significantly fewer parameters and less training time, demonstrating its feasibility.
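The abstract does not define "frame-based accuracy". A common definition in multi-pitch evaluation, used for instance in the MIREX frame-level tasks, is Acc = TP / (TP + FP + FN), counted over all (frame, note) cells of binary piano rolls. The sketch below implements that definition as an assumption; `frame_accuracy` is a hypothetical name, and returning 1.0 when both rolls are completely silent is a convention choice.

```python
def frame_accuracy(reference, estimate):
    """Frame-level accuracy TP / (TP + FP + FN) over two binary piano rolls.

    Both rolls are lists of frames; each frame is a list of 0/1 note activations.
    """
    if len(reference) != len(estimate):
        raise ValueError("piano rolls must have the same number of frames")
    tp = fp = fn = 0
    for ref_frame, est_frame in zip(reference, estimate):
        for r, e in zip(ref_frame, est_frame):
            if r and e:
                tp += 1   # note present and detected
            elif e:
                fp += 1   # note detected but not present
            elif r:
                fn += 1   # note present but missed
    total = tp + fp + fn
    return tp / total if total else 1.0  # both rolls silent: perfect agreement


ref = [[1, 0, 1], [0, 1, 0]]
est = [[1, 1, 0], [0, 1, 0]]
acc = frame_accuracy(ref, est)  # TP=2, FP=1, FN=1 -> 0.5
```

Unlike plain per-cell accuracy, this measure ignores true negatives; this matters for piano rolls, where almost all of the 88 keys are silent in any given frame and a model predicting silence everywhere would otherwise score close to 100%.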

File

Thesis not available for consultation. Contact the author.

ETD

Digital archive of theses defended at the Università di Pisa
