Thesis etd-05292022-233610

Thesis type

PhD thesis

URN

etd-05292022-233610

Title

Automatic Speech Recognition system for people with impaired speech: development of an Italian Dysarthric Speech database and a new Speech Analysis Technique to improve speech recognition performance

Scientific disciplinary sector

ING-INF/01 - ELECTRONICS

Course of study

INFORMATION ENGINEERING

Supervisors

tutor Prof. Fanucci, Luca

Keywords

Artificial Intelligence
Automatic Speech Recognition
Database
Dysarthria
Kaldi

Defence session start date

21/06/2022

Availability

Not available

Release date

21/06/2025

Abstract (English)

Artificial Intelligence (AI) is now among the most widely used technologies for supporting both established and new tasks. Although it requires additional expertise, embedding AI is not only possible in many scenarios but also highly recommended, because its versatility improves the final result.

One such case is Automatic Speech Recognition (ASR). In general terms, ASR is a process that analyses a voice signal and generates its most likely transcription. It is a very challenging task because every speaker talks differently, even among people who share the same culture and region.

In the last few years, ASR has become one of the most important research fields due to the growing demand for hands-free interfaces. This technology helps people interact with smart devices in critical situations where their hands are busy with another task. Driving a car is one such case: the user may need to interact with a smart device (e.g. a smartphone) without using their hands, so that driving remains safe. However, what happens if the user has impaired speech? Can ASR technology deal with this issue?

Dysarthria is a speech disorder caused by impairment of neurological function, motor control, and/or the speech articulators. These speech impairments can result from acquired brain or spinal cord injuries (e.g., stroke), from congenital or neurodegenerative diseases, and from age-related neurological decline. Within the field of ASR, processing dysarthric speech is a challenge because standard approaches are ineffective in the presence of dysarthria. As a result, users with such speech disorders are unable to benefit from this kind of technology.

Since ASR technology is based on statistical analysis and neural networks, both of which must be trained on speech data, the first step towards improving speech recognition for dysarthric speakers is to create dysarthric speech databases. When we started our research, no Italian dysarthric speech database was available. Therefore, our first aim was to develop the first Italian Dysarthric Speech database (IDEA), in partnership with three Italian medical facilities. For this purpose, we developed a dedicated PC tool named RECORDIA, which guides doctors and caregivers through the patient characterization and speech recording procedures. All the data collected by our partners with the RECORDIA software are stored on an online server located at the University of Pisa and accessible from anywhere in the world.

After collecting the Italian data, we processed them with well-known ASR technologies in order to evaluate the quality of the data, and we compared our results with those we obtained on English-language databases. The standard ASR technologies are hybrid systems that combine a Hidden Markov Model with either a Gaussian Mixture Model (GMM-HMM) or a Deep Neural Network (DNN-HMM). We also adopted a new feature extraction technique for dysarthric speakers, which attempts to tune the window and shift parameters of the Short-Time Fourier Transform according to the way a user speaks. Several experiments were carried out using audio contributions from 45 people with speech and communication disorders and 10 speakers without speech disorders. We then compared the performance of an ASR system, built with the Kaldi toolkit, when it uses standard speech processing and when it uses our proposal. The proposed approach was found to be very effective for people with medium and high levels of dysarthria, improving ASR performance.

An additional study was carried out to analyse the possible correlation between the new window and shift parameters and certain vocal characteristics of the subjects studied.
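
As a rough illustration of what the window and shift parameters of a Short-Time Fourier Transform control, the sketch below computes a magnitude spectrogram for two parameter pairs. It is not the tuning procedure developed in the thesis, which this record does not describe. The function name `stft_magnitude`, the synthetic test tone, and the 40 ms / 15 ms pair are assumptions chosen only for the example; 25 ms / 10 ms is the framing commonly used by default in ASR front ends.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, window_ms, shift_ms):
    """Magnitude STFT using a Hamming window of window_ms, advanced by shift_ms.

    Hypothetical helper for illustration only; not the thesis implementation.
    """
    win_len = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * shift_ms / 1000.0))
    if win_len <= 0 or hop <= 0:
        raise ValueError("window_ms and shift_ms must be positive")
    if len(signal) < win_len:
        # Signal shorter than one window: no complete frame is available.
        return np.empty((0, win_len // 2 + 1))
    n_frames = 1 + (len(signal) - win_len) // hop
    window = np.hamming(win_len)
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # One row per frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a synthetic 200 Hz tone sampled at 16 kHz (assumed test input).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)

# Commonly used default framing versus an arbitrary longer window and shift.
default_spec = stft_magnitude(tone, sample_rate, window_ms=25.0, shift_ms=10.0)
longer_spec = stft_magnitude(tone, sample_rate, window_ms=40.0, shift_ms=15.0)
print(default_spec.shape, longer_spec.shape)
```

A longer window gives finer frequency resolution, and a larger shift produces fewer frames per second, so the choice of these two values changes what the recogniser sees for the same utterance.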

File

File name	Size
The thesis is not available. Contact the author

ETD

Digital archive of the theses defended at the University of Pisa
