
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-05292022-233610


Thesis type
PhD thesis
Author
MARINI, MARCO
URN
etd-05292022-233610
Title
Automatic Speech Recognition system for people with impaired speech: development of an Italian Dysarthric Speech database and a new Speech Analysis Technique to improve speech recognition performance
Scientific-disciplinary sector
ING-INF/01
Degree programme
INGEGNERIA DELL'INFORMAZIONE (Information Engineering)
Supervisors
tutor Prof. Fanucci, Luca
Keywords
  • Kaldi
  • Artificial Intelligence
  • Dysarthria
  • Database
  • Automatic Speech Recognition
Defence session start date
21/06/2022
Availability
Not available for consultation
Release date
21/06/2025
Abstract
Artificial Intelligence (AI) is now among the most widely used technologies for supporting both established and new tasks. Although it demands more expertise, AI can be embedded in many scenarios, and doing so is highly recommended because its versatility and modern perspective improve the final result.

One of these cases is Automatic Speech Recognition (ASR). In general terms, ASR is a process that analyses voice signals and generates their most likely transcription. It is a very challenging task because every speaker talks differently, even when speakers belong to the same culture and region.

In the last few years, ASR has become one of the most important research fields because of the growing demand for hands-free interfaces. This technology helps people interact with smart devices in critical situations where their hands are occupied with another task. Driving a car is one such case: the user may need to interact with a smart device (e.g. a smartphone) without using their hands, so that driving remains safe. However, what happens if a user has impaired speech? Is ASR technology able to deal with this issue?

Dysarthria is a speech disorder caused by impairment of neurological function, motor control, and/or the speech articulators. It can result from acquired brain or spinal cord injuries (e.g., stroke), from congenital or neurodegenerative diseases, or from age-related neurological decline. Within ASR, dysarthric speech is a challenge because standard approaches are ineffective in its presence. As a result, users with such speech disorders cannot benefit from this kind of technology.

Because ASR technology is based on statistical analysis and neural networks, the first step towards better recognition performance for dysarthric speakers is to build dysarthric speech databases. When we began our research, no Italian dysarthric speech database was available. Our first aim was therefore to develop the first Italian Dysarthric Speech database (IDEA), in partnership with three Italian medical facilities. For this purpose we developed a dedicated PC tool, RECORDIA, which guides doctors and caregivers through the patient characterization and speech recording procedures. All data collected by our partners with RECORDIA are stored on an online server hosted at the University of Pisa and accessible worldwide.

After collecting the Italian data, we processed them with well-established ASR technologies to assess the quality of the data, and we compared the results with those we obtained on other, English-language databases. The standard ASR technologies are hybrid systems that combine a Hidden Markov Model with either a Gaussian Mixture Model (GMM-HMM) or a Deep Neural Network (DNN-HMM). We also introduced a new feature extraction technique for dysarthric speakers, which tunes the window and shift parameters of the Short-Time Fourier Transform according to the way each user speaks. Several experiments were carried out using audio contributions from 45 people with speech and communication disorders and 10 speakers without speech disorders. We then compared the performance of an ASR system, built with the Kaldi toolkit, when using standard speech processing and when using our proposal. The approach proved very effective for people with medium and high levels of dysarthria, improving ASR performance.
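
To make the role of the window and shift parameters concrete, the following is a minimal sketch of a Short-Time Fourier Transform front end in which both values are configurable. It is an illustration only, not the procedure used in the thesis: the function name `stft_magnitude`, the Hamming window, and the example values of 50 ms and 20 ms are assumptions. The 25 ms window and 10 ms shift used as defaults are the values commonly adopted in ASR front ends, including Kaldi's default feature configuration.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, window_ms=25.0, shift_ms=10.0):
    """Return the STFT magnitude, one row per frame.

    window_ms and shift_ms are the two parameters a speaker-dependent
    front end could tune; how to choose them per speaker is NOT shown
    here and is not described in the abstract.
    """
    win = int(round(sample_rate * window_ms / 1000.0))   # samples per frame
    hop = int(round(sample_rate * shift_ms / 1000.0))    # samples between frame starts
    if len(signal) < win:
        raise ValueError("signal is shorter than one analysis window")

    # Only complete frames are kept; trailing samples are dropped.
    n_frames = 1 + (len(signal) - win) // hop
    taper = np.hamming(win)  # assumed window shape
    frames = np.stack(
        [signal[i * hop : i * hop + win] * taper for i in range(n_frames)]
    )
    # One-sided FFT of each frame: win // 2 + 1 frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a 440 Hz tone sampled at 16 kHz (synthetic test signal).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

standard = stft_magnitude(tone, sample_rate)                             # 25 ms / 10 ms
longer = stft_magnitude(tone, sample_rate, window_ms=50.0, shift_ms=20.0)  # hypothetical setting
```

A longer window gives finer frequency resolution but fewer frames per second, which is the trade-off a speaker-adapted choice of window and shift would have to balance, for instance for slow or prolonged speech.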

An additional study was carried out in order to analyse the possible correlation between the new window and shift parameters and certain vocal characteristics of the subjects studied.
File