Thesis etd-06182025-135015
Thesis type
Master's thesis
Author
BACCHESCHI, CORRADO
URN
etd-06182025-135015
Title
An investigation into Efficient Deep Recurrent Neural Networks for Natural Language Processing
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Supervisors
Supervisor: Prof. Lenci, Alessandro
Supervisor: Prof. Micheli, Alessio
Supervisor: Dr. Tortorella, Domenico
Keywords
- deep learning
- echo state networks
- natural language processing
- recurrent neural networks
- reservoir computing
Defense session start date
04/07/2025
Availability
Full
Abstract
Reservoir Computing (RC) enables efficiently trained deep Recurrent Neural Networks (RNNs) by removing the need to train the hierarchy of representations of the input sequences. In this work, we analyze the performance and the dynamical behavior of RC models, specifically Deep Bidirectional Echo State Networks (Deep-BiESNs), applied to Natural Language Processing (NLP) tasks. As a first step, we investigate a set of linguistic probing tasks to gain a general understanding of how Deep-BiESNs encode linguistic properties. These preliminary results show that the generated representations effectively capture a wide range of linguistic features. We therefore extend our analysis to six standard NLP downstream tasks: three sequence-to-vector tasks for sequence-level classification and three sequence-to-sequence tasks for token-level labeling. We compare the performance of Deep-BiESNs against fully trained NLP reference models, showing that Deep-BiESNs achieve comparable or superior performance while requiring less training time than fully trained RNNs. In addition, we analyze the dynamical properties of these RC models, highlighting how the hierarchy of representations in Deep-BiESN layers contributes to forming the class prediction in both the probing and the downstream tasks. This analysis is particularly relevant in the NLP domain because language inherently involves dependencies that occur over various temporal horizons. Finally, for the downstream tasks, we also investigate via Class Activation Mapping (CAM) how the readout layer assigns importance to individual words, observing that it effectively emphasizes the tokens most relevant for accurate prediction. These findings not only highlight the potential of Deep ESNs as a competitive and efficient alternative for NLP applications but also contribute to a deeper understanding of how to effectively model such architectures to address a variety of linguistic tasks.
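For readers unfamiliar with Reservoir Computing, the sketch below illustrates the core idea the abstract relies on: input and recurrent weights are drawn at random, rescaled so the spectral radius stays below 1 (the echo state property), and left untrained; only a linear readout is fit, here by ridge regression in closed form. This is a minimal single-layer, unidirectional sketch, not the thesis's implementation; all sizes and hyperparameters are illustrative assumptions. A Deep-BiESN would stack several such reservoirs and run each over the sequence in both directions before feeding the concatenated states to the readout.

```python
import numpy as np

# Minimal Echo State Network sketch (illustrative only; the thesis studies
# deep bidirectional variants). Reservoir weights are random and fixed;
# only the linear readout is trained.

rng = np.random.default_rng(0)

n_in, n_res, n_out = 50, 300, 2  # illustrative sizes, not the thesis's

W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.uniform(-1.0, 1.0, (n_res, n_res))
# Rescale so the spectral radius is below 1 (echo state property).
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs):
    """Collect reservoir states for one input sequence of shape (T, n_in)."""
    h = np.zeros(n_res)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W @ h)  # untrained recurrent dynamics
        states.append(h)
    return np.stack(states)

def fit_readout(states, targets, lam=1e-3):
    """Train the linear readout in closed form by ridge regression."""
    return np.linalg.solve(states.T @ states + lam * np.eye(n_res),
                           states.T @ targets)

# Usage on toy data: X is one input sequence, Y its per-token targets
# (the sequence-to-sequence setting; a sequence-to-vector task would
# instead pool the states before the readout).
X = rng.normal(size=(20, n_in))
Y = rng.normal(size=(20, n_out))
S = run_reservoir(X)
W_out = fit_readout(S, Y)
preds = S @ W_out  # token-level predictions
```

Because only `W_out` is learned, and in closed form, training cost is a fraction of that of backpropagation through time, which is the efficiency argument the abstract makes for Deep-BiESNs.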
File
File name | Size |
---|---|
TesiBaccheschi.pdf | 1.78 MB |