Thesis etd-06182025-135015
Thesis type
Master's thesis
Author
BACCHESCHI, CORRADO
URN
etd-06182025-135015
Title
An investigation into Efficient Deep Recurrent Neural Networks for Natural Language Processing
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Supervisors
Supervisor: Prof. Lenci, Alessandro
Supervisor: Prof. Micheli, Alessio
Supervisor: Dr. Tortorella, Domenico
Keywords
- deep learning
- echo state networks
- natural language processing
- recurrent neural networks
- reservoir computing
Defense session start date
04/07/2025
Availability
Full
Abstract
Reservoir Computing (RC) enables efficiently trained deep Recurrent Neural Networks (RNNs) by removing the need to train the hierarchy of representations of the input sequences. In this work, we analyze the performance and the dynamical behavior of RC models, specifically Deep Bidirectional Echo State Networks (Deep-BiESNs), applied to Natural Language Processing (NLP) tasks. As a first step, we investigate a set of linguistic probing tasks to gain a general understanding of how Deep-BiESNs encode linguistic properties. These preliminary results show that the generated representations effectively capture a wide range of linguistic features. We therefore extend our analysis to six standard NLP downstream tasks: three sequence-to-vector tasks for sequence-level classification and three sequence-to-sequence tasks for token-level labeling. We compare the performance of Deep-BiESNs against fully trained NLP reference models, showing that Deep-BiESNs achieve comparable or superior performance while requiring less training time than fully trained RNNs. In addition, we analyze the dynamical properties of these RC models, highlighting how the hierarchy of representations in Deep-BiESN layers contributes to forming the class prediction in both the probing and the downstream tasks. This analysis is particularly relevant in the NLP domain because language inherently involves dependencies that occur over various temporal horizons. Finally, for the downstream tasks, we also investigate via Class Activation Mapping (CAM) how the readout layer assigns importance to individual words, observing that it effectively emphasizes the tokens most relevant for accurate prediction. These findings not only highlight the potential of Deep ESNs as a competitive and efficient alternative for NLP applications but also contribute to a deeper understanding of how to effectively model such architectures to address a variety of linguistic tasks.
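For readers unfamiliar with Reservoir Computing, the sketch below illustrates the core idea the abstract relies on: input and recurrent weights are drawn at random, rescaled so the spectral radius stays below 1 (the echo state property), and left untrained; only a linear readout is fit, here by ridge regression in closed form. This is a minimal single-layer, unidirectional sketch, not the thesis's implementation; all sizes and hyperparameters are illustrative assumptions. A Deep-BiESN would stack several such reservoirs and run each over the sequence in both directions before feeding the concatenated states to the readout.

```python
import numpy as np

# Minimal Echo State Network sketch (illustrative only; the thesis studies
# deep bidirectional variants). Reservoir weights are random and fixed;
# only the linear readout is trained.

rng = np.random.default_rng(0)

n_in, n_res, n_out = 50, 300, 2  # illustrative sizes, not the thesis's

W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.uniform(-1.0, 1.0, (n_res, n_res))
# Rescale so the spectral radius is below 1 (echo state property).
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs):
    """Collect reservoir states for one input sequence of shape (T, n_in)."""
    h = np.zeros(n_res)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W @ h)  # untrained recurrent dynamics
        states.append(h)
    return np.stack(states)

def fit_readout(states, targets, lam=1e-3):
    """Train the linear readout in closed form by ridge regression."""
    return np.linalg.solve(states.T @ states + lam * np.eye(n_res),
                           states.T @ targets)

# Usage on toy data: X is one input sequence, Y its per-token targets
# (the sequence-to-sequence setting; a sequence-to-vector task would
# instead pool the states before the readout).
X = rng.normal(size=(20, n_in))
Y = rng.normal(size=(20, n_out))
S = run_reservoir(X)
W_out = fit_readout(S, Y)
preds = S @ W_out  # token-level predictions
```

Because only `W_out` is learned, and in closed form, training cost is a fraction of that of backpropagation through time, which is the efficiency argument the abstract makes for Deep-BiESNs.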
File
File name | Size |
---|---|
TesiBaccheschi.pdf | 1.78 MB |