Thesis etd-06302021-103021
Thesis type
PhD thesis
Author
CARTA, ANTONIO
URN
etd-06302021-103021
Thesis title
Memorization in Recurrent Neural Networks
Academic discipline
INF/01
Course of study
COMPUTER SCIENCE (INFORMATICA)
Supervisors
Tutor: Prof. Bacciu, Davide
Keywords
- continual learning
- recurrent neural networks
- sequence autoencoding
- short-term memory
Graduation session start date
20/07/2021
Availability
Full
Summary
Rich sources of data, such as text, video, or time series, are composed of sequences of elements. Traditionally, recurrent neural networks have been used to process sequences by keeping a trace of the past in a recursively updated hidden state. The ability of recurrent networks to memorize the past is fundamental to their success.
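As a point of reference, the sketch below shows the recursive hidden-state update that gives a vanilla recurrent network its short-term memory. It is only an illustrative NumPy example; the parameter names (`W_in`, `W_rec`, `b`) and dimensions are placeholders and do not correspond to the specific models studied in the thesis.

```python
import numpy as np

# Minimal sketch of a vanilla recurrent update: the hidden state h_t
# summarizes the sequence seen so far. All parameters below are
# illustrative placeholders, not taken from the thesis.
def rnn_step(h_prev, x_t, W_in, W_rec, b):
    return np.tanh(W_in @ x_t + W_rec @ h_prev + b)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 5, 10
W_in = rng.normal(size=(hidden_size, input_size))
W_rec = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                    # initial hidden state
for x_t in rng.normal(size=(seq_len, input_size)):
    h = rnn_step(h, x_t, W_in, W_rec, b)     # recursive update: h is the "trace of the past"
```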
In this thesis, we study recurrent networks and their short-term memory, with the objective of maximizing it. In the literature, most models either do not optimize the short-term memory or do so in a data-independent way. We propose a conceptual framework that splits recurrent networks into two separate components: a feature extractor and a memorization component. Following this separation, we show how to optimize the short-term memory of recurrent networks. This is a challenging problem that is hard to solve with end-to-end backpropagation, and we propose several solutions that allow us to optimize the memorization component efficiently. Finally, we apply our approach to two application domains: sentence embeddings for natural language processing and continual learning on sequential data.
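To make the two-component view concrete, here is a minimal sketch of one possible split between a nonlinear feature extractor and a separate (here linear) memorization component. This is an assumed decomposition for illustration only; the function and parameter names are hypothetical and do not reproduce the architectures or optimization procedures proposed in the thesis.

```python
import numpy as np

def feature_extractor(x_t, m_prev, W_x, W_m):
    # Nonlinear feature extraction from the current input and the memory state.
    return np.tanh(W_x @ x_t + W_m @ m_prev)

def memory_update(m_prev, phi_t, A, B):
    # Separate memorization component (linear in this sketch) that stores the features.
    return A @ m_prev + B @ phi_t

# Illustrative dimensions and randomly initialized placeholder parameters.
rng = np.random.default_rng(1)
input_size, feature_size, memory_size = 3, 4, 6
W_x = rng.normal(size=(feature_size, input_size))
W_m = rng.normal(size=(feature_size, memory_size))
A = rng.normal(size=(memory_size, memory_size)) * 0.1   # small gain to keep the state bounded
B = rng.normal(size=(memory_size, feature_size))

m = np.zeros(memory_size)
for x_t in rng.normal(size=(8, input_size)):
    phi_t = feature_extractor(x_t, m, W_x, W_m)
    m = memory_update(m, phi_t, A, B)
```

Keeping the memorization component separate in this way is what allows it, in principle, to be trained with objectives other than end-to-end backpropagation through the whole network.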
Overall, we find that optimizing the short-term memory improves the ability of recurrent models to learn long-range dependencies, helps the training process, and provides features that generalize well to unseen data.
The findings of this thesis provide a better understanding of short-term memory in recurrent networks and suggest general principles that may be useful to design novel recurrent models with expressive memorization components.
File
| File name | Size |
|---|---|
| main.pdf | 2.40 Mb |
| relazione_finale.pdf | 137.69 Kb |