Thesis etd-09262024-165825
Thesis type
PhD thesis
Author
RESTA, MICHELE
URN
etd-09262024-165825
Title
Continual Incremental Language Learning for Neural Machine Translation
Scientific disciplinary sector
INF/01
Degree program
COMPUTER SCIENCE
Supervisors
Tutor: Prof. Bacciu, Davide
Keywords
- catastrophic forgetting
- continual learning
- incremental language learning
- lifelong learning
- neural machine translation
Defense date
08/10/2024
Availability
Full
Abstract
Since the inception of the Artificial Intelligence (AI) field, one of the main long-term objectives has been to understand and replicate intelligence in order to create systems capable of learning and behaving in a human-like manner. However, this task has proven to be extremely difficult for both traditional AI systems and neural approaches due to the phenomenon of catastrophic forgetting. When exposed to new data, neural network systems tend to erase previously learned knowledge.
In this context, Continual Learning (CL) has emerged as a research field aimed at mitigating this behavior and moving towards AI systems that mimic human learning capabilities in lifelong learning tasks and environments. With the shift to deep learning in Machine Translation (MT) and Natural Language Processing (NLP), these characteristics have become even more desirable, given the substantial resources required to train such models, especially in terms of training efficiency and transferability of knowledge.
In this dissertation, we provide a practical contribution to this research area. We begin by reviewing fundamental concepts and theoretical aspects of Neural Machine Translation (NMT) and then survey prominent CL methodologies. Building on this foundation, we propose a Continual Learning framework for NMT with the goal of incrementally learning multilingual translation systems. We introduce the Continual Incremental Language Learning setting as a starting point to explore data selection strategies that enhance training efficiency when using effective continual learning strategies such as replay buffers.
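As a concrete illustration of the replay-buffer idea mentioned above, the following is a minimal sketch, not the dissertation's implementation: a reservoir-sampling buffer that retains examples from previously learned language pairs and mixes them into batches for a newly added language. All names, the buffer capacity, and the replay ratio are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the thesis code): a reservoir-sampling
# replay buffer that mixes examples from earlier language pairs into batches
# for a newly added language pair.
import random
from dataclasses import dataclass


@dataclass
class Example:
    source: str
    target: str
    lang_pair: str  # e.g. "en-de"


class ReplayBuffer:
    def __init__(self, capacity: int = 10_000, seed: int = 0):
        self.capacity = capacity
        self.items: list[Example] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, ex: Example) -> None:
        """Reservoir sampling keeps a uniform sample of all examples seen so far."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(ex)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = ex

    def sample(self, k: int) -> list[Example]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_lang_batch: list[Example], buffer: ReplayBuffer,
                replay_ratio: float = 0.2) -> list[Example]:
    """Interleave replayed examples from earlier languages with the new-language batch."""
    n_replay = int(len(new_lang_batch) * replay_ratio)
    return new_lang_batch + buffer.sample(n_replay)
```

The data selection question studied in the dissertation then amounts to deciding which examples enter the buffer and which are replayed; uniform reservoir sampling is only the simplest baseline choice.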
Furthermore, we demonstrate that employing an NMT model both as a learner and as a generator of replay data is effective in mitigating performance loss during continued training, alleviating several requirements related to training data storage. Within this incremental language learning context, we empirically evaluate, through quantitative and qualitative analyses, both the classical training paradigm and the pre-training and fine-tuning paradigm. We discuss their unique aspects when employing classical data-based rehearsal strategies.
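The generator-as-learner idea described above can be sketched as follows. This is an assumed illustration, not the dissertation's actual code: a previous model checkpoint labels monolingual source text to produce pseudo-parallel replay pairs, so earlier training corpora need not be stored. The `old_model_translate` callable is a hypothetical stand-in for any NMT decoding function.

```python
# Minimal sketch (assumptions, not the thesis implementation): self-generated
# replay, where the old checkpoint produces pseudo-parallel data for previously
# learned language pairs.
from typing import Callable, Iterable

# (source_sentence, lang_pair) -> translation, e.g. a beam-search decode call.
Translate = Callable[[str, str], str]


def self_generated_replay(old_model_translate: Translate,
                          monolingual_sources: Iterable[str],
                          lang_pair: str) -> list[tuple[str, str]]:
    """Label monolingual source sentences with the old model's own translations.

    The resulting pseudo-parallel pairs are mixed into training on the new
    language so the updated model keeps rehearsing what it already knows.
    """
    replay_pairs = []
    for src in monolingual_sources:
        hyp = old_model_translate(src, lang_pair)  # old checkpoint acts as the "teacher"
        replay_pairs.append((src, hyp))
    return replay_pairs
```

Because the replay targets are generated on the fly, only monolingual source text (or even sampled source sentences) needs to be kept, which is what relaxes the data-storage requirements noted above.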
We extend our analysis to non-autoregressive NMT models and compare them to state-of-the-art autoregressive NMT systems. Through this work, we aim to provide a comprehensive framework and practical insights into continual learning for NMT, ultimately highlighting the needs and benefits of this learning paradigm.
File
File name | Size |
---|---|
resta_th...nal_1.pdf | 4.48 MB |