Thesis etd-09262024-165825
Thesis type
PhD thesis
Author
RESTA, MICHELE
URN
etd-09262024-165825
Title
Continual Incremental Language Learning for Neural Machine Translation
Scientific disciplinary sector
INF/01
Degree program
COMPUTER SCIENCE
Supervisors
Tutor: Prof. Bacciu, Davide
Keywords
- catastrophic forgetting
- continual learning
- incremental language learning
- lifelong learning
- neural machine translation
Defense date
08/10/2024
Availability
Full
Abstract
Since the inception of the Artificial Intelligence (AI) field, one of the main long-term objectives has been to understand and replicate intelligence in order to create systems capable of learning and behaving in a human-like manner. However, this task has proven to be extremely difficult for both traditional AI systems and neural approaches due to the phenomenon of catastrophic forgetting. When exposed to new data, neural network systems tend to erase previously learned knowledge.
In this context, Continual Learning (CL) has emerged as a research field aimed at mitigating this behavior and moving towards AI systems that mimic human learning capabilities in lifelong learning tasks and environments. With the shift to deep learning in Machine Translation (MT) and Natural Language Processing (NLP), these characteristics have become even more desirable, given the substantial resources required to train such models, especially in terms of training efficiency and transferability of knowledge.
In this dissertation, we provide a practical contribution to this research area. We begin by reviewing fundamental concepts and theoretical aspects of Neural Machine Translation (NMT) and then survey prominent CL methodologies. Building on this foundation, we propose a Continual Learning framework for NMT with the goal of incrementally learning multilingual translation systems. We introduce the Continual Incremental Language Learning setting as a starting point to explore data selection strategies that enhance training efficiency when using effective continual learning strategies such as replay buffers.
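As a concrete illustration of the replay-buffer idea mentioned above, the following is a minimal sketch, not the dissertation's implementation: a reservoir-sampling buffer that retains examples from previously learned language pairs and mixes them into batches for a newly added language. All names, the buffer capacity, and the replay ratio are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the thesis code): a reservoir-sampling
# replay buffer that mixes examples from earlier language pairs into batches
# for a newly added language pair.
import random
from dataclasses import dataclass


@dataclass
class Example:
    source: str
    target: str
    lang_pair: str  # e.g. "en-de"


class ReplayBuffer:
    def __init__(self, capacity: int = 10_000, seed: int = 0):
        self.capacity = capacity
        self.items: list[Example] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, ex: Example) -> None:
        """Reservoir sampling keeps a uniform sample of all examples seen so far."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(ex)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = ex

    def sample(self, k: int) -> list[Example]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_lang_batch: list[Example], buffer: ReplayBuffer,
                replay_ratio: float = 0.2) -> list[Example]:
    """Interleave replayed examples from earlier languages with the new-language batch."""
    n_replay = int(len(new_lang_batch) * replay_ratio)
    return new_lang_batch + buffer.sample(n_replay)
```

The data selection question studied in the dissertation then amounts to deciding which examples enter the buffer and which are replayed; uniform reservoir sampling is only the simplest baseline choice.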
Furthermore, we demonstrate that employing an NMT model both as a learner and as a generator of replay data is effective in mitigating performance loss during continued training, alleviating several requirements related to training data storage. Within this incremental language learning context, we empirically evaluate, through quantitative and qualitative analyses, both the classical training paradigm and the pre-training and fine-tuning paradigm. We discuss their unique aspects when employing classical data-based rehearsal strategies.
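The generator-as-learner idea described above can be sketched as follows. This is an assumed illustration, not the dissertation's actual code: a previous model checkpoint labels monolingual source text to produce pseudo-parallel replay pairs, so earlier training corpora need not be stored. The `old_model_translate` callable is a hypothetical stand-in for any NMT decoding function.

```python
# Minimal sketch (assumptions, not the thesis implementation): self-generated
# replay, where the old checkpoint produces pseudo-parallel data for previously
# learned language pairs.
from typing import Callable, Iterable

# (source_sentence, lang_pair) -> translation, e.g. a beam-search decode call.
Translate = Callable[[str, str], str]


def self_generated_replay(old_model_translate: Translate,
                          monolingual_sources: Iterable[str],
                          lang_pair: str) -> list[tuple[str, str]]:
    """Label monolingual source sentences with the old model's own translations.

    The resulting pseudo-parallel pairs are mixed into training on the new
    language so the updated model keeps rehearsing what it already knows.
    """
    replay_pairs = []
    for src in monolingual_sources:
        hyp = old_model_translate(src, lang_pair)  # old checkpoint acts as the "teacher"
        replay_pairs.append((src, hyp))
    return replay_pairs
```

Because the replay targets are generated on the fly, only monolingual source text (or even sampled source sentences) needs to be kept, which is what relaxes the data-storage requirements noted above.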
We extend our analysis to non-autoregressive NMT models and compare them to state-of-the-art autoregressive NMT systems. Through this work, we aim to provide a comprehensive framework and practical insights into continual learning for NMT, ultimately highlighting the needs and benefits of this learning paradigm.
File
File name | Size |
---|---|
resta_th...nal_1.pdf | 4.48 MB |