
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-09132024-151709


Thesis type
PhD thesis
Author
PEDROTTI, ANDREA
URN
etd-09132024-151709
Title
Heterogeneous Transfer Learning in Natural Language Processing
Scientific-disciplinary sector
INF/01
Degree programme
COMPUTER SCIENCE
Supervisors
tutor Dr. Moreo Fernández, Alejandro
tutor Dr. Sebastiani, Fabrizio
Keywords
  • cross-lingual
  • multi-modal
  • transfer learning
Defense session start date
01/10/2024
Availability
Not available for consultation
Release date
01/10/2027
Abstract
With the advances in Deep Learning, the term Transfer Learning (TL) has become ubiquitous in the field of Machine Learning. One of the most widely adopted strategies when working with pre-trained models is to fine-tune them on downstream tasks using a labeled dataset that is small compared to the amount of data used in the pre-training phase. Fine-tuning is, in fact, a common transfer learning technique.
In general, TL refers to a set of techniques and approaches that leverage training data sampled from a source distribution to improve performance on a test set (the target) containing elements sampled from a different, but related, distribution. This paradigm brings about two major advantages. First, it increases performance on the target domain by making the algorithm more robust and resilient, allowing us to leverage powerful pre-trained models that were trained on hardware that is not widely available. Second, it allows the application of data-intensive techniques to many scarce-resource domains where training an ad-hoc solution would be impossible.
In this thesis, we explore applications of Heterogeneous Transfer Learning (HTL) to the field of Natural Language Processing. We identify two main exploratory spaces: (i) the heterogeneous space defined by different languages and (ii) the heterogeneous space defined by the intersection of languages and perceptual information. Lastly, we explore the benefits of HTL when dealing simultaneously with both multimodality and multilinguality.