
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-09062023-155728


Thesis type
Master's thesis
Author
INNOCENTI, DAVIDE
URN
etd-09062023-155728
Title
Training Strategies for Effective Named Entity Processing in Neural Machine Translation for Low-Resource Languages
Department
COMPUTER SCIENCE
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Prof. Bondielli, Alessandro
Keywords
  • low-resource languages
  • machine learning
  • machine translation
  • NLP
  • transformers
Exam session start date
06/10/2023
Availability
Not available for consultation
Release date
06/10/2063
Abstract
Natural Language Processing (NLP) and Machine Translation (MT) are fields at the intersection of academia and industry that are changing the way humans communicate with machines and with each other. Both have seen substantial growth and impact in academic and business settings.
This work responds to the need for an MT system designed specifically for low-resource languages, and it involves creating a process for identifying and managing terms that must not be translated (do-not-translate terms, DNTs) using Named Entity processing.
The study collects data from online sources to build Machine Translation models that learn specifically designed tags for DNTs, so that selected entities are handled correctly at inference time and the resulting translations are adequate.
State-of-the-art tools are used to achieve this goal, and new training strategies are proposed for developing these models.
The findings can serve as guidance for future applications in low-resource domains for both companies and academics.
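The abstract does not specify the tag format or the tagging procedure, so the following is only a minimal illustrative sketch of the general idea of marking DNTs in text. The function names `tag_dnts` and `strip_dnt_tags`, the `<dnt>` and `</dnt>` markers, and the example terms are all hypothetical and are not taken from the thesis; the term list would in practice come from a Named Entity recognizer or a glossary.

```python
import re


def tag_dnts(sentence, dnt_terms, open_tag="<dnt>", close_tag="</dnt>"):
    """Wrap each do-not-translate term in marker tags (hypothetical format)."""
    terms = [t for t in set(dnt_terms) if t]
    if not terms:
        return sentence
    # Try longer terms first so "Acme Cloud" is preferred over "Acme".
    terms.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # Lookarounds prevent matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)")
    return pattern.sub(lambda m: open_tag + m.group(1) + close_tag, sentence)


def strip_dnt_tags(sentence, open_tag="<dnt>", close_tag="</dnt>"):
    """Remove the marker tags, e.g. from a model's output after translation."""
    return sentence.replace(open_tag, "").replace(close_tag, "")


source = "Acme Cloud runs on Acme servers."
tagged = tag_dnts(source, ["Acme", "Acme Cloud"])
print(tagged)
print(strip_dnt_tags(tagged))
```

In a setup like this, the tagged sentences would serve as training and inference input, and the tags would be stripped from the output once the protected spans have been copied through.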