Thesis type
Master's thesis
Title
Training Strategies for Effective Named Entity Processing in Neural Machine Translation for Low-Resource Languages
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Abstract (Italian)
Natural Language Processing (NLP) and Machine Translation (MT) are fields at the intersection of academia and industry that are changing the way humans communicate with machines and with each other. Both have seen substantial growth and impact in academic and business settings.
This work addresses the need for an MT system designed for low-resource languages. It defines a process for identifying and handling terms that must not be translated (do-not-translate terms, DNTs) using Named Entity processing.
The study collects data from online sources to build Machine Translation models that learn purpose-built tags for DNTs, so that the selected entities are handled correctly at inference time and the resulting translations are adequate.
State-of-the-art tools are used to achieve this goal, and new training strategies are proposed for developing these models.
The findings can serve as guidance for future applications in low-resource domains for both companies and academics.