Thesis etd-09062023-155728
Thesis type
Master's thesis
Author
INNOCENTI, DAVIDE
URN
etd-09062023-155728
Thesis title
Training Strategies for Effective Named Entity Processing in Neural Machine Translation for Low-Resource Languages
Department
INFORMATICA
Course of study
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
Supervisor Prof. Bondielli, Alessandro
Keywords
- low-resource languages
- machine learning
- machine translation
- NLP
- transformers
Graduation session start date
06/10/2023
Availability
Withheld
Release date
06/10/2063
Summary
Natural Language Processing (NLP) and Machine Translation (MT) are groundbreaking fields at the intersection of academia and industry. They are changing the way humans communicate with machines and with each other, and both have seen substantial growth and impact in academic and business settings.
This work responds to a specific need: building an MT system designed for low-resource languages. It involves creating a process for identifying and managing terms that must not be translated (do-not-translate terms, DNTs) using Named Entity processing.
The study collects data from online sources to build Machine Translation models that learn specifically designed tags for DNTs, so that selected entities are handled correctly at inference time and the resulting translations are adequate.
State-of-the-art tools were used to achieve this goal, and new training strategies are presented for developing these models.
The findings can serve as guidance for future applications in low-resource domains for both companies and academics.
File
The thesis is not available.