Thesis type
Master's thesis
Title
Large-scale Cross-lingual Word Sense Disambiguation using Parallel Corpora
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Abstract
This work explores a possible approach to developing word-sense-annotated text resources that relies on parallel corpora (the same text translated into multiple languages) and exploits the Open Multilingual WordNet to disambiguate words. The approach intersects the synsets available in the different languages for the same aligned word and uses the remaining synset as a disambiguation tag or, when more than one synset remains, at least reduces the possible word senses to a subset.
This approach is also evaluated as a way to provide suggestions for new lemma-synset links.
The experiment is carried out on the Europarl corpus in 21 languages and also explores how increasing or decreasing the number of languages involved affects the effectiveness of the approach.
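
The synset-intersection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the inventory `TOY_INVENTORY`, the synset IDs, the function name `candidate_synsets`, and the choice to skip languages with no entry are all assumptions made for illustration, and the sense assignments are deliberately simplified.

```python
# Hypothetical toy inventory: (language, lemma) -> set of synset IDs.
# The IDs are invented for illustration; they are not real WordNet offsets,
# and the sense assignments are simplified on purpose.
TOY_INVENTORY = {
    ("en", "bank"): {"syn-institution", "syn-building", "syn-riverside"},
    ("fr", "banque"): {"syn-institution", "syn-building"},
    ("it", "banca"): {"syn-institution"},
}


def candidate_synsets(aligned_lemmas, inventory):
    """Intersect the synset sets of lemmas aligned across languages.

    aligned_lemmas maps a language code to the lemma aligned in that language.
    Languages whose lemma is missing from the inventory are skipped
    (an assumption of this sketch), so they do not empty the result.
    """
    synset_sets = [
        inventory[(lang, lemma)]
        for lang, lemma in aligned_lemmas.items()
        if (lang, lemma) in inventory
    ]
    if not synset_sets:
        return set()
    return set.intersection(*synset_sets)


# Two languages only: the senses are reduced to a subset.
partial = candidate_synsets({"en": "bank", "fr": "banque"}, TOY_INVENTORY)
# Three languages: a single synset remains and can serve as the sense tag.
full = candidate_synsets(
    {"en": "bank", "fr": "banque", "it": "banca"}, TOY_INVENTORY
)

print(sorted(partial))  # ['syn-building', 'syn-institution']
print(sorted(full))     # ['syn-institution']
```

In this toy case, adding a third language narrows two candidate senses down to one, which is the kind of effect the experiment measures when the number of languages is varied.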