ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-09112021-110903


Thesis type
Master's thesis
Author
PETROLITO, TOMMASO
URN
etd-09112021-110903
Title
Large-scale Cross-lingual Word Sense Disambiguation using Parallel Corpora
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Supervisors
supervisor Prof. Lenci, Alessandro
Keywords
  • semantic disambiguation
  • disambiguation
  • computational semantics
  • word net
  • WordNet
  • word sense disambiguation
Defense session start date
27/09/2021
Availability
Thesis not available for consultation
Abstract
This work explores a possible approach to developing word-sense-annotated text resources, relying on parallel corpora (the same text translated into multiple languages) and using Open Multilingual WordNet to disambiguate words. The approach overlaps (that is, intersects) the sets of synsets available in different languages for the same aligned word and uses the remaining synset as a disambiguation tag or, at least, reduces the possible word senses to a subset.
The approach is also evaluated as a way to suggest new lemma-synset links.
The experiment is run on the Europarl corpus in 21 languages and also explores how increasing or decreasing the number of languages involved affects the effectiveness of the approach.
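
The synset-overlap idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the inventory `TOY_WORDNET`, the synset identifiers, and the function `candidate_senses` are hypothetical names invented for illustration, and a real setup would take the synsets from Open Multilingual WordNet and the aligned lemmas from a word-aligned parallel corpus.

```python
from typing import Dict, Set

# Toy lemma-to-synset inventory per language. The synset IDs are made up
# for illustration; they are not real Open Multilingual WordNet entries.
TOY_WORDNET: Dict[str, Dict[str, Set[str]]] = {
    "en": {"bank": {"s_finance", "s_river", "s_row"}},
    "it": {"banca": {"s_finance"}, "riva": {"s_river"}},
    "fr": {"banque": {"s_finance", "s_row"}},
}


def candidate_senses(aligned: Dict[str, str]) -> Set[str]:
    """Intersect the synsets of lemmas aligned across languages.

    `aligned` maps a language code to the lemma found at the aligned
    position. Languages whose lemma is missing from the inventory are
    skipped (an assumption of this sketch) instead of emptying the result.
    """
    sense_sets = [
        TOY_WORDNET[lang][lemma]
        for lang, lemma in aligned.items()
        if lemma in TOY_WORDNET.get(lang, {})
    ]
    if not sense_sets:
        return set()
    return set.intersection(*sense_sets)


# Two languages only narrow the senses to a subset.
two_langs = candidate_senses({"en": "bank", "fr": "banque"})
# Adding a third language leaves a single synset, usable as a tag.
three_langs = candidate_senses({"en": "bank", "fr": "banque", "it": "banca"})
```

In this toy case, adding the Italian lemma reduces the candidate set from two synsets to one, which mirrors the question of how the number of languages involved affects the result.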
File