ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-01172019-134057


Thesis type
Master's thesis (laurea magistrale)
Author
BONANSINGA, GIULIA
URN
etd-01172019-134057
Title
Cross-lingual word sense annotation with multilingual wordnets
Department
FILOLOGIA, LETTERATURA E LINGUISTICA (Philology, Literature and Linguistics)
Degree programme
INFORMATICA UMANISTICA (Digital Humanities)
Supervisors
supervisor Prof. Lenci, Alessandro
supervisor Prof. Bond, Francis Charles
co-examiner Dr. Dell'Orletta, Felice
Keywords
  • cross-lingual word sense disambiguation
  • word sense disambiguation
  • sense annotation
  • wordnet
  • multilingual sense intersection
  • sense projection
Examination session start date
04/02/2019
Availability
Not available for consultation
Release date
04/02/2089
Abstract
Cross-lingual approaches are exploited to enrich existing parallel corpora with semantic annotation inexpensively. Human-checked annotations, though extremely beneficial for making substantial progress in Word Sense Disambiguation (WSD), are very time-consuming to produce, so alternative options ought to be sought.

We first compare two such approaches, both applicable to any multilingual parallel corpus provided that large, inter-linked sense inventories exist for all the languages involved and word alignments are available. Even when full disambiguation is out of reach, partial disambiguation can still be achieved by exploiting both the similarities and the differences among these languages.
Secondly, we attempt to disambiguate a multilingual parallel corpus, derived from SemCor and its sibling projects (Landes et al. 1998; Bentivogli and Pianta 2005; Lupu et al. 2005; Bond et al. 2012), by means of Multilingual Sense Intersection (MSI).
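The abstract does not spell out the MSI procedure, so the following is only a minimal sketch of the general idea suggested by its name, assuming that each aligned word contributes its set of candidate senses (expressed as interlingual IDs shared by the linked wordnets) and that the sets are intersected. The inventory, lemma-sense pairings, and the `msi` function are hypothetical placeholders invented for illustration, not the thesis's actual tools or data.

```python
from __future__ import annotations

# Hypothetical toy inventory: (lemma, language) -> candidate senses.
# Sense labels stand in for interlingual IDs shared by linked wordnets;
# all pairings below are invented placeholders, not real wordnet data.
INVENTORY: dict[tuple[str, str], set[str]] = {
    ("bank", "eng"): {"finance", "riverside", "row", "bank_building"},
    ("banca", "ita"): {"finance", "bank_building", "databank"},
    ("bancă", "ron"): {"finance", "bench"},
}


def msi(aligned: list[tuple[str, str]]) -> set[str]:
    """Intersect the candidate senses of words aligned across languages.

    One surviving sense means full disambiguation, several mean partial
    disambiguation, and an empty set means no shared sense was found.
    """
    # Assumption: words missing from the inventory are skipped rather than
    # being allowed to empty the intersection.
    candidate_sets = [INVENTORY[word] for word in aligned if word in INVENTORY]
    if not candidate_sets:
        return set()
    # set.intersection(a, b, ...) returns a new set, so INVENTORY is not mutated.
    return set.intersection(*candidate_sets)


if __name__ == "__main__":
    pair = [("bank", "eng"), ("banca", "ita")]
    print(sorted(msi(pair)))                      # ['bank_building', 'finance'] (partial)
    print(sorted(msi(pair + [("bancă", "ron")])))  # ['finance'] (full)
```

In this toy setting each additional language can only narrow the candidate set, which is one way to read how differences among languages yield at least partial disambiguation.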

Unlike sense projection, MSI can be applied to most existing multilingual parallel corpora, because it does not require any text in the corpus to be sense-annotated. Although more error-prone, MSI can boost the annotation coverage of multilingual parallel corpora, provided that adequately sized, inter-linked sense inventories exist for the target languages.
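To make the contrast concrete, here is a minimal sketch of sense projection under stated assumptions: a source sentence already carries sense annotations, word alignments link source and target positions, and a projected sense is kept only if the target lemma can express it (an assumed sanity check). The inventory, the `project` function, and its parameters are hypothetical illustrations, not the thesis's implementation.

```python
from __future__ import annotations

# Hypothetical toy inventory: (lemma, language) -> candidate senses.
# Sense labels are invented placeholders for interlingual IDs.
INVENTORY: dict[tuple[str, str], set[str]] = {
    ("bank", "eng"): {"finance", "riverside", "row", "bank_building"},
    ("banca", "ita"): {"finance", "bank_building", "databank"},
}


def project(
    source_senses: dict[int, str],
    alignment: list[tuple[int, int]],
    target_tokens: list[tuple[str, str]],
) -> dict[int, str]:
    """Project senses from an annotated source sentence onto its translation.

    source_senses: source token position -> human-checked sense.
    alignment: (source position, target position) word-alignment links.
    target_tokens: (lemma, language) of each target token.
    """
    projected: dict[int, str] = {}
    for src, tgt in alignment:
        sense = source_senses.get(src)
        if sense is None:
            continue  # nothing to transfer: the source token is unannotated
        # Assumed filter: accept the sense only if the target lemma has it.
        if sense in INVENTORY.get(target_tokens[tgt], set()):
            projected[tgt] = sense
    return projected


if __name__ == "__main__":
    target = [("banca", "ita")]
    print(project({0: "finance"}, [(0, 0)], target))    # {0: 'finance'}
    print(project({0: "riverside"}, [(0, 0)], target))  # {} (rejected)
```

The dependency on `source_senses` is the point of the comparison: without an existing annotation there is nothing to project, whereas the intersection approach needs only the inventories and the alignments.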

The availability of sense-annotated corpora is crucial for training supervised WSD systems and for advancing automatic text processing more broadly. We release the tools to perform MSI, together with the results of applying it to a subset of the SemCor projects.