
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-03092023-181146


Thesis type
Master's degree thesis (laurea magistrale)
Author
TESTA, DAVIDE
URN
etd-03092023-181146
Title
Do Neural Language Models understand Elliptical Sentences? A New Framework for Evaluating Ellipsis and its Interaction with Thematic Fit
Department
FILOLOGIA, LETTERATURA E LINGUISTICA (Philology, Literature and Linguistics)
Degree programme
INFORMATICA UMANISTICA (Digital Humanities)
Supervisors
Supervisor: Prof. Lenci, Alessandro
Keywords
  • ellipsis
  • elliptical sentences
  • ellipsis dataset
  • thematic fit
  • event knowledge
  • neural language models
  • transformers language models
Start date of the examination session
13/04/2023
Availability
Not available for consultation
Release date
13/04/2093
Abstract
Ellipsis is a cover term for a number of linguistic phenomena in which a sentence lacks material that would normally be obligatory, yet the missing material can be semantically recovered from the local syntactic or semantic context. It is of central interest to theorists of language precisely because it represents a situation in which the usual form-meaning mappings seem to be entirely absent. Elliptical sentences are linguistically interesting because human speakers arrive at a clear interpretation despite the absence of overtly expressed material. This raises the questions of how ellipsis and elliptical sentences are mentally represented and how the elided material can be recovered so easily.
Resolving such constructions is also non-trivial in natural language processing (NLP), since it involves retrieving verbal material that is not overtly expressed, which may in turn require a model to integrate human-like syntactic and semantic knowledge. Ellipsis remains relatively understudied in the NLP literature, owing to the difficulty of its resolution and the scarcity of benchmarks for the task, even though the phenomenon is widely recognized as an important source of errors in tasks such as dialogue understanding and machine translation.
This work, which focuses on verbal ellipsis, therefore analyzes ellipsis from a new perspective by developing a framework that exploits the notions of thematic fit and event knowledge in the resolution of such constructions. The motivation is that humans have been shown to rely on Generalized Event Knowledge (GEK) for language comprehension. This pragmatic knowledge works as a network of reciprocal activations between events and participants, and thematic fit can be seen as reflecting the ‘strength of activation’ between the elements of this network, since it represents the degree of typicality of an event's participants. Specifically, this research explores how the prototypicality of event participants affects the ability of Transformer Language Models (TLMs) to handle elliptical sentences and to identify the omitted arguments at different degrees of thematic fit, ranging from highly typical participants to semantically anomalous ones.
With this purpose in mind, ‘ELLie’ was created: the first dataset composed entirely of utterances containing different types of elliptical constructions and structurally suited to evaluating the effect of argument thematic fit on resolving ellipsis and reconstructing the missing element. This was achieved by constructing sentences in which the arguments filling a given thematic role differ in typicality.
The first tests followed the classical thematic fit estimation task, which consists of estimating a typicality score either for a candidate argument with respect to a given semantic role of a verb or for the whole sentence. They showed that the probability scores assigned by TLMs are higher for typical events than for atypical and impossible ones across different elliptical contexts, confirming that the prototypicality of event participants influences the interpretation of such structures. However, the final test, which consisted of retrieving the elided verb in the elliptical clause, yielded low performance. These poor scores highlight the considerable difficulty TLMs have in reconstructing the correct event and reveal their tendency to rely on frequent lexical co-occurrences without reconstructing the implicit syntactic and semantic structure needed to interpret elliptical sentences.
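The two kinds of measurement summarized in the abstract (comparing model probabilities for typical, atypical and impossible fillers, and ranking candidate verbs in a retrieval setting) can be illustrated with a minimal Python sketch. It is not the method or code used in the thesis: a hypothetical toy bigram table with invented probabilities stands in for a Transformer Language Model, the example sentences are made up rather than taken from ELLie, and the names `token_logprob`, `sentence_logprob`, `rank_candidates` and `top1_accuracy` are illustrative assumptions. A real TLM would condition each word on the whole preceding context, including the antecedent clause, not just on the previous word.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy bigram table standing in for a Transformer LM:
# P(next word | previous word). The numbers are invented for illustration
# and are NOT results from the thesis or from any real model.
BIGRAM_P: Dict[Tuple[str, str], float] = {
    ("the", "chef"): 0.10,
    ("chef", "cooked"): 0.20,
    ("chef", "read"): 0.02,
    ("cooked", "the"): 0.50,
    ("the", "meal"): 0.05,     # typical patient of "cook"
    ("the", "book"): 0.01,     # atypical patient
    ("the", "theory"): 0.001,  # semantically anomalous patient
}
UNSEEN_P = 1e-6  # hypothetical smoothing floor for unseen word pairs


def token_logprob(prev: str, nxt: str) -> float:
    """log P(nxt | prev) under the toy bigram model."""
    return math.log(BIGRAM_P.get((prev, nxt), UNSEEN_P))


def sentence_logprob(tokens: Sequence[str]) -> float:
    """Sentence score: sum of log P(w_i | w_{i-1}) over consecutive words.

    A real TLM would condition each word on the full preceding context.
    """
    return sum(token_logprob(a, b) for a, b in zip(tokens, tokens[1:]))


def rank_candidates(prefix: Sequence[str], candidates: Sequence[str]) -> List[str]:
    """Rank candidate fillers for the next slot, most probable first."""
    return sorted(candidates, key=lambda c: token_logprob(prefix[-1], c), reverse=True)


def top1_accuracy(items: Sequence[Tuple[Sequence[str], Sequence[str], str]]) -> float:
    """Fraction of (prefix, candidates, gold) items whose gold filler is ranked first."""
    hits = sum(rank_candidates(prefix, cands)[0] == gold for prefix, cands, gold in items)
    return hits / len(items)


if __name__ == "__main__":
    typical = "the chef cooked the meal".split()
    atypical = "the chef cooked the book".split()
    impossible = "the chef cooked the theory".split()
    scores = [sentence_logprob(s) for s in (typical, atypical, impossible)]
    print(scores[0] > scores[1] > scores[2])  # True

    # Retrieval-style check: which verb best fills the slot after "the chef"?
    print(rank_candidates(["the", "chef"], ["read", "cooked"]))  # ['cooked', 'read']
    print(top1_accuracy([(["the", "chef"], ["read", "cooked"], "cooked")]))  # 1.0
```

With these invented numbers the typical sentence scores highest by construction; the sketch only shows the shape of the comparison and of a top-1 retrieval metric, and says nothing about how actual TLMs behave on elliptical sentences.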