Thesis etd-07052022-183728
Thesis type
Master's thesis
Author
MADEDDU, MAURO
URN
etd-07052022-183728
Title
Assessing Island Effects in Italian Transformer-based Language Models
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Course of study
INFORMATICA UMANISTICA
Supervisors
Supervisor: Prof. Lenci, Alessandro
Keywords
- BERT
- deep learning
- experimental syntax
- extraction islands
- filler-gap dependencies
- GPT
- island effects
- neural language models
- syntactic tests
- syntax
- targeted syntactic evaluation
- transformers
Defense session date
26/09/2022
Availability
Thesis not available for consultation
Abstract
Modern language models based on deep artificial neural networks have achieved impressive progress on Natural Language Processing benchmarks and in applications in recent years. However, they have also been shown to often fail to make robust, human-like generalizations, and they need massive amounts of data to reach state-of-the-art performance (orders of magnitude more than is available to humans when they learn language). These advances and limitations have made it increasingly important to clarify which linguistic phenomena and generalizations such models actually learn. A line of research has emerged on fine-grained targeted linguistic evaluation of neural language models, of which the targeted syntactic evaluation approach is one of the main strands.
The assessment is done by administering to these models minimal pairs of sentences that differ minimally and isolate a particular linguistic phenomenon, expecting the model to assign a higher score to the grammatical sentence than to the ungrammatical one. A factorial experimental setup, common in psycholinguistic studies, can be seen as a generalization of the minimal-pairs approach: it makes it possible to test more complex linguistic phenomena while still controlling for confounds.
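As a minimal sketch of this scoring procedure (not code from the thesis): the snippet below compares the total log-probability a causal language model assigns to the two members of an Italian minimal pair. The checkpoint name and the example sentences are our own illustrative assumptions; for masked models such as BERT, a pseudo-log-likelihood over masked tokens would be used instead.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed Italian GPT-2 checkpoint; substitute any causal LM of interest.
MODEL_NAME = "GroNLP/gpt2-small-italian"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability of a sentence under the causal LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == inputs, the model returns the mean
        # cross-entropy over the predicted (shifted) tokens.
        loss = model(ids, labels=ids).loss
    # Scale back by the number of predicted tokens to get a sum.
    return -loss.item() * (ids.size(1) - 1)

# Constructed example pair (hypothetical): extraction from a declarative
# complement vs. extraction out of an embedded "se" (whether) clause.
grammatical = "Chi pensi che abbia chiamato il direttore?"
ungrammatical = "Chi ti chiedi se abbia chiamato il direttore?"

# The model "passes" the pair if the grammatical variant scores higher.
print(sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical))
```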
In this work, we focus on the assessment of island effects, one of the most challenging syntactic phenomena: neural language models (NLMs) have been shown to need more training data to learn them than for most other syntactic phenomena. We extend and adapt an Italian test suite on wh-dependencies from a psycholinguistic study (Sprouse et al., 2016) and use it to evaluate four transformer-based models (GPT-2 and variants of BERT) pretrained on Italian, of fixed parameter size but varying in the amount of training data (from 2B to 13B tokens).
We find that subject islands are the phenomenon most correlated with training set size, while whether-islands seem to be the easiest to learn. The models' responses resemble in part those of humans on average, as judged from the trends in plots of normalized acceptability judgments (see the sketch of the factorial measure below). Finally, although the factorial experimental design can implicitly factor out some confounds when they are evenly distributed across conditions, the items should still be strictly controlled to have the same lexical content, since semantic and collocational effects seem to predominate in affecting the models' scores.
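As a pointer to how such 2×2 factorial designs quantify island effects (the notation below is ours, following common practice in the experimental-syntax literature rather than taken verbatim from the thesis), the four condition scores can be combined into a difference-in-differences (DD) measure:

```latex
% 2x2 factorial design: DISTANCE (short/long) x STRUCTURE (non-island/island).
% s_{d,k} denotes the normalized acceptability or model score in condition (d, k).
\[
\mathrm{DD} =
  \bigl(s_{\text{short},\text{island}} - s_{\text{long},\text{island}}\bigr)
  - \bigl(s_{\text{short},\text{non-island}} - s_{\text{long},\text{non-island}}\bigr)
\]
% DD > 0 indicates a superadditive penalty for long extraction out of an
% island structure, i.e., an island effect beyond the sum of the
% independent distance and structure costs.
```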
File
Thesis not available for consultation.