Thesis etd-10172013-181141
Thesis type
Master's thesis
Author
DI PIETRO, GIULIA
URN
etd-10172013-181141
Thesis title
Regular polysemy: A distributional semantic approach
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Course of study
INFORMATICA UMANISTICA
Supervisors
Supervisor: Prof. Lenci, Alessandro
Co-supervisor: Prof. Hsieh, Shu-Kai
Keywords
- distributional semantic space
- homonymy
- linguistics
- polysemy
- semantics
- sense alternations
- vector space models
- word sense
Graduation session start date
04/11/2013
Availability
Full
Summary
Polysemy and homonymy are two different kinds of lexical ambiguity. The main difference between them is that polysemous words share the same sense alternation - an alternation being the set of senses a word can have - whereas homonymous words have idiosyncratic alternations. For instance, a word such as lamb, whose alternation comprises the senses food and animal, is polysemous, since a number of other words, e.g. fish, share this very food-animal alternation. On the other hand, a word such as ball, whose possible senses are artifact and event, is homonymous, since no other word shares the artifact-event alternation. Furthermore, polysemy highlights two different aspects of the same lexical item, whereas homonymy describes the fact that the same lexical unit is used to represent two completely unrelated word meanings.
These two kinds of lexical ambiguity have also been an issue in lexicography, given that there is no clear rule for distinguishing between polysemous and homonymous words. As a matter of principle, we would expect distinct lexical entries for homonymous words, but a single lexical entry with internal differentiation for polysemous words. An important work to be mentioned here is the Generative Lexicon (Pustejovsky, 1995), a theoretical framework for lexical semantics which focuses on the compositionality of word meanings. With regard to polysemy and homonymy, GL provides a clear explanation of how the appropriate sense of a word in a specific sentence can be identified: by looking at the context in which the word appears and, specifically, at the type of argument required by the predication.
These phenomena have also attracted interest among computational linguists, who have tried to implement models able to predict the alternations polysemous words can have. One of the most important works on this matter is Boleda, Padó and Utt (2012), which proposes a model able to predict the words having a particular alternation of senses: for instance, given an alternation such as food-animal, it can predict which words have that alternation. Another relevant work is Rumshisky, Grinberg and Pustejovsky (2007), which uses syntactic information to detect the senses a polysemous word can have. For instance, given the polysemous word lunch, whose sense alternation is food-event, they first extracted all the verbs that can take lunch as their object. This led to the extraction of verbs requiring an argument expressing the sense of food (e.g. cook, whose object can be lunch) and verbs requiring an event argument (lunch can also be the object of attend). Finally, they extracted all the objects those verbs can take (for instance, pasta can be an object of cook, and conference an object of attend). In this way they obtain two clusters, each of which represents words similar to one of the senses of the ambiguous word.
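The verb-object pivot idea behind Rumshisky et al. (2007) can be sketched in a few lines. All the data below is invented for illustration; the original work extracts these sets from a parsed corpus rather than hand-written dictionaries.

```python
# Step 1: verbs observed taking the ambiguous word "lunch" as their object.
# (In the real method these come from corpus parses; here they are toy data.)
verbs_of_lunch = {"cook", "attend"}

# Step 2: for each such verb, the other objects it takes in the corpus.
objects_of_verb = {
    "cook": {"pasta", "soup", "dinner"},
    "attend": {"conference", "meeting", "ceremony"},
}

# Step 3: each verb's object set forms a cluster of words similar to one
# sense of "lunch": a food-like cluster (via "cook") and an event-like
# cluster (via "attend").
clusters = {verb: objects_of_verb[verb] for verb in verbs_of_lunch}
```

Each resulting cluster stands in for one sense of the alternation: the objects of "cook" approximate the food sense, those of "attend" the event sense.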
These two models are quite different in their implementation, even though both are grounded in one of the most important theories in computational semantics: the Distributional Hypothesis. This hypothesis can be stated as "words with similar meanings tend to occur in similar contexts". To implement it, contexts must be described in a computationally valid way, so that a degree of similarity between two words can be obtained by looking only at their contexts. The mathematical object used is the vector, which stores the frequency of a word across all its contexts. A model that uses vectors to describe the distributional properties of words is called a Vector Space Model, also known as a Distributional Model.
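A minimal, runnable sketch of such a vector space: each target word is represented by the counts of words in a small window around it, and similarity is cosine between count vectors. The four-sentence corpus and the window size of 2 are arbitrary choices for illustration only.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks rose on the market today",
    "the market fell as stocks dropped",
]

def context_counts(target, sentences, window=2):
    """Count words within +/-window positions of each occurrence of target."""
    counts = Counter()
    for s in sentences:
        toks = s.split()
        for i, t in enumerate(toks):
            if t == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(w for j, w in enumerate(toks[lo:hi], lo) if j != i)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

cat, dog, stocks = (context_counts(w, corpus) for w in ("cat", "dog", "stocks"))
# "cat" and "dog" occur in similar contexts, so their vectors are close;
# "cat" and "stocks" share no context words at all in this toy corpus.
assert cosine(cat, dog) > cosine(cat, stocks)
```

Real distributional models differ mainly in scale and weighting (larger corpora, association measures instead of raw counts, dimensionality reduction), but the core representation is the same context-count vector.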
In this work, our goal is to automatically detect the alternation a word has. To do so, we first considered the Sense Discrimination procedure proposed by Schütze. In this method, a Distributional Model is used to create context vectors and sense vectors. A context vector is the sum of the vectors of the words found in a context in which an ambiguous word appears, so there are as many context vectors as there are occurrences of the target word. Once the context vectors are available, the sense vectors are obtained by simply clustering them: the idea is that two context vectors representing the same sense of the ambiguous word will be similar, and so will be clustered together. The centroid, that is, the vector obtained by combining the context vectors clustered together, is the sense vector. This means that there are as many sense vectors as there are senses of the ambiguous word. Our idea was to build on this work and go a step further to recover the alternation, but this was not possible for several reasons.
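The Schütze-style pipeline can be sketched as follows. The 2-dimensional word vectors, the contexts, and the one-line clustering rule (split by dominant dimension, a stand-in for k-means) are all toy assumptions; real systems derive high-dimensional vectors from corpus counts and use a proper clustering algorithm.

```python
# Toy word vectors: first coordinate "food-like", second "event-like".
word_vec = {
    "cook":   (1.0, 0.0), "pasta":  (0.9, 0.1), "eat":  (1.0, 0.1),
    "attend": (0.0, 1.0), "speech": (0.1, 0.9), "hall": (0.1, 1.0),
}

def context_vector(context_words):
    # A context vector is the sum of the vectors of the words around the
    # ambiguous target (the target itself, e.g. "lunch", is excluded).
    vecs = [word_vec[w] for w in context_words if w in word_vec]
    return tuple(sum(coord) for coord in zip(*vecs))

# Four occurrences of the ambiguous target, each reduced to its context words.
contexts = [["cook", "pasta"], ["eat", "pasta"], ["attend", "speech"], ["attend", "hall"]]
cvs = [context_vector(c) for c in contexts]

# Cluster context vectors by dominant dimension (toy stand-in for k-means).
clusters = {0: [], 1: []}
for v in cvs:
    clusters[0 if v[0] > v[1] else 1].append(v)

# The centroid (mean of a cluster's context vectors) is the sense vector,
# so one sense vector per discovered sense.
sense_vectors = {
    k: tuple(sum(coord) / len(vs) for coord in zip(*vs))
    for k, vs in clusters.items() if vs
}
```

With this toy data the two food contexts and the two event contexts fall into separate clusters, yielding one sense vector per sense of the target.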
We have therefore developed a new method to create context vectors, based on the idea that an ambiguous word is understood through specific elements of the sentence in which it appears.
Our model is able to carry out two tasks: 1) it can predict the alternation of a regular polysemous word; 2) it can distinguish whether the lexical ambiguity of a word is homonymy or regular polysemy.
File
File name | Size |
---|---|
main.pdf | 785.44 Kb |