ETD system

Electronic theses and dissertations repository


Thesis etd-04042008-163914

Thesis type
PhD thesis
Automatic Generation of Lexical Resources for Opinion Mining: Models, Algorithms and Applications
Supervisor Prof. Simoncini, Luca
Supervisor Dr. Sebastiani, Fabrizio
Keywords
  • sentiment classification
  • random walk models
  • opinion mining
  • lexical resources
  • information extraction
  • gloss classification
  • text classification
Abstract
Opinion mining is a recent discipline at the crossroads of Information Retrieval and Computational Linguistics that is concerned not with the topic a document is about, but with the opinion it expresses. It has a rich set of applications, ranging from tracking users' opinions about products or political candidates as expressed in online forums, to customer relationship management.
Key to extracting opinions from text is identifying the linguistic entities used to express opinions, together with their opinion-related properties: for example, determining that the term "beautiful" casts a positive connotation on its subject.

In this thesis we investigate the automatic recognition of opinion-related properties of terms. This results in the construction of opinion-related lexical resources, which can be used in opinion mining applications.
We start from the (relatively) simple problem of determining the orientation of subjective terms.
We propose an original semi-supervised term classification model based on the quantitative analysis of the glosses of such terms, i.e., the definitions these terms are given in online dictionaries. This method outperforms all known methods when tested on the recognized standard benchmarks for this task.

We show that our method can also produce good results on more complex tasks, such as discriminating subjective terms (e.g., "good") from objective ones (e.g., "green"), or classifying terms according to a fine-grained attitude taxonomy.

We then propose an important refinement of the task, i.e., distinguishing the opinion-related properties of distinct term senses.
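
The gloss-based, semi-supervised term classification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy dictionary `LEXICON` and the helpers `expand_seeds`, `train_nb` and `classify` are hypothetical, the seed sets are assumed to grow through synonymy (same orientation) and antonymy (opposite orientation) links, and a multinomial Naive Bayes over gloss words stands in for whatever learner the thesis actually uses.

```python
import math
from collections import Counter

# Hypothetical toy dictionary: term -> (gloss, synonyms, antonyms).
# A real system would read glosses and lexical relations from WordNet.
LEXICON = {
    "good": ("having desirable or positive qualities", ["nice", "fine"], ["bad"]),
    "nice": ("pleasant and agreeable", ["good"], ["nasty"]),
    "fine": ("of high quality and pleasant", ["good"], []),
    "bad": ("having undesirable or negative qualities", ["poor"], ["good"]),
    "poor": ("of low quality and unpleasant", ["bad"], []),
    "nasty": ("unpleasant and offensive", [], ["nice"]),
    "superb": ("of the highest quality and pleasant", [], []),
    "awful": ("extremely unpleasant and offensive", [], []),
}


def expand_seeds(pos, neg, lexicon, iterations=2):
    """Grow seed sets: synonyms keep orientation, antonyms flip it (assumption)."""
    pos, neg = set(pos), set(neg)
    for _ in range(iterations):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            _, syns, ants = lexicon.get(term, ("", [], []))
            new_pos.update(syns)
            new_neg.update(ants)
        for term in neg:
            _, syns, ants = lexicon.get(term, ("", [], []))
            new_neg.update(syns)
            new_pos.update(ants)
        pos, neg = new_pos, new_neg
    conflicts = pos & neg  # terms reached with both orientations are dropped
    return pos - conflicts, neg - conflicts


def train_nb(glosses_by_label):
    """Multinomial Naive Bayes with add-one smoothing over gloss tokens."""
    vocab = {w for gs in glosses_by_label.values() for g in gs for w in g.split()}
    n_docs = sum(len(gs) for gs in glosses_by_label.values())
    model = {}
    for label, glosses in glosses_by_label.items():
        counts = Counter(w for g in glosses for w in g.split())
        denom = sum(counts.values()) + len(vocab)
        log_prior = math.log(len(glosses) / n_docs)
        model[label] = (log_prior, {w: math.log((counts[w] + 1) / denom) for w in vocab})
    return model


def classify(model, gloss):
    """Return the label with the highest log-posterior; unseen tokens are ignored."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[w] for w in gloss.split() if w in log_lik)
    return max(model, key=score)


pos, neg = expand_seeds({"good"}, {"bad"}, LEXICON)
model = train_nb({
    "positive": [LEXICON[t][0] for t in sorted(pos)],
    "negative": [LEXICON[t][0] for t in sorted(neg)],
})
unlabeled = sorted(set(LEXICON) - pos - neg)
predictions = {t: classify(model, LEXICON[t][0]) for t in unlabeled}
print(predictions)  # {'awful': 'negative', 'superb': 'positive'}
```

The point of the design is that "superb" and "awful" are never linked to the seeds by any lexical relation; they are labeled purely from the words in their glosses.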
We present SentiWordNet, a novel, high-quality, high-coverage lexical resource in which each of the 115,424 senses contained in WordNet has been automatically evaluated along three dimensions: positivity, negativity, and objectivity.

We also propose an original and effective use of random-walk models to rank term senses by their positivity or negativity. The random-walk algorithms we present also have considerable application potential outside opinion mining, for example in word sense disambiguation. This work also yields an improved version of SentiWordNet.

Finally, we evaluate and compare the versions of SentiWordNet presented here with other opinion-related lexical resources well known in the literature, testing their use in an opinion extraction application. We show that using SentiWordNet yields a significant improvement both over a baseline system that uses no specialized lexical resource and over systems that use other opinion-related lexical resources.
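
The random-walk ranking of senses mentioned above can be illustrated with personalized PageRank, i.e., a random walk whose teleport step is biased toward seed senses. This is a hedged sketch, not the thesis's algorithm: the abstract does not specify the graph or the bias vector, so the sense identifiers in `EDGES`, the edge semantics (an edge u to v means score flows from sense u to sense v), the seed priors, the helper `personalized_pagerank`, and the damping factor 0.85 are all assumptions.

```python
# Hypothetical sense graph: an edge (u, v) means score flows from sense u to
# sense v, e.g. because a term of u occurs in the gloss of v (assumption).
EDGES = [
    ("good#a#1", "nice#a#1"),
    ("good#a#1", "superb#a#1"),
    ("nice#a#1", "superb#a#1"),
    ("superb#a#1", "good#a#1"),
    ("bad#a#1", "awful#a#1"),
    ("awful#a#1", "nasty#a#1"),
]


def personalized_pagerank(edges, prior, damping=0.85, iterations=200, tol=1e-12):
    """Power iteration for PageRank with a teleport vector biased by `prior`."""
    nodes = sorted({n for edge in edges for n in edge} | set(prior))
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    total = sum(prior.values())
    teleport = {n: prior.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            if out_links[n]:
                share = damping * rank[n] / len(out_links[n])
                for dst in out_links[n]:
                    new[dst] += share
            else:
                # Dangling sense: spread its mass according to the teleport vector.
                for dst in nodes:
                    new[dst] += damping * rank[n] * teleport[dst]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


seeds = {"positivity": {"good#a#1": 1.0}, "negativity": {"bad#a#1": 1.0}}
top3 = {}
for dimension, prior in seeds.items():
    rank = personalized_pagerank(EDGES, prior)
    top3[dimension] = sorted(rank, key=lambda n: (-rank[n], n))[:3]
    print(dimension, top3[dimension])
```

With the positivity prior on `good#a#1`, only senses reachable from it receive score, while senses reachable only from `bad#a#1` stay at zero; swapping the prior produces the negativity ranking over the same graph.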