ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-01122017-120621


Thesis type
PhD thesis
Author
PASSARO, LUCIA
URN
etd-01122017-120621
Title
Distributional Models of Emotions for Sentiment Analysis in Social Media
Scientific-disciplinary sector
ING-INF/05
Degree programme
INGEGNERIA DELL'INFORMAZIONE (Information Engineering)
Supervisors
tutor Prof. Lenci, Alessandro
tutor Prof. Marcelloni, Francesco
tutor Prof. Vaglini, Gigliola
Keywords
  • Emotion Detection
  • Sentiment Analysis
  • Emotive Lexicons
  • Distributional Semantics
  • Computational Linguistics
  • NLP
Defense session start date
21/01/2017
Availability
Full
Abstract
With the proliferation of social media, textual emotion analysis is becoming increasingly important. Sentiment Analysis and Emotion Detection are useful in several applications. They can be used, for instance, in Customer Relationship Management to track sentiment towards companies and their services, or in Government Intelligence to collect people's emotions and points of view about government decisions.
It is clear that tracking reputation and opinions without appropriate text mining tools is simply infeasible. Most of these tools are based on sentiment and emotion lexicons, in which lemmas are associated with the sentiment and/or emotions they evoke. However, almost all languages other than English lack high-coverage inventories of this sort.

This thesis presents several sentiment analysis tasks to illustrate the challenges and opportunities in this research area. We review state-of-the-art methods for sentiment analysis and emotion detection and describe the framework we designed to build emotive resources that can be effectively exploited for affective computing on text. One of the main outcomes of the work is ItEM, a high-coverage Italian EMotive lexicon created with distributional methods. It was built in a three-stage process: the collection of a set of highly emotive words, their distributional expansion, and the validation of the system. Since corpus-based methods reflect the type of corpus from which they are built, we collected a new Italian corpus, FB-NEWS15, in order to create a reliable lexicon. The corpus was created by crawling the Facebook pages of the most important Italian newspapers, which typically contain a small number of posts written by journalists and a very large number of comments arising from long discussions among readers about the news.
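
As an illustration only, the distributional expansion stage can be pictured as scoring every word in a vector space against a centroid built from the seed words of each emotion. The abstract does not specify the similarity measure, the vector space, or the emotion inventory, so the cosine measure, the toy two-dimensional vectors, the emotion labels and the function names below are all assumptions rather than the method actually used for ItEM.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors (empty list gives an empty vector)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def expand_lexicon(seeds, vectors):
    """Score every word in `vectors` against the seed centroid of each emotion.

    seeds:   emotion label -> list of seed words (hypothetical input format)
    vectors: word -> distributional vector (hypothetical input format)
    """
    centroids = {
        emotion: centroid([vectors[w] for w in words if w in vectors])
        for emotion, words in seeds.items()
    }
    return {
        word: {emotion: cosine(vec, c) for emotion, c in centroids.items()}
        for word, vec in vectors.items()
    }


# Toy data, invented for the example; real vectors would come from a corpus.
vectors = {
    "gioia": [0.9, 0.1],
    "felice": [0.8, 0.2],
    "paura": [0.1, 0.9],
    "terrore": [0.2, 0.8],
    "sorriso": [0.7, 0.3],
}
seeds = {"joy": ["gioia", "felice"], "fear": ["paura", "terrore"]}
lexicon = expand_lexicon(seeds, vectors)
```

In this toy space the non-seed word "sorriso" ends up closer to the "joy" centroid than to the "fear" centroid, which is the kind of association a validation step would then check against human judgments.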

Finally, we describe experiments on the sentiment polarity classification of tweets. We started from a supervised learning system originally developed for the Evalita 2014 SENTIment POLarity Classification task (Basile et al., 2014) and then explored the possibility of enriching it with lexical emotive features derived from social media texts.
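
One way to picture this enrichment, purely as a sketch, is to append per-emotion lexicon scores to whatever feature vector the classifier already uses. The abstract does not describe the actual feature set, so the bag-of-words baseline, the averaging scheme, the whitespace tokenizer and every name below are assumptions made for illustration.

```python
def emotive_features(tokens, lexicon, emotions):
    """Average per-emotion lexicon score over the tokens found in the lexicon."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return [0.0] * len(emotions)
    return [sum(h.get(e, 0.0) for h in hits) / len(hits) for e in emotions]


def featurize(text, vocabulary, lexicon, emotions):
    """Concatenate bag-of-words counts with the emotive lexicon features."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    bag_of_words = [float(tokens.count(w)) for w in vocabulary]
    return bag_of_words + emotive_features(tokens, lexicon, emotions)


# Toy lexicon and vocabulary, invented for the example.
toy_lexicon = {
    "felice": {"joy": 0.9, "fear": 0.1},
    "paura": {"joy": 0.1, "fear": 0.9},
}
toy_vocabulary = ["oggi", "sono"]
toy_emotions = ["joy", "fear"]

features = featurize("Oggi sono felice", toy_vocabulary, toy_lexicon, toy_emotions)
```

The resulting vector can be passed to any supervised classifier in place of the baseline features, which makes it straightforward to compare polarity classification results with and without the emotive dimensions.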
File