Digital archive of theses discussed at the University of Pisa


Thesis etd-10142019-100652

Thesis type
PhD thesis
Thesis title
Superdiversity: (Big) Data analytics at the crossroads of geography, language and emotions
Academic discipline
Course of study
tutor Prof. Pedreschi, Dino
  • human migration
  • sentiment analysis
  • superdiversity
Graduation session start date
In a series of articles, Vertovec examined the changes and contexts that have shaped migratory flows around the world. These demographic changes, which Vertovec terms Superdiversity, are a result of globalisation and mark a shift in overall migration patterns. Over time, migration routes have grown in both diversity and complexity, and the nature of immigration has brought with it a transformative "diversification of diversity". Human migration, closely tied to studies of ethnicity and Superdiversity, has been a constant throughout human history. In the era of Big Data, every user lives in a hyper-connected world. More than 75% of the world's population has a mobile phone, and over half of these are smartphones. The use of social media grows with the number of connected people. In these social Big Data, user-generated content carries a large amount of discriminating information. Language, space and time are three of the most useful features for detecting Superdiversity. The main strength of social Big Data is that they typically include, natively, information along several different dimensions.

Starting from these observations, in this thesis we define a measure of Superdiversity, the Superdiversity Index, by adding the emotional dimension and placing it in the context of social Big Data. Our measure is based on an epidemic spreading algorithm that automatically extends the dictionary used in lexicon-based sentiment analysis. It is easily applicable to various languages and suitable for Big Data. The Superdiversity Index makes it possible to compare diversity across communities from the point of view of the emotional content of their language. An important characteristic of the index is its high correlation with immigration rates.
For this reason, we believe the index can be used as an essential feature in a nowcasting model of migration stocks. Our framework can be applied at higher temporal and spatial resolution than official statistics. Moreover, we apply our method to a different context and dataset to measure the Superdiversity of the music world.
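
The abstract does not spell out how the sentiment dictionary is extended, so the following is only a rough sketch of the general idea of spreading valence from a small seed lexicon to unlabelled words through a word co-occurrence network. It uses plain weighted neighbour averaging, which is a stand-in and not the epidemic spreading model of the thesis. The function names, the toy documents, the seed scores and the number of rounds are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often two distinct words share a document."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        words = set(doc.lower().split())
        for a, b in combinations(sorted(words), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def extend_lexicon(graph, seeds, rounds=5):
    """Spread valence from seed words to the rest of the network.

    Seed words keep their valence. Every other word takes the
    co-occurrence-weighted mean valence of its scored neighbours.
    (Illustrative propagation rule, not the thesis's algorithm.)
    """
    scores = dict(seeds)
    for _ in range(rounds):
        updated = dict(scores)
        for word, neighbours in graph.items():
            if word in seeds:
                continue
            scored = [(scores[n], w) for n, w in neighbours.items() if n in scores]
            if scored:
                total = sum(w for _, w in scored)
                updated[word] = sum(v * w for v, w in scored) / total
        scores = updated
    return scores


# Hypothetical toy corpus and seed lexicon (valence in [-1, 1]).
docs = [
    "great sunny day",
    "sunny happy holiday",
    "awful rainy day",
    "rainy sad monday",
]
seeds = {"great": 1.0, "happy": 1.0, "awful": -1.0, "sad": -1.0}

scores = extend_lexicon(build_cooccurrence(docs), seeds)
for word in sorted(scores):
    print(f"{word:8s} {scores[word]:+.3f}")
```

In this toy setting, words that co-occur mostly with positive seeds end up with positive valence and vice versa, while a word shared evenly between both contexts stays near neutral. Because the procedure needs only raw text and a seed list, the same sketch applies to any language for which a small seed lexicon exists.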