ETD

Digital archive of the theses defended at the Università di Pisa

Thesis etd-10112018-134117


Thesis type
PhD thesis
Author
MILIOU, IOANNA
URN
etd-10112018-134117
Title
Big Data Analytics for Nowcasting and Forecasting Social Phenomena
Scientific disciplinary sector
INF/01
Degree programme
COMPUTER SCIENCE
Supervisors
tutor Prof. Pedreschi, Dino
co-supervisor Dr. Rinzivillo, Salvatore
Keywords
  • Big Data Analytics
  • Forecasting
  • Nowcasting
Exam session start date
22/10/2018
Availability
Full
Abstract
One of the most pressing and fascinating challenges of our time is understanding the complexity of the global interconnected society we inhabit. This connectedness reveals itself in many phenomena: in the rapid growth of the Internet and the Web, in the ease with which global communication and trade now take place, and in the ability of news and information, as well as epidemics, trends, financial crises and social unrest, to spread around the world with surprising speed and intensity. Ours is also a time of opportunity to observe and measure how our society intimately works: Big Data originating from the digital breadcrumbs of human activities promise to let us scrutinize the ground truth of individual and collective behavior in unprecedented detail and in real time. Multiple dimensions of our social life have Big Data proxies nowadays. We can use Big Data as signals, or proxies, to forecast and nowcast different phenomena, social phenomena in particular. We can describe and predict how humans and society work.
We can use geolocated data to observe and measure the behavior of a population, and to build better cities tailored to the movement of that population, with lower commuting times and lower pollution. We can exploit medical data to build classifiers that help in diagnosing and curing diseases. We can use industrial data to improve production processes and create smarter and more secure factories. With data and analytical tools able to extract useful knowledge from raw data, we can do many other useful things.
In this thesis we introduce data-driven as well as model-driven approaches to predict different phenomena, from epidemics to socio-economic attraction. We use Big Data deriving from our everyday life as external proxies to nowcast and forecast the evolution of phenomena whose study relies only on historical data or on data that arrive only with a significant lag. We use supermarket retail data as an external signal to predict the curve of an internal time series, that of influenza. When the flu season arrives, people start to get sick. Getting sick affects their everyday life and behavior. This change in behavior should propagate to their purchases at the supermarket: they will buy products that reflect the fact that they are sick.
We also study human movements, which are inherently massive, dynamical, and complex. Yet understanding individual mobility patterns could be of fundamental importance for many different phenomena. We exploit these patterns to study and predict the attraction of different socio-economic factors of the human environment. In our first approach, we study the distribution of the travelling sub-populations of the Tuscany region in Italy across the airports of the region, and we build a dynamic model for the interplay between the attraction exerted by the availability of air travel and an airport's popularity among the population. Based on this model, we forecast the future evolution of the airports in the region. In our second approach, we identify and categorize industrial clusters in the Veneto region in Italy by size and population dynamics, and we measure their attraction. We create a real-time system that helps us feel the pulse of a city and predict the rise of new industrial clusters or the death of existing ones. Finally, we attempt prediction in social networks by introducing the interaction prediction problem. We try to predict intra-community interactions, that is, interactions that may occur within the same community, and we apply the same approach to predict inter-community interactions, the weak links that hold together the modular structure of complex networks.