ETD system

Electronic theses and dissertations repository


Thesis etd-10112018-134117

Thesis type
PhD thesis
Big Data Analytics for Nowcasting and Forecasting Social Phenomena
Scientific disciplinary sector
Degree programme
supervisor Prof. Pedreschi, Dino
co-supervisor Dr. Rinzivillo, Salvatore
Keywords
  • Nowcasting
  • Big Data Analytics
  • Forecasting
Defense session start date
Release date
Abstract
One of the most pressing and fascinating challenges of our time is understanding the complexity of the globally interconnected society we inhabit. This connectedness reveals itself in many phenomena: in the rapid growth of the Internet and the Web, in the ease with which global communication and trade now take place, and in the ability of news and information, as well as epidemics, trends, financial crises and social unrest, to spread around the world with surprising speed and intensity. Ours is also a time of opportunity to observe and measure how our society intimately works: Big Data originating from the digital breadcrumbs of human activities promise to let us scrutinize the ground truth of individual and collective behavior at an unprecedented level of detail and in real time. Multiple dimensions of our social life now have Big Data proxies. We can use Big Data as signals and proxies to forecast and nowcast different phenomena, social phenomena in particular. We can describe and predict how humans and society work. We can use geolocated data to observe and measure the behavior of a population and to build better cities tailored to its movements, with shorter commuting times and lower pollution. We can exploit medical data to build classifiers that help in diagnosing and curing diseases. We can use industrial data to improve production processes and create smarter and more secure factories. We can do many other remarkable and useful things with the support of data and of analytical tools able to extract useful knowledge from raw data.

In this thesis we introduce data-driven as well as model-driven approaches to predict different phenomena, from epidemics to socio-economic attraction. We use Big Data deriving from our everyday life as external proxies to nowcast and forecast the evolution of phenomena whose study otherwise relies only on historical data or on data that become available only with a significant lag.
We use supermarket retail data as an external signal to predict the curve of an internal time series, that of influenza. When the flu season arrives, people start to get sick. Getting sick affects their everyday life and behavior, and this change in behavior should propagate to their supermarket purchases: they will buy products that reflect the fact that they are sick.

We also study human movements, which are inherently massive, dynamic, and complex. Understanding individual mobility patterns could nevertheless be of fundamental importance for many different phenomena. We exploit these patterns to study and predict the attraction of different socio-economic factors of the human environment. In our first approach, we study the distribution of the travelling sub-populations of the Tuscany region in Italy across the region’s airports, and we build a dynamic model of the interplay between the attraction of air travel availability and an airport’s popularity among the population. Based on this model, we forecast the future evolution of the airports in the region. In our second approach, we identify and categorize industrial clusters in the Veneto region in Italy by size and population dynamics, and we measure their attraction. We create a real-time system that helps us feel the pulse of a city and predict the rise of new industrial clusters or the death of existing ones. Finally, we attempt prediction in social networks by introducing the interaction prediction problem: we try to predict intra-community interactions, those that may occur within the same community, and we apply the same approach to predict inter-community interactions, the weak links that hold together the modular structure of complex networks.