ETD

Archivio digitale delle tesi discusse presso l'Università di Pisa

Tesi etd-03242017-170446


Tipo di tesi
Tesi di dottorato di ricerca
Autore
LULLI, ALESSANDRO
URN
etd-03242017-170446
Titolo
Distributed Graph Processing: Algorithms And Applications
Settore scientifico disciplinare
INF/01
Corso di studi
INFORMATICA
Relatori
relatore Ricci, Laura Emilia Maria
relatore Dazzi, Patrizio
Parole chiave
  • distributed processing
  • clustering
  • graph
  • apache spark
  • connected components
Data inizio appello
29/04/2017
Consultabilità
Completa
Riassunto
Thinking Like A Vertex (TLAV) is a popular computational paradigm suitable to express many distributed and iterative graph algorithms. It has been adopted as base computational paradigm for many of the currently available distributed frameworks and endorsed by numerous industries and academias. Also, it has been exploited to define algorithms to extract useful information from the nowadays increasing production of data which can be modeled as graphs. These facts strengthen the idea that exploiting distributed frameworks for graph analysis is an hot topic of research. As a matter of fact, we found that a solution for several algorithms is not always available or state-of-art algorithms are unsatisfactory, under many points of view.
This thesis aims at providing guidelines for defining distributed graph algorithms structured according to TLAV and showing their applicability to real applications. We show how approximation, simplification and versatility can be combined to define novel distributed algorithms to improve the currently available solutions with the goal to enhance the functionalities of the algorithms. We show also how algorithms may be combined to define complex solutions and how they can be employed to solve relevant applications.
In particular, we present novel algorithms for computing betweenness centrality, connected components and clustering. Such algorithms are exploited for Spam campaign detection, population estimation and hashtag centrality. To this end, we make use of real large dataset provided from our collaborations, Symantec for Spam emails, a large Italian Mobile Phone provider, mobile calls, and ISTI, CNR for a two years collection of real tweets from the Twitter social network.
File