logo SBA

ETD

Digital archive of theses discussed at the University of Pisa

 

Thesis etd-03242017-170446


Thesis type
Tesi di dottorato di ricerca
Author
LULLI, ALESSANDRO
URN
etd-03242017-170446
Thesis title
Distributed Graph Processing: Algorithms And Applications
Academic discipline
INF/01
Course of study
INFORMATICA
Supervisors
relatore Ricci, Laura Emilia Maria
relatore Dazzi, Patrizio
Keywords
  • apache spark
  • clustering
  • connected components
  • distributed processing
  • graph
Graduation session start date
29/04/2017
Availability
Full
Summary
Thinking Like A Vertex (TLAV) is a popular computational paradigm suitable to express many distributed and iterative graph algorithms. It has been adopted as base computational paradigm for many of the currently available distributed frameworks and endorsed by numerous industries and academias. Also, it has been exploited to define algorithms to extract useful information from the nowadays increasing production of data which can be modeled as graphs. These facts strengthen the idea that exploiting distributed frameworks for graph analysis is an hot topic of research. As a matter of fact, we found that a solution for several algorithms is not always available or state-of-art algorithms are unsatisfactory, under many points of view.
This thesis aims at providing guidelines for defining distributed graph algorithms structured according to TLAV and showing their applicability to real applications. We show how approximation, simplification and versatility can be combined to define novel distributed algorithms to improve the currently available solutions with the goal to enhance the functionalities of the algorithms. We show also how algorithms may be combined to define complex solutions and how they can be employed to solve relevant applications.
In particular, we present novel algorithms for computing betweenness centrality, connected components and clustering. Such algorithms are exploited for Spam campaign detection, population estimation and hashtag centrality. To this end, we make use of real large dataset provided from our collaborations, Symantec for Spam emails, a large Italian Mobile Phone provider, mobile calls, and ISTI, CNR for a two years collection of real tweets from the Twitter social network.
File