ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-01302013-170438


Thesis type
Master's thesis
Author
PICCINNO, FRANCESCO
URN
etd-01302013-170438
Title
A framework for the distributed crawling and storage of Twitter, with applications
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE AND NETWORKING
Supervisors
second reader Prof. Attardi, Giuseppe
supervisor Prof. Ferragina, Paolo
Keywords
  • semantic
  • information extraction
  • data mining
  • information retrieval
Defence session start date
22/02/2013
Availability
Not available for consultation
Release date
22/02/2053
Abstract
The thesis presents the implementation of a modular, distributed, and fault-tolerant crawler supporting social network analysis. The volume of data processed in this kind of application ranges from a few gigabytes to several terabytes, so efficient and effective algorithms that scale to massive data are required. The system was used to analyze the Italian Twitter community. The resulting dataset (about 1 TB) was used to create the so-called HE-Graph, a graph connecting hashtags to Wikipedia entities, which can support several activities (hashtag similarity, hashtag suggestion, faceted browsing, etc.).
File