ETD system

Electronic theses and dissertations repository


Thesis etd-01302013-170438


Thesis type
Master's thesis (laurea magistrale)
Author
PICCINNO, FRANCESCO
URN
etd-01302013-170438
Title
A framework for the distributed crawling and storage of Twitter, with applications
Department
INFORMATICA (Computer Science)
Degree programme
INFORMATICA E NETWORKING (Computer Science and Networking)
Supervisors
co-examiner Prof. Attardi, Giuseppe
supervisor Prof. Ferragina, Paolo
Keywords
  • semantic
  • information extraction
  • data mining
  • information retrieval
Exam session start date
22/02/2013
Availability
Partial
Release date
22/02/2053
Abstract
The thesis consists of the implementation of a modular, distributed, and fault-tolerant crawler supporting social network analysis. The volume of data processed in this kind of application ranges from a few gigabytes to several terabytes, so efficient and effective algorithms that scale to massive data are required. The system was used to analyze the Italian Twitter community. The resulting dataset (about 1 TB) was used to create the so-called HE-Graph, a graph connecting hashtags to Wikipedia entities, which can be used to support several activities (hashtag similarity, hashtag suggestion, faceted browsing, etc.).
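
The abstract does not say how the HE-Graph is built or queried, so the following is only a minimal sketch of the general idea: a bipartite structure linking hashtags to Wikipedia entities, with a simple hashtag similarity computed on top of it. The class name `HEGraph`, its methods, the sample tweets, and the choice of Jaccard similarity are all assumptions made for illustration and are not taken from the thesis. The sketch also assumes that the entities mentioned in each tweet have already been extracted by some annotation step.

```python
from collections import defaultdict


class HEGraph:
    """Toy bipartite graph linking hashtags to Wikipedia entities (hypothetical)."""

    def __init__(self):
        # hashtag -> {entity: number of tweets in which both appeared}
        self.edges = defaultdict(lambda: defaultdict(int))

    def add_tweet(self, hashtags, entities):
        """Record one tweet by linking each of its hashtags to each of its entities."""
        for tag in hashtags:
            for entity in entities:
                self.edges[tag.lower()][entity] += 1

    def entities(self, hashtag):
        """Return the set of entities linked to a hashtag (empty if unknown)."""
        return set(self.edges.get(hashtag.lower(), {}))

    def similarity(self, tag_a, tag_b):
        """Jaccard similarity of the two entity sets (an assumed measure)."""
        a, b = self.entities(tag_a), self.entities(tag_b)
        union = a | b
        return len(a & b) / len(union) if union else 0.0


# Made-up sample data, for illustration only.
graph = HEGraph()
graph.add_tweet(["#elezioni"], ["Elections", "Italy"])
graph.add_tweet(["#voto"], ["Elections", "Italy", "Ballot"])
graph.add_tweet(["#seriea"], ["Serie_A"])
```

In this toy setting, hashtags that share many entities (here `#elezioni` and `#voto`) score higher than hashtags with no entity in common, which is the kind of signal that hashtag similarity or hashtag suggestion could build on.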