logo SBA


Digital archive of theses discussed at the University of Pisa


Thesis etd-03242017-172141

Thesis type
Tesi di dottorato di ricerca
Thesis title
Algorithms and data structures for big labeled graphs
Academic discipline
Course of study
tutor Prof. Ferragina, Paolo
relatore Prof. Venturini, Rossano
  • entity linking
  • graph compression
  • hashtag
  • language model
  • word embeddings
Graduation session start date
This thesis focuses on the design of efficient (in terms of time and space) and efficacious (in terms of precision and recall) algorithms and data structures that leverage upon the information encoded in big labeled graphs, with the ultimate goal of deploying them in the solution of problems emerging in the realm of Search Engines and Social Networks.

In the first part of the dissertation, we focus on Entity Linking and its applications. The procedure can be thought as a semantic augmentation of documents with entities drawn from a large knowledge base, such as Wikipedia, that are linked to meaningful sequences of terms found in those documents. Our contributions are the following ones: first, we introduce a new and easy to use benchmarking platform that can be deployed to compare entity linkers with minimum effort; second, we propose a scalable disambiguation algorithm that hinges on the theory of word embeddings and language models in order to provide state-of-the-art or near state-of-the-art entity-linking performance at Web-scale; third, we show the application of entity-linking in a noisy and difficult domain like Twitter’s messages and derive a graph-based representation of hashtags’ meaning, which, in turn, can be employed to tackle the challenging problems of hashtag relatedness and hashtag classification.

In the second part of the dissertation, we deal with the problem of compressed representation of knowledge graphs and large labeled graphs in general. We propose a novel and efficient encoding of labeled graphs that need to support sophisticated search operations which involve both the linked structure of the graph and the string content of its nodes.