ETD

Archivio digitale delle tesi discusse presso l'Università di Pisa

Tesi etd-03242017-172141


Tipo di tesi
Tesi di dottorato di ricerca
Autore
PICCINNO, FRANCESCO
URN
etd-03242017-172141
Titolo
Algorithms and data structures for big labeled graphs
Settore scientifico disciplinare
INF/01
Corso di studi
INFORMATICA
Relatori
tutor Prof. Ferragina, Paolo
relatore Prof. Venturini, Rossano
Parole chiave
  • graph compression
  • entity linking
  • hashtag
  • word embeddings
  • language model
Data inizio appello
29/04/2017
Consultabilità
Completa
Riassunto
This thesis focuses on the design of efficient (in terms of time and space) and efficacious (in terms of precision and recall) algorithms and data structures that leverage upon the information encoded in big labeled graphs, with the ultimate goal of deploying them in the solution of problems emerging in the realm of Search Engines and Social Networks.

In the first part of the dissertation, we focus on Entity Linking and its applications. The procedure can be thought as a semantic augmentation of documents with entities drawn from a large knowledge base, such as Wikipedia, that are linked to meaningful sequences of terms found in those documents. Our contributions are the following ones: first, we introduce a new and easy to use benchmarking platform that can be deployed to compare entity linkers with minimum effort; second, we propose a scalable disambiguation algorithm that hinges on the theory of word embeddings and language models in order to provide state-of-the-art or near state-of-the-art entity-linking performance at Web-scale; third, we show the application of entity-linking in a noisy and difficult domain like Twitter’s messages and derive a graph-based representation of hashtags’ meaning, which, in turn, can be employed to tackle the challenging problems of hashtag relatedness and hashtag classification.

In the second part of the dissertation, we deal with the problem of compressed representation of knowledge graphs and large labeled graphs in general. We propose a novel and efficient encoding of labeled graphs that need to support sophisticated search operations which involve both the linked structure of the graph and the string content of its nodes.
File