ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-09202016-104509


Thesis type
Master's thesis
Author
SPEDIACCI, FABIO
URN
etd-09202016-104509
Title
A study of relatedness measures in Wikipedia
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE
Supervisors
supervisor Ferragina, Paolo
Keywords
  • data structures
  • compression
Exam session start date
07/10/2016
Availability
Full
Abstract
In this thesis I will study relatedness measures between pairs of nodes (pages) in the Wikipedia graph. The literature contains many such measures, but very often they are not time- or space-efficient, and/or they do not allow arbitrary pairs of nodes to be compared.
Here I will propose new measures that overcome those limitations. They are built on the clustering produced by an algorithm called "Layered Label Propagation", and I will show that they achieve results comparable to (or better than) those of some of the best-known measures while using, under certain conditions, less memory.
I will also show how a subset of these measures, combined with an algorithm called "Locality Sensitive Hashing", could be used to solve some common search problems in a time- and space-efficient way. Finally, I will extend the work done in the paper "Compressed Indexes for String Searching in Labeled Graphs" by using these measures to solve a similar problem.
File