ETD

Archivio digitale delle tesi discusse presso l'Università di Pisa

Tesi etd-04082015-110906


Tipo di tesi
Tesi di laurea magistrale
Autore
LANDOLFI, LORENZO
URN
etd-04082015-110906
Titolo
Compressing dictionaries of strings
Dipartimento
INFORMATICA
Corso di studi
INFORMATICA E NETWORKING
Relatori
relatore Venturini, Rossano
Parole chiave
  • dictionaries strings locality preserving front cod
Data inizio appello
29/04/2015
Consultabilità
Completa
Riassunto
The aim of this work is to develop a data structure capable of storing a set of strings in a compressed way providing the facility to access and search by prefix any string in the set. The notion of string will be formally exposed in this work, but it is enough to think a string as a stream of characters or a variable length dat}. We will prove that the data structure devised in our work will be able to search prefixes of the stored strings in a very efficient way, hence giving a performant solution to one of the most discussed problem of our age.
In the discussion of our data structure, particular emphasis will be given to both space and time efficiency and a tradeoff between these two will be constantly searched.
To understand how much string based data structures are important, think about modern search engines and social networks; they must store and process continuously immense streams of data which are mainly strings, while
the output of such processed data must be available in few milliseconds not to try the patience of the user.
Space efficiency is one of the main concern in this kind of problem. In order to satisfy real-time latency bounds, the largest possible amount of data must be stored in the highest levels of the memory hierarchy.
Moreover, data compression allows to save money because it reduces the amount of physical memory needed to store abstract data and this particularly important since storage is the main source of expenditure in modern systems.
File