
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-04182024-175520


Thesis type
PhD thesis
Author
TOSONI, FRANCESCO
URN
etd-04182024-175520
Title
Computation-friendly compression of matrices and tries
Scientific-disciplinary sector
INF/01
Degree programme
INFORMATICA (Computer Science)
Supervisors
supervisor Prof. Ferragina, Paolo
co-supervisor Prof. Manzini, Giovanni
Keywords
  • green computing
  • key-value stores
  • lossless compression
  • matrix-vector multiplications
  • repetitive data compression
  • string dictionaries
  • tries
Defense session start date
06/05/2024 (6 May 2024)
Availability
Not available for consultation
Release date
06/05/2027 (6 May 2027)
Abstract
In this thesis, we continue research on repetitive data compression by investigating novel, general compression schemes that are data-independent. Although we focus specifically on machine learning and key-value systems, we believe our methods provide insights applicable to a wider range of application domains.
Our methods adapt one-dimensional, general-purpose compression tools to complex data structures such as matrices, graphs, and tries. These schemes capture redundancies and interdependencies in the data, achieving compression beyond what sparsity alone allows, without compromising quality metrics such as the precision or recall of the resulting models. Following the “computation-friendly” paradigm, our compressed representations support operations directly on the compressed data, in time comparable to that of the same operations on uncompressed data.
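As a hedged illustration of operating directly on compressed data, here is a minimal sketch of one way to multiply a matrix by a vector without decompressing it. It is not the format used in the thesis. It assumes each matrix row is viewed as a sequence of (column, value) pairs and compressed with a simplified Re-Pair-style grammar. Each rule's partial dot product is then evaluated once per multiplication. The function names `to_rows`, `repair_compress`, and `multiply` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple, Union

# Illustrative sketch only (hypothetical names, not the thesis's format).
# A terminal is a (column, value) pair; a nonterminal is an int rule id.
Terminal = Tuple[int, float]
Symbol = Union[Terminal, int]


def to_rows(dense: List[List[float]]) -> List[List[Symbol]]:
    """CSR-like view: each row becomes the list of its nonzero (col, value) pairs."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]


def repair_compress(rows: List[List[Symbol]]):
    """Simplified Re-Pair: while some adjacent pair (counted within rows)
    occurs at least twice, replace the most frequent one with a new rule."""
    rules: List[Tuple[Symbol, Symbol]] = []  # rule id -> (left, right)
    while True:
        counts = Counter(
            (row[i], row[i + 1]) for row in rows for i in range(len(row) - 1)
        )
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        rule_id = len(rules)
        rules.append(pair)
        new_rows = []
        for row in rows:
            out, i = [], 0
            while i < len(row):
                # Greedy left-to-right, non-overlapping replacement.
                if i + 1 < len(row) and (row[i], row[i + 1]) == pair:
                    out.append(rule_id)
                    i += 2
                else:
                    out.append(row[i])
                    i += 1
            new_rows.append(out)
        rows = new_rows
    return rows, rules


def multiply(rows: List[List[Symbol]], rules, x: List[float]) -> List[float]:
    """Compute y = M x on the compressed form. Rule k only refers to terminals
    or to rules with smaller ids, so one forward pass evaluates every rule once."""
    partial: List[float] = []

    def value(sym: Symbol) -> float:
        if isinstance(sym, int):  # nonterminal: reuse its precomputed sum
            return partial[sym]
        col, val = sym  # terminal: a single product
        return val * x[col]

    for left, right in rules:
        partial.append(value(left) + value(right))
    return [sum(value(s) for s in row) for row in rows]


if __name__ == "__main__":
    dense = [[1, 2, 0, 3], [1, 2, 0, 3], [0, 2, 0, 3]]
    rows, rules = repair_compress(to_rows(dense))
    print(len(rules), multiply(rows, rules, [1, 2, 3, 4]))  # 2 [17, 17, 16]
```

In this sketch, the work per multiplication grows with the number of rules plus the number of symbols left in the compressed rows, not with the number of nonzeros. Repetitive rows therefore pay off both in space and in time.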