logo SBA

ETD

Archivio digitale delle tesi discusse presso l’Università di Pisa

Tesi etd-09252024-160806


Tipo di tesi
Tesi di laurea magistrale
Autore
ERRIQUEZ, DOMENICO
URN
etd-09252024-160806
Titolo
A Versatile Implementation of HNSW for Efficient Approximate Nearest Neighbor Search
Dipartimento
INFORMATICA
Corso di studi
INFORMATICA
Relatori
relatore Prof. Venturini, Rossano
relatore Ing. Nardini, Franco Maria
relatore Dott. Rulli, Cosimo
Parole chiave
  • ANN
  • Approximate nearest neighbor search
  • Hierarchical Navigable Small Worlds
  • Hnsw
  • PQ
  • Product Quantization
  • Similarity search
  • Vector embedding
Data inizio appello
11/10/2024
Consultabilità
Completa
Riassunto
In this thesis, I present the implementation of a Hierarchical Navigable Small World (HNSW) graph for approximate nearest neighbor (ANN) search, developed in Rust and integrated into the KANNolo library developed by the Consiglio Nazionale delle Ricerche (CNR) and University of Pisa. This implementation is designed for versatility, supporting both dense and sparse datasets, as well as two different types of quantizers. The primary role of a quantizer is to encode vectors, reducing dimensionality for more efficient computation.
One of the supported quantizers does not encode the vectors, allowing the HNSW index to work with the original data, while the other employs the Product Quantization (PQ) process to efficiently encode the vectors.

The HNSW implementation is adaptable across different datasets and quantizers, as they just need to implement the Dataset and Quantizer traits, respectively. These traits are defined and implemented in the KANNolo library to establish the required functionality. This design ensures future extensibility, allowing the HNSW index to support new datasets or quantizers by simply implementing these traits, making it highly adaptable to evolving requirements in similarity search tasks.
File