logo SBA


Digital archive of theses discussed at the University of Pisa


Thesis etd-07162007-192228

Thesis type
Tesi di dottorato di ricerca
Falchi, Fabrizio
email address
fabrizio.falchi@isti.cnr.it, fabrizio.falchi@iet.unipi.it
Thesis title
A Content-Addressable Network for Similarity Search in Metric Spaces
Academic discipline
Course of study
Relatore Lopriore, Lanfranco
Relatore Rabitti, Fausto
Relatore Zezula, Pavel
  • CAN
  • Content-Addressable Network
  • DHT
  • distributed index
  • Grid
  • MCAN
  • metric spaces
  • Peer-to-Peer
  • SDDS
  • similarity search
Graduation session start date
Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for contentbased retrieval in huge and ever growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database utilized by people, the computer systems must be able to model human fundamental reasoning paradigms, which are naturally based on similarity. The ability to perceive similarities is crucial for recognition, classification, and learning, and it plays an important role in scientific discovery and creativity. Recently, the mathematical notion of metric space has become a useful abstraction of similarity and many similarity search indexes have been developed.
In this thesis, we accept the metric space similarity paradigm and concentrate on the scalability issues. By exploiting computer networks and applying the Peer-to-Peer communication paradigms, we build a structured network of computers able to process similarity queries in parallel. Since no centralized entities are used, such architectures are fully scalable. Specifically, we propose a Peer-to-Peer system for similarity search in metric spaces called Metric Content-Addressable Network (MCAN) which is an extension of the well known Content-Addressable Network (CAN) used for hash lookup. A prototype implementation of MCAN was tested on real-life datasets of image features, protein symbols, and text — observed results are reported. We also compared the performance of MCAN with three other, recently proposed, distributed data structures for similarity search in metric spaces.