ETD system

Electronic theses and dissertations repository


Tesi etd-07162007-192228

Thesis type
Tesi di dottorato di ricerca
Falchi, Fabrizio
email address,
A Content-Addressable Network for Similarity Search in Metric Spaces
Settore scientifico disciplinare
Corso di studi
Relatore Lopriore, Lanfranco
Relatore Rabitti, Fausto
Relatore Zezula, Pavel
Parole chiave
  • metric spaces
  • MCAN
  • Grid
  • distributed index
  • DHT
  • Content-Addressable Network
  • CAN
  • similarity search
  • SDDS
  • Peer-to-Peer
Data inizio appello
Riassunto analitico
Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for contentbased retrieval in huge and ever growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database utilized by people, the computer systems must be able to model human fundamental reasoning paradigms, which are naturally based on similarity. The ability to perceive similarities is crucial for recognition, classification, and learning, and it plays an important role in scientific discovery and creativity. Recently, the mathematical notion of metric space has become a useful abstraction of similarity and many similarity search indexes have been developed.
In this thesis, we accept the metric space similarity paradigm and concentrate on the scalability issues. By exploiting computer networks and applying the Peer-to-Peer communication paradigms, we build a structured network of computers able to process similarity queries in parallel. Since no centralized entities are used, such architectures are fully scalable. Specifically, we propose a Peer-to-Peer system for similarity search in metric spaces called Metric Content-Addressable Network (MCAN) which is an extension of the well known Content-Addressable Network (CAN) used for hash lookup. A prototype implementation of MCAN was tested on real-life datasets of image features, protein symbols, and text — observed results are reported. We also compared the performance of MCAN with three other, recently proposed, distributed data structures for similarity search in metric spaces.