ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-02072017-161336


Thesis type
Master's thesis (laurea magistrale)
Author
DE ROSA, PIETRO
URN
etd-02072017-161336
Title
Design and implementation of a distributed system for content-based image retrieval
Department
INFORMATION ENGINEERING
Degree programme
COMPUTER ENGINEERING
Supervisors
supervisor Prof. Gennaro, Claudio
supervisor Prof. Amato, Giuseppe
supervisor Prof. Falchi, Fabrizio
Keywords
  • deep features
  • elasticsearch
  • cbir system
Exam session start date
24/02/2017
Availability
Full
Abstract
The aim of this work is to design and implement a distributed system for content-based image retrieval on very large image databases. The system is built on a standard full-text search engine: specifically, the open-source Elasticsearch, which is in turn built on top of Apache Lucene™, a widely used Java library for full-text search.
To let the full-text search engine perform similarity search, we used deep convolutional neural network (CNN) features extracted from the images of the dataset and encoded them as standard text.
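The abstract does not say exactly how the features are turned into text. A minimal sketch of one common way to do it, assuming a "surrogate text" encoding: negative components are dropped (as after a ReLU), the vector is L2-normalized, and component i becomes a hypothetical term `f{i}` repeated in proportion to its value, with a hypothetical quantization factor `Q`. Two images whose features are close then share many repeated terms, so a term-frequency dot product approximates the inner product of the original vectors. Lucene's default ranking (BM25) is different from the raw TF dot product used here; the sketch only shows why such an encoding can preserve similarity.

```python
import math
from collections import Counter

# Hypothetical quantization factor: a larger Q keeps more precision
# but produces longer surrogate documents.
Q = 30


def to_surrogate_text(features: list[float], q: int = Q) -> str:
    """Encode a dense feature vector as space-separated terms.

    Assumption: negative components are clipped to 0, the vector is
    L2-normalized, and component i becomes the hypothetical term
    f"f{i}" repeated round(q * x_i) times.
    """
    clipped = [max(x, 0.0) for x in features]
    norm = math.sqrt(sum(x * x for x in clipped)) or 1.0  # avoid division by 0
    terms: list[str] = []
    for i, x in enumerate(clipped):
        terms.extend([f"f{i}"] * round(q * x / norm))
    return " ".join(terms)


def tf_dot(text_a: str, text_b: str) -> int:
    """Dot product of raw term-frequency vectors: a proxy for the inner
    product of the original (quantized) feature vectors."""
    tf_a, tf_b = Counter(text_a.split()), Counter(text_b.split())
    return sum(count * tf_b[term] for term, count in tf_a.items())


# Usage with made-up 4-dimensional "features".
query = to_surrogate_text([0.9, 0.1, 0.0, -0.3])
database = {
    "img1": to_surrogate_text([0.8, 0.2, 0.0, 0.1]),
    "img2": to_surrogate_text([0.0, 0.1, 0.9, 0.0]),
}
ranking = sorted(database, key=lambda k: tf_dot(query, database[k]), reverse=True)
print(ranking)  # img1 first: it shares the dominant term f0 with the query
```

In a real deployment each surrogate string would be indexed as an ordinary text field and the query image's surrogate text submitted as a full-text query, so the engine's inverted index does the candidate selection.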
Because Elasticsearch is distributed, the index can be split into shards and spread across several nodes. This makes the search easy to parallelize, which leads to a significant performance improvement.
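The parallel search over a split index follows a scatter-gather pattern: every shard computes its own local top-k, and the partial lists are merged into the global top-k, which is guaranteed to be contained in their union. The sketch below illustrates that general idea with in-memory "shards" and a thread pool. It is not Elasticsearch's internal implementation: the shard layout, the `dot` scoring, and all function names are hypothetical. The round-robin assignment is a simplification, since Elasticsearch routes each document by hashing a routing value (by default its id). Python threads stand in for separate nodes here, so the example shows the pattern rather than real speed-ups.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard layout: document id -> feature vector.
Shard = dict[str, list[float]]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def split_into_shards(docs: dict[str, list[float]], n_shards: int) -> list[Shard]:
    """Assign documents to shards round-robin (a simplification)."""
    shards: list[Shard] = [{} for _ in range(n_shards)]
    for i, (doc_id, vec) in enumerate(sorted(docs.items())):
        shards[i % n_shards][doc_id] = vec
    return shards


def search_shard(shard: Shard, query: list[float], k: int) -> list[tuple[float, str]]:
    """Scatter step: each shard returns its local top-k (score, doc_id) pairs."""
    return heapq.nlargest(k, ((dot(vec, query), doc_id) for doc_id, vec in shard.items()))


def distributed_search(shards: list[Shard], query: list[float], k: int) -> list[str]:
    """Query all shards in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Gather step: the global top-k is a subset of the union of local top-k lists.
    merged = [hit for part in partials for hit in part]
    return [doc_id for _, doc_id in heapq.nlargest(k, merged)]


docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
print(distributed_search(split_into_shards(docs, 2), [1.0, 0.0], 2))
```

Because each shard only needs to send back k candidates, the merge cost depends on the number of shards and on k, not on the size of the whole index.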
All the experiments were conducted on the Yahoo Flickr Creative Commons 100M (YFCC100M) dataset, which is publicly available and consists of about 100 million tagged images. A web-based GUI was designed to let the user perform both textual and visual similarity search on the image dataset.