ETD

Digital archive of theses discussed at the University of Pisa

 

Thesis etd-02072017-161336


Thesis type
Master's thesis
Author
DE ROSA, PIETRO
URN
etd-02072017-161336
Thesis title
Design and implementation of a distributed system for content-based image retrieval
Department
INGEGNERIA DELL'INFORMAZIONE
Course of study
COMPUTER ENGINEERING
Supervisors
supervisor Prof. Gennaro, Claudio
supervisor Prof. Amato, Giuseppe
supervisor Prof. Falchi, Fabrizio
Keywords
  • cbir system
  • deep features
  • elasticsearch
Graduation session start date
24/02/2017
Availability
Full
Summary
The aim of this work is to design and implement a distributed system for content-based image retrieval on very large image databases. The system is built on a standard full-text search engine: the open-source software Elasticsearch, which in turn is built on top of Apache Lucene™, a widely used Java library for full-text search.
To allow the full-text search engine to perform similarity search, we use deep convolutional neural network features extracted from the images in the dataset and encoded as standard text.
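As an illustration of how a feature vector can be encoded as text, the sketch below quantizes each non-negative feature component and repeats a synthetic term (`f0`, `f1`, ...) a proportional number of times, so that term frequencies stand in for feature values. This is one possible encoding, not necessarily the one used in the thesis; the function names, the `f<i>` term format, and the `scale` parameter are assumptions made for this example.

```python
from collections import Counter


def encode_features_as_text(features, scale=10):
    """Turn a feature vector into a whitespace-separated string of terms.

    Component i contributes the hypothetical term 'f<i>' repeated
    int(value * scale) times; negative values are clipped to zero.
    """
    terms = []
    for i, value in enumerate(features):
        repeats = int(max(value, 0.0) * scale)
        terms.extend([f"f{i}"] * repeats)
    return " ".join(terms)


def term_frequency_score(text_a, text_b):
    """Dot product of the term-frequency vectors of two encoded texts.

    A simplified stand-in for the scoring of a full-text engine; real
    engines apply additional weighting and normalization.
    """
    tf_a = Counter(text_a.split())
    tf_b = Counter(text_b.split())
    return sum(count * tf_b[term] for term, count in tf_a.items())


if __name__ == "__main__":
    query = encode_features_as_text([0.5, 0.0, 0.5], scale=4)
    image = encode_features_as_text([0.0, 0.25, 0.5], scale=4)
    print(query)
    print(image)
    print(term_frequency_score(query, image))
```

With this encoding, two images whose feature vectors have a large inner product also share many repeated terms, which is what lets a text engine rank them as similar.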
Given the distributed nature of Elasticsearch, the index can be split and spread among several nodes. This makes it easy to parallelize the search, thus leading to a significant performance enhancement.
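The idea of splitting an index and searching the parts in parallel can be sketched with a toy in-process model: each shard returns its own top-k hits and the partial results are merged into a global top-k. This is not the Elasticsearch API; the function names, the round-robin shard assignment, and the dot-product scoring are assumptions made for illustration, and Python threads here show only the structure of the search, not a real speed-up.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(docs, num_shards):
    """Distribute documents (id -> vector) round-robin over num_shards dicts."""
    shards = [dict() for _ in range(num_shards)]
    for position, (doc_id, vector) in enumerate(sorted(docs.items())):
        shards[position % num_shards][doc_id] = vector
    return shards


def search_shard(shard, query, k):
    """Return the k best (score, doc_id) pairs of one shard by dot product."""
    scored = (
        (sum(q * v for q, v in zip(query, vector)), doc_id)
        for doc_id, vector in shard.items()
    )
    return heapq.nlargest(k, scored)


def distributed_search(shards, query, k):
    """Query every shard concurrently, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = pool.map(lambda shard: search_shard(shard, query, k), shards)
        candidates = [hit for hits in partial for hit in hits]
    return heapq.nlargest(k, candidates)


if __name__ == "__main__":
    docs = {"a": [1, 0], "b": [0, 1], "c": [1, 1], "d": [0, 0]}
    shards = split_into_shards(docs, 2)
    print(distributed_search(shards, [1, 1], 2))
```

Because each shard reports its own k best hits, the merged result matches what a search over the unsplit index would return, while each node only scans its own portion of the data.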
All experiments were conducted on the Yahoo Flickr Creative Commons 100M dataset, which is publicly available and comprises about 100 million tagged images. A web-based GUI was designed to let the user perform both textual and visual similarity search on the image dataset.