

Digital archive of theses discussed at the University of Pisa


Thesis etd-04242018-161334

Thesis type
PhD thesis
Thesis title
Enhancing Content-Based Image Retrieval Using Aggregation of Binary Features, Deep Learning, and Supermetric Search
Academic discipline
Course of study
tutor Dott. Amato, Giuseppe
tutor Dott. Falchi, Fabrizio
tutor Prof. Marcelloni, Francesco
Keywords
  • 4-point property
  • Bernoulli mixture model
  • CBIR
  • Fisher Vector
  • Hilbert Exclusion
  • metric indexing
  • metric search
  • n-Simplex
  • permutation-based indexing
  • similarity search
Graduation session start date
The millions of images shared every day on social media are just the tip of the iceberg of the current explosion of visual data, which places a great demand on scalable Content-Based Image Retrieval (CBIR) systems. CBIR allows organizing and searching image collections on the basis of their visual content, that is, without using text or other metadata. This thesis addresses the problem of content-based search by investigating and proposing efficient and effective methods that support three fundamental stages of a CBIR system, namely the numerical representation of the image visual content (feature extraction), the processing/indexing of the image features, and the query-by-example search.

Concerning the image representation, we investigate and experimentally compare Convolutional Neural Network (CNN) features, methods for aggregating local features, and their combination. We show that very high effectiveness is achieved by combining CNN features and aggregation methods; moreover, in order to improve efficiency, we investigate the use of aggregation methods on top of binary local features. In particular, we propose the BMM-FV, which allows encoding a set of binary vectors into a single descriptor. An extensive experimental evaluation on benchmark datasets shows that our BMM-FV outperforms other methods for aggregating binary local features and achieves high retrieval performance when combined with CNN features.
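To make the idea of encoding a set of binary vectors into a single descriptor more concrete, here is a minimal sketch. It assumes that BMM-FV stands for a Fisher Vector built on a Bernoulli mixture model, as the keywords suggest, and it computes only the unnormalized gradient of the log-likelihood with respect to the Bernoulli means. The function names, the toy mixture parameters, and the omission of any normalization step are assumptions made for illustration, not the formulation used in the thesis.

```python
import math

def bernoulli_log_prob(x, mu):
    """Log-probability of binary vector x under independent Bernoulli means mu."""
    return sum(math.log(m) if bit else math.log(1.0 - m) for bit, m in zip(x, mu))

def responsibilities(x, weights, means):
    """Posterior probability of each mixture component given x."""
    logs = [math.log(w) + bernoulli_log_prob(x, mu) for w, mu in zip(weights, means)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fisher_score_means(features, weights, means):
    """Sum over all features of d log p(x) / d mu[k][d], flattened to K*D values.

    Illustrative only: a full Fisher Vector would also normalize this gradient.
    """
    num_components, dim = len(means), len(means[0])
    score = [[0.0] * dim for _ in range(num_components)]
    for x in features:
        gamma = responsibilities(x, weights, means)
        for k in range(num_components):
            for d in range(dim):
                mu = means[k][d]
                score[k][d] += gamma[k] * (x[d] - mu) / (mu * (1.0 - mu))
    return [v for row in score for v in row]

# Hypothetical toy mixture: 2 components over 4-bit local features.
weights = [0.5, 0.5]
means = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
features = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]

descriptor = fisher_score_means(features, weights, means)
print(len(descriptor))  # length is K*D, independent of how many features the image has
```

The point of the sketch is that the descriptor length depends only on the mixture size and the feature dimensionality, so images with different numbers of local features map to vectors of the same size.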

Secondly, we propose an efficient and effective technique, called Deep Permutation, to index deep features (such as CNN features) using a permutation-based approach. Moreover, we propose the Blockwise Surrogate Text Representation to represent and index compound metric objects, including VLAD image descriptors, using an off-the-shelf text search engine.
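As background for the permutation-based approach mentioned above, the following sketch shows the generic idea: each object is represented by the order in which it "sees" a fixed set of reference objects (pivots), and two objects are compared through their permutations. This is the general technique under stated assumptions, not the Deep Permutation method itself, whose details are not given here. The pivot set, the Euclidean distance, and the function names are hypothetical choices for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pivot_permutation(obj, pivots, dist=euclidean):
    """Pivot indices ordered from closest to farthest (ties broken by index)."""
    return sorted(range(len(pivots)), key=lambda i: (dist(obj, pivots[i]), i))

def spearman_footrule(perm_a, perm_b):
    """Sum of absolute differences between the ranks each pivot gets."""
    rank_a = {pivot: rank for rank, pivot in enumerate(perm_a)}
    rank_b = {pivot: rank for rank, pivot in enumerate(perm_b)}
    return sum(abs(rank_a[p] - rank_b[p]) for p in rank_a)

# Hypothetical pivots and objects in the plane.
pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
query, near, far = (1.0, 2.0), (2.0, 1.0), (9.0, 9.0)

perm_query = pivot_permutation(query, pivots)
perm_near = pivot_permutation(near, pivots)
perm_far = pivot_permutation(far, pivots)

print(perm_query)                                # [0, 2, 1, 3]
print(spearman_footrule(perm_query, perm_near))  # 2
print(spearman_footrule(perm_query, perm_far))   # 8
```

Objects that are close in the original space tend to order the pivots similarly, so the permutation distance can serve as a cheap proxy for the original distance when selecting candidates.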

Finally, we address the image search task in the general context of similarity search in metric spaces, a framework suitable for a large number of applications and data types. Most metric indexing and searching mechanisms rely on the triangle inequality, which allows deriving bounds on the distance between data objects. These distance bounds are used to efficiently exclude partitions of the data that do not contain solutions to a given query. We revisit the foundations of metric search from a geometrical point of view, starting from the observation that the triangle inequality is equivalent to a discrete geometric condition defined in terms of finite isometric embeddings into Euclidean spaces. We show that there exists a large class of metric spaces, the supermetric ones, that satisfy the four-point property, which is stronger than the triangle inequality. Moreover, we show that many supermetric spaces commonly used in applications have a further property called the n-point property. The main outcome of our study is showing how these geometric properties can be used to improve similarity search in supermetric spaces by 1) deriving distance bounds that are tighter than those relying on the triangle inequality, thus allowing better space pruning; 2) defining novel partitioning and indexing mechanisms; 3) proposing a promising approach to embed a supermetric space into a finite-dimensional Euclidean space, which turns out to have implications not only in the similarity search context but also in other application tasks, such as dimensionality reduction. We prove the validity of our approaches both theoretically and experimentally.
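The difference between a bound derived from the triangle inequality and a tighter bound available under the four-point property can be illustrated with a small sketch. It assumes a partition defined by two pivots p1 and p2, a query q, and a search radius t, and it compares the triangle-inequality lower bound on the distance from q to any object closer to p1 with the bound commonly associated with the Hilbert Exclusion named in the keywords. The formulas are the commonly published forms and are an assumption here, since the abstract does not state them; all function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance, used here as an example of a supermetric space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triangle_bound(d_q_p1, d_q_p2):
    """Lower bound on d(q, s) for any s with d(s, p1) <= d(s, p2).

    Follows from the triangle inequality alone.
    """
    return (d_q_p1 - d_q_p2) / 2.0

def four_point_bound(d_q_p1, d_q_p2, d_p1_p2):
    """Lower bound on d(q, s) for the same s, assuming the four-point property."""
    return (d_q_p1 ** 2 - d_q_p2 ** 2) / (2.0 * d_p1_p2)

def can_exclude(bound, radius):
    """The partition closer to p1 cannot contain results if the bound exceeds the radius."""
    return bound > radius

# Hypothetical pivots, query, and search radius.
p1, p2 = (0.0, 0.0), (4.0, 0.0)
q, t = (3.5, 3.0), 1.0

d1, d2, d12 = euclidean(q, p1), euclidean(q, p2), euclidean(p1, p2)
weak = triangle_bound(d1, d2)
tight = four_point_bound(d1, d2, d12)

print(can_exclude(weak, t))   # False: the triangle inequality cannot prune here
print(can_exclude(tight, t))  # True: the tighter bound prunes the partition
```

In this Euclidean example the tighter bound equals the distance from q to the hyperplane x = 2 that separates the two pivots, which is why it allows pruning in a case where the triangle-inequality bound does not.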