
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-01222024-143401


Thesis type
Master's thesis
Author
PEZZUTI, FRANCESCA
URN
etd-01222024-143401
Title
Bilinear similarity learning for bi-encoder neural IR systems
Department
INFORMATION ENGINEERING
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Tonellotto, Nicola
Keywords
  • information retrieval
  • neural IR
  • bi-encoder
  • bilinear similarity
  • similarity learning
Exam session start date
13/02/2024
Availability
Not available for consultation
Release date
13/02/2094
Abstract
For each incoming query, an information retrieval (IR) system typically ranks the documents in its collection according to how well each one satisfies the user's information need. The ranking is derived from the similarity score, a real value assigned to each document with respect to the query and computed by a similarity function that combines system-specific representations of queries and documents. The choice of representations, as well as of the similarity function, directly affects the effectiveness of an IR system.
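The ranking step described above can be sketched in a few lines: score every document against the query with some similarity function, then sort by decreasing score. This is an illustrative sketch only; `rank_documents` and the toy `term_overlap` similarity are hypothetical names, and `term_overlap` merely stands in for a real scoring function such as BM25 or a learned one.

```python
from typing import Callable, Sequence, TypeVar

Q = TypeVar("Q")
D = TypeVar("D")


def rank_documents(query: Q, docs: Sequence[D],
                   similarity: Callable[[Q, D], float]) -> list[int]:
    """Return document indices ordered by decreasing similarity score."""
    # One real-valued score per document, computed against the same query.
    scores = [similarity(query, doc) for doc in docs]
    # Highest score first; sorted() is stable, so ties keep collection order.
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def term_overlap(query: str, doc: str) -> float:
    """Toy similarity: number of distinct query terms that occur in the document."""
    return float(len(set(query.split()) & set(doc.split())))


collection = ["neural ranking models", "cooking pasta at home",
              "dense neural retrieval models"]
print(rank_documents("neural retrieval", collection, term_overlap))  # [2, 0, 1]
```

Keeping the similarity function as a parameter mirrors the point of the abstract: the ranking procedure stays the same, and effectiveness depends on which representation and similarity are plugged in.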

Classical IR systems, built on inverted indexes, use sparse representations of queries and documents together with carefully hand-crafted similarity functions such as TF-IDF and BM25. Neural IR systems such as bi-encoders instead use machine learning to learn dense representations of queries and documents, embedding both into a common latent vector space in which similarity can be computed simply as the dot product, i.e., the Euclidean inner product. Although dot-product similarity has proven effective across many neural IR systems, it is still unclear whether the learned representation space exhibits a global Euclidean geometry or is instead governed by other, global or local, non-Euclidean geometries.
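To make the contrast between the two families concrete, the sketch below computes a sparse BM25 score over term counts and a dense dot-product score over embedding vectors. It assumes the Lucene-style IDF, log(1 + (N − n_t + 0.5) / (n_t + 0.5)), and the commonly used defaults k1 = 1.2 and b = 0.75; other BM25 variants exist. The function names are hypothetical, and the embeddings in the example are made-up numbers, whereas in a real bi-encoder they would be produced by trained query and document encoders.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1.0 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # sparse: absent terms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + norm)
        scores.append(score)
    return scores


def dot_similarity(q_emb: list[float], d_emb: list[float]) -> float:
    """Dense bi-encoder similarity: Euclidean inner product of two embeddings."""
    if len(q_emb) != len(d_emb):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(q_emb, d_emb))


docs = [["neural", "ranking", "models"], ["cooking", "pasta"],
        ["dense", "neural", "retrieval"]]
print(bm25_scores(["neural", "retrieval"], docs))
print(dot_similarity([0.2, 0.7, 0.1], [0.3, 0.6, 0.0]))
```

The BM25 function only touches terms the query and document share, which is what an inverted index exploits; the dot product, by contrast, involves every coordinate of the dense vectors.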

In this thesis, we propose letting neural IR bi-encoder systems learn, jointly with the representations, the inner product(s) that actually govern the geometry of the corresponding representation space. We do so by equipping bi-encoders with a learnable bilinear similarity function that generalizes dot-product similarity and can capture semantic properties of the text alongside the similarities.
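A bilinear similarity has the general form s(q, d) = qᵀ W d, where q and d are the query and document embeddings and W is a learned matrix; choosing W = I gives back the plain dot product. The abstract does not specify how W is parametrized or trained in the thesis, so the following is only a generic sketch under stated assumptions: toy embeddings are held fixed, only W is learned, and training uses a pairwise logistic loss log(1 + exp(s(q, d⁻) − s(q, d⁺))), a RankNet-style choice picked here for illustration. All function names are hypothetical.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def bilinear_similarity(q: Vector, d: Vector, w: Matrix) -> float:
    """s(q, d) = q^T W d for a query embedding q and a document embedding d."""
    # First compute W d, then take its dot product with q.
    wd = [sum(w_ij * d_j for w_ij, d_j in zip(row, d)) for row in w]
    return sum(q_i * x_i for q_i, x_i in zip(q, wd))


def identity(n: int) -> Matrix:
    """W = I turns the bilinear form back into the plain dot product."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_step(w: Matrix, q: Vector, d_pos: Vector, d_neg: Vector,
                  lr: float = 0.1) -> Matrix:
    """One gradient-descent step on log(1 + exp(s(q, d_neg) - s(q, d_pos))).

    Embeddings are held fixed here; only W is updated.
    """
    margin = bilinear_similarity(q, d_neg, w) - bilinear_similarity(q, d_pos, w)
    g = sigmoid(margin)  # derivative of the loss w.r.t. the margin
    diff = [p - n for p, n in zip(d_pos, d_neg)]
    # dLoss/dW = g * q (d_neg - d_pos)^T, so descending adds lr * g * q diff^T.
    return [[w[i][j] + lr * g * q[i] * diff[j] for j in range(len(diff))]
            for i in range(len(q))]


q, d_pos, d_neg = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
w = identity(2)
print(bilinear_similarity(q, d_pos, w), bilinear_similarity(q, d_neg, w))  # 0.0 1.0
w = pairwise_step(w, q, d_pos, d_neg)
print(bilinear_similarity(q, d_pos, w) > 0.0)  # True
```

A general W defines a bilinear form but not necessarily an inner product: that requires W to be symmetric positive definite. One common way to guarantee at least positive semidefiniteness is to learn a matrix L and set W = LᵀL. In an end-to-end bi-encoder, gradients would also flow into the query and document encoders rather than stopping at W.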