Thesis etd-01222024-143401

Thesis type

Master's thesis

Author

PEZZUTI, FRANCESCA

URN

etd-01222024-143401

Title

Bilinear similarity learning for bi-encoder neural IR systems

Department

INFORMATION ENGINEERING

Degree programme

ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING

Supervisors

Supervisor: Tonellotto, Nicola

Keywords

bi-encoder
bilinear similarity
information retrieval
neural IR
similarity learning

Exam session start date

13/02/2024

Availability

Not available for consultation

Release date

13/02/2094

Abstract

Typically, for each received query, an IR system computes a ranking of the documents in its collection according to how well each document satisfies the user's information need. The ranking is derived from similarity scores: each document receives a real-valued score with respect to the query, computed by a similarity function that combines system-specific representations of the query and the document. The choice of the representations, as well as of the similarity function, has a direct impact on the effectiveness of an IR system.
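
A minimal sketch of the ranking step just described, assuming the similarity function is available as an ordinary Python callable: every document is scored against the query and the collection is sorted by descending score. The names rank_documents and term_overlap are hypothetical, and the toy term-overlap score only stands in for a real similarity function such as BM25 or a learned one.

```python
from typing import Callable, Dict, List, Tuple


def rank_documents(
    query: str,
    collection: Dict[str, str],
    similarity: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scored = [(doc_id, similarity(query, text)) for doc_id, text in collection.items()]
    # A higher score means the document better fulfils the information need,
    # so sort descending; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


def term_overlap(query: str, document: str) -> float:
    """Hypothetical toy similarity: number of distinct query terms in the document."""
    return float(len(set(query.lower().split()) & set(document.lower().split())))


collection = {
    "d1": "neural ranking with dense vectors",
    "d2": "inverted indexes and bm25",
    "d3": "dense retrieval with bi-encoder models",
}
ranking = rank_documents("dense bi-encoder retrieval", collection, term_overlap)
print(ranking)  # [('d3', 3.0), ('d1', 1.0), ('d2', 0.0)]
```

Only the similarity callable changes between systems; the scoring and sorting logic stays the same, which is why the choice of similarity function drives effectiveness.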

Classical IR systems, based on inverted indexes, use sparse representations of queries and documents together with carefully hand-crafted similarity functions such as TF-IDF and BM25. Neural IR systems, such as bi-encoder systems, instead employ machine learning to learn new dense representations of queries and documents, embedding both into a common latent vector space where the similarity can be computed simply with the dot product, i.e., the Euclidean inner product. Although dot-product similarity has proven effective across many different neural IR systems, it is still unclear whether the learned representation space exhibits a global Euclidean geometry, or whether it is instead governed by other, global or local, non-Euclidean geometries.
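
In a bi-encoder, the query and each document are encoded independently into vectors of the same dimension, and the score is their dot product. The sketch below assumes the embeddings have already been produced by some encoder; the vectors are made-up toy values, not outputs of a real model, and the function name dot is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product <u, v> = sum_i u_i * v_i."""
    if len(u) != len(v):
        raise ValueError("query and document embeddings must share the same dimension")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical pre-computed embeddings in a shared 3-dimensional latent space.
query_emb = [0.5, 1.0, -0.5]
doc_embs: Dict[str, List[float]] = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.0, 1.0, 0.5],
    "d3": [0.5, 1.0, -1.0],
}

# Documents can be encoded offline, so at query time only one
# dot product per document is needed to score the collection.
scores = {doc_id: dot(query_emb, emb) for doc_id, emb in doc_embs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'d1': 0.5, 'd2': 0.75, 'd3': 1.75}
print(ranking)  # ['d3', 'd2', 'd1']
```

Because the score decomposes into an independent encoding per side followed by a fixed inner product, the document vectors can be indexed ahead of time; the fixed inner product is also exactly the assumption of a Euclidean geometry questioned above.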

In this thesis, we propose to let neural IR bi-encoder systems learn, together with the representations, the actual inner product(s) governing the geometry of the corresponding representation space. To this end, we integrate neural IR bi-encoder systems with a learnable bilinear similarity function that generalizes dot-product similarity and is able to capture semantic properties of the text along with the similarities.
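
A minimal sketch of a learnable bilinear similarity, assuming the simplest parameterization s(q, d) = q^T W d with a single full, unconstrained square matrix W; the exact parameterization and training objective used in the thesis are not described in this abstract. Setting W to the identity recovers the dot product, and the gradient of s with respect to W is the outer product q d^T, so W can be updated by gradient descent. The pairwise hinge loss, the learning rate and the names bilinear, identity and hinge_step are hypothetical choices for illustration; in practice W would be trained jointly with the encoders using an automatic-differentiation framework.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def bilinear(q: Sequence[float], W: Matrix, d: Sequence[float]) -> float:
    """Bilinear similarity s(q, d) = q^T W d = sum_ij q_i * W_ij * d_j."""
    return sum(q[i] * W[i][j] * d[j] for i in range(len(q)) for j in range(len(d)))


def identity(n: int) -> Matrix:
    """n x n identity matrix: with W = I, s(q, d) is the plain dot product."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def hinge_step(
    W: Matrix,
    q: Sequence[float],
    d_pos: Sequence[float],
    d_neg: Sequence[float],
    lr: float = 0.1,
    margin: float = 1.0,
) -> Matrix:
    """One SGD step on the (hypothetical) pairwise hinge loss
    max(0, margin - s(q, d_pos) + s(q, d_neg)).

    Since ds/dW = q d^T, the loss gradient when the hinge is active
    is q d_neg^T - q d_pos^T.
    """
    loss = margin - bilinear(q, W, d_pos) + bilinear(q, W, d_neg)
    if loss <= 0.0:
        return W  # margin already satisfied: zero gradient, W unchanged
    return [
        [W[i][j] - lr * (q[i] * d_neg[j] - q[i] * d_pos[j]) for j in range(len(W[i]))]
        for i in range(len(W))
    ]


# Toy example: under the dot product the relevant document d_pos scores
# lower than the non-relevant d_neg; one update moves W away from I.
q, d_pos, d_neg = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
W0 = identity(2)
W1 = hinge_step(W0, q, d_pos, d_neg)
print(W1)  # [[0.9, 0.1], [0.0, 1.0]]
print(bilinear(q, W1, d_pos), bilinear(q, W1, d_neg))  # 0.1 0.9
```

If W is constrained to be symmetric positive definite, s(q, d) is itself an inner product, with W = I as the Euclidean special case; an unconstrained W yields a general bilinear form that need not be symmetric.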

File

File name	Size
Thesis not available for consultation. Contact the author.

ETD

Digital archive of theses discussed at the Università di Pisa
