
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-01202025-172958


Thesis type
Master's thesis
Author
BRUNI, DAVIDE
URN
etd-01202025-172958
Title
Improving RAG-based Question Answering systems with semi-structured documents
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Avvenuti, Marco
supervisor Prof. Tonellotto, Nicola
supervisor Dr. Tesconi, Maurizio
Keywords
  • benchmark
  • large language models
  • question-answering
  • RAG
  • retrieval augmented generation
  • semi-structured documents
Defense session start date
21/02/2025
Availability
Not available for consultation
Release date
21/02/2028
Abstract
The main objective of this thesis is to explore the state of the art in order to develop a system capable of handling both structured and unstructured data and of answering accurately, within seconds, and without “hallucinations”. To this end, a dedicated benchmark was created for evaluating Question Answering tasks, combining text, metadata, and qualitative analysis of language features. In addition, new architectures based on Retrieval-Augmented Generation (RAG) were proposed to improve performance on the benchmark.
The results showed that the current state of the art offers solid performance even in complex scenarios, and that some of the new approaches improve on the initial state-of-the-art baseline. However, the choice of the best solution depends on the application context. This work could serve as a starting point for new studies, such as developing a multi-modal system, exploring the potential of a hybrid approach combining RAG and fine-tuning, or designing new architectures that further improve natural language understanding and generation capabilities.