ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-09282025-012455


Thesis type
Master's thesis
Author
DICANDIA, MICHELE
URN
etd-09282025-012455
Title
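Development of an LLM-based virtual assistant for user support on SoBigData Academy (original title: Sviluppo di un assistente virtuale basato su LLM per il supporto all’utente su SoBigData academy)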
Department
COMPUTER SCIENCE
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Prof. Trasarti, Roberto
supervisor Prof. Lelli, Sara
Keywords
  • benchmark
  • chatbot
  • fine-tuning
  • Large Language Models (LLMs)
  • SoBigData academy
Graduation session start date
17/10/2025
Availability
Not available for consultation
Release date
17/10/2095
Abstract
This thesis presents the design, implementation, and evaluation of a domain-specific chatbot for the SoBigData Academy, built on open-source Large Language Models (LLMs). The work combines a theoretical overview of LLM architectures and fine-tuning strategies with a data-driven approach, including the creation of a custom dataset of question–answer pairs derived from all Academy courses. A comparative experimental evaluation was carried out on three families of ~3B-parameter models (Ministral, Phi-3, Qwen2.5), each tested in its base form and after supervised fine-tuning on both the training set and the complete dataset. Performance was measured with BLEU, METEOR, BERTScore, and Exact Match, and a small grid search over 12 configurations was used to tune the training setup. Results show consistent improvements after fine-tuning, with Qwen2.5-3B emerging as the most robust and generalizable model. The study also identifies critical challenges (limited hardware resources, dataset heterogeneity, and library incompatibilities) and proposes future directions, including integration with APIs, adoption of Retrieval-Augmented Generation (RAG), and multimodal extensions.
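For readers unfamiliar with the simplest of the metrics named in the abstract, the following is a minimal sketch of Exact Match in Python. The normalization step (lowercasing, removing ASCII punctuation, collapsing whitespace) is an assumption borrowed from common question-answering practice; the record does not state how the thesis defines or normalizes the metric, and the function names `normalize` and `exact_match` are illustrative only.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace.

    This normalization is an assumption, not taken from the thesis.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)
```

Exact Match is strict by design: a correct answer phrased differently from the reference scores zero, which is presumably why it is reported alongside overlap-based and embedding-based metrics such as BLEU, METEOR, and BERTScore.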