Thesis etd-06282024-124447

Thesis type

Master's thesis

Author

GRILLEA, FRANCESCO

URN

etd-06282024-124447

Title

Comparing Reinforcement and Supervised Learning for tuning Large Language-specific Models

Department

INGEGNERIA DELL'INFORMAZIONE (Information Engineering)

Degree programme

ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING

Supervisors

supervisor Galatolo, Federico Andrea
supervisor Cimino, Mario Giovanni Cosimo Antonio
supervisor Cominelli, Lorenzo

Keywords

Italian large language model
large language models
natural language processing

Session start date

26/07/2024

Availability

Not available for consultation

Release date

26/07/2027

Abstract

This study aims to develop a Large Language Model (LLM) that responds correctly in Italian. To this end, a multi-stage fine-tuning process was employed. The first stage performed language alignment using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) separately, producing two base versions. Each base version then underwent three different DPO alignment stages: content alignment, grammar alignment, and simultaneous content and grammar alignment. This process yielded eight versions of the model.

The model selection stage identified the most accurate version through various benchmark tasks and through evaluation with GPT-4 as a judge. The selected model was then compared against other Italian LLMs, including Fauno-7b, Camoscio-7b, LLaMAntino-2-7b and Cerbero-7b.
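
One way to read the count of eight versions given in the abstract is two base versions plus three DPO-aligned variants of each (2 + 2 × 3 = 8). The following is a minimal sketch of that reading only. The labels (`sft`, `dpo`, `content`, `grammar`, `content+grammar`) and the function `enumerate_versions` are hypothetical names chosen for illustration, not identifiers taken from the thesis.

```python
from itertools import product

# Hypothetical labels: the thesis record does not name the variants.
BASE_METHODS = ["sft", "dpo"]  # stage 1: language alignment method
DPO_STAGES = ["content", "grammar", "content+grammar"]  # stage 2: DPO alignment target


def enumerate_versions():
    """Return the two base versions followed by the six second-stage variants.

    Assumption: the eight versions are the 2 base models plus 2 x 3 variants.
    """
    versions = [(base, None) for base in BASE_METHODS]
    versions += list(product(BASE_METHODS, DPO_STAGES))
    return versions


if __name__ == "__main__":
    for base, stage in enumerate_versions():
        # A base version has no second-stage alignment target.
        print(base if stage is None else f"{base} -> dpo[{stage}]")
```

Under this assumed reading, each tuple identifies one candidate for the model selection stage, so the benchmark and GPT-4 evaluations would run once per entry.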

File

File name	Size
The thesis is not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
