ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-09042025-154714


Thesis type
Master's degree thesis
Author
BORGHESI, DANIELE
URN
etd-09042025-154714
Title
Introducing LLM-IMPACT: LLM-Informed Multiagent Platform for Argument Convincingness Testing
Department
COMPUTER SCIENCE
Degree program
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
advisor Prof. Monreale, Anna
supervisor Dr. Cresci, Stefano
Keywords
  • argument convincingness testing
  • automatic counterspeech evaluation
  • computational persuasion
  • large language models
Defense date
17/10/2025
Availability
Not available for consultation
Release date
17/10/2095
Abstract
The growing ability of Large Language Models (LLMs) to generate persuasive content has exposed an "evaluation crisis": existing methodologies for measuring argumentative efficacy are either unreliable or unscalable. This thesis introduces LLM-IMPACT, a novel framework that evaluates arguments via simulated, multi-turn debates. The framework is grounded in a unique dataset from the r/ChangeMyView subreddit, and its effectiveness was rigorously tested. Our best configuration, a zero-shot Qwen 3 (14B) model guided by an empirically optimized prompt, demonstrated significant reasoning capabilities, achieving a median F-beta macro score of 66.73% and outperforming a majority-class dummy baseline by over 17 percentage points.

The final evaluation on a test set of unseen data confirmed the excellent generalization capabilities of our approach. The top-performing zero-shot model maintained its high performance with remarkable consistency. Furthermore, even the sub-optimal configurations derived from fine-tuning demonstrated highly stable and predictable behavior, successfully generalizing their distinct performance profiles to the test set. This robust generalization across different model configurations validates the methodological soundness of the LLM-IMPACT framework as a reliable tool for persuasion analysis. While fine-tuning did not surpass the zero-shot baseline, our results definitively show that sophisticated prompt engineering can produce stable and highly generalizable models.
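As a point of reference for the reported metric, the sketch below shows how a macro-averaged F-beta score can be compared against a majority-class dummy baseline using scikit-learn. It is illustrative only: the binary labels, beta value, and toy data are assumptions made here, not the thesis's actual task setup or code.

    # Minimal sketch: comparing a model's macro-averaged F-beta score
    # against a majority-class dummy baseline with scikit-learn.
    # Labels, beta, and data below are hypothetical, not from the thesis.
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import fbeta_score

    # Hypothetical gold labels and model predictions for a binary
    # convincing (1) vs. not convincing (0) task.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

    # Majority-class baseline: always predicts the most frequent gold label.
    X_stub = [[0]] * len(y_true)  # features are irrelevant to this strategy
    dummy = DummyClassifier(strategy="most_frequent").fit(X_stub, y_true)
    y_dummy = dummy.predict(X_stub)

    beta = 1.0  # assumed value; the thesis does not state its beta
    model_score = fbeta_score(y_true, y_pred, beta=beta, average="macro")
    dummy_score = fbeta_score(y_true, y_dummy, beta=beta, average="macro",
                              zero_division=0)  # dummy never predicts class 0
    print(f"model:    {model_score:.2%}")
    print(f"baseline: {dummy_score:.2%}")

Macro averaging gives each class equal weight regardless of frequency, which is why a majority-class predictor scores poorly on it and makes a meaningful floor for comparison.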