
ETD

Digital archive of theses discussed at the University of Pisa


Thesis etd-09042025-154714


Thesis type
Master's thesis (tesi di laurea magistrale)
Author
BORGHESI, DANIELE
URN
etd-09042025-154714
Thesis title
Introducing LLM-IMPACT: LLM-Informed Multiagent Platform for Argument Convincingness Testing
Department
INFORMATICA (Computer Science)
Course of study
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
advisor (relatore) Prof. Monreale, Anna
supervisor (supervisore) Dr. Cresci, Stefano
Keywords
  • argument convincingness testing
  • automatic counterspeech evaluation
  • computational persuasion
  • large language models
Graduation session start date
17/10/2025
Availability
Withheld
Release date
17/10/2095
Summary
The growing ability of Large Language Models (LLMs) to generate persuasive content has exposed an "evaluation crisis": existing methodologies for measuring argumentative efficacy are either unreliable or unscalable. This thesis introduces LLM-IMPACT, a novel framework that evaluates arguments via simulated, multi-turn debates. The framework is grounded in a unique dataset drawn from the r/ChangeMyView subreddit, and its effectiveness was rigorously tested. Our best configuration, a zero-shot Qwen 3 (14B) model guided by an empirically optimized prompt, demonstrated significant reasoning capabilities, achieving a median F-beta macro score of 66.73% and outperforming a majority-class dummy baseline by over 17 percentage points.
The final evaluation on a test set of unseen data confirmed the strong generalization capabilities of our approach. The top-performing zero-shot model maintained its high performance with remarkable consistency. Furthermore, even the sub-optimal configurations derived from fine-tuning behaved in a stable and predictable way, successfully carrying their distinct performance profiles over to the test set. This robust generalization across model configurations supports the methodological soundness of the LLM-IMPACT framework as a reliable tool for persuasion analysis. While fine-tuning did not surpass the zero-shot baseline, our results show that careful prompt engineering can produce stable, highly generalizable models.
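The comparison described above, a classifier's macro F-beta score measured against a majority-class dummy baseline, can be sketched as follows. This is an illustrative example only, not the thesis's actual pipeline: the labels, class count, beta value, and data sizes are invented for demonstration, and scikit-learn is assumed as the evaluation library.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import fbeta_score

rng = np.random.default_rng(0)

# Hypothetical 3-class labels (e.g. debate outcomes); not the thesis's data.
y_true = rng.integers(0, 3, size=300)

# A hypothetical model's predictions: correct ~70% of the time,
# wrong answers shifted to a neighboring class.
correct = rng.random(300) < 0.7
y_model = np.where(correct, y_true, (y_true + 1) % 3)

# Majority-class baseline: always predicts the most frequent label.
X = np.zeros((300, 1))  # dummy features; DummyClassifier ignores them
dummy = DummyClassifier(strategy="most_frequent").fit(X, y_true)
y_dummy = dummy.predict(X)

# Macro averaging weights every class equally, so a baseline that never
# predicts the minority classes scores 0 on them and is penalized hard.
f_model = fbeta_score(y_true, y_model, beta=1.0, average="macro", zero_division=0)
f_dummy = fbeta_score(y_true, y_dummy, beta=1.0, average="macro", zero_division=0)
print(f"model macro F-beta: {f_model:.3f}, dummy baseline: {f_dummy:.3f}")
```

Macro averaging is what makes the dummy baseline a meaningful yardstick here: under micro or accuracy-style scoring, always guessing the majority class can look deceptively competitive on imbalanced data.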