
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-05082025-165343


Thesis type
Master's degree thesis
Author
CODA-GIORGIO, LUCA
URN
etd-05082025-165343
Title
SIMVALE: A Generalizable Simulator Validation Framework Combining Embedding Clustering and Feature-Based Metrics
Department
COMPUTER SCIENCE
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Prof. Pollacci, Laura
supervisor Dr. Fidone, Giacomo
Keywords
  • BERT
  • clustering
  • embeddings
  • fine-tuning
  • framework
  • metrics
  • moderation
  • simulator
  • toxicity
  • validation
Exam session start date
30/05/2025
Availability
Full
Abstract
The rise of Large Language Model (LLM)-powered simulators has enabled highly realistic modeling of complex social phenomena, significantly reducing the cost and effort of real-world data collection. However, their reliability remains a persistent challenge: existing validation approaches generalize only partially, typically evaluating simulator realism along isolated dimensions and at fixed levels of granularity. To address this, this thesis introduces SIMVALE (SIMulator VAlidation with Latent Embeddings), a generalizable, multi-dimensional validation framework for the quantitative assessment of LLM-based simulators. SIMVALE clusters latent embeddings and derives both global and local feature-based metrics that measure how closely a simulator reflects the reality it aims to model, ultimately supporting its improvement. Furthermore, SIMVALE is tested on a real-world case study to assess the simulator's ability to replicate general behavioral patterns, toxic profiles, and the effects of moderation interventions.
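To make the general idea of "embedding clustering plus global and local feature-based metrics" concrete, here is a minimal, hypothetical sketch. It is not SIMVALE's actual implementation; every choice below is an illustrative assumption. Embeddings of real and simulated items are assumed to be precomputed (for example by a fine-tuned BERT encoder). The pooled set is clustered with plain k-means. A global score is the Jensen-Shannon divergence between the real and simulated cluster-occupancy distributions, and a local score is the per-cluster gap in the mean of one scalar feature, such as a toxicity score. The function names `kmeans`, `js_divergence` and `compare` are invented for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def compare(real_emb: List[Vector], real_feat: List[float],
            sim_emb: List[Vector], sim_feat: List[float],
            k: int) -> Tuple[float, Dict[int, float]]:
    """Hypothetical global (cluster occupancy) and local (per-cluster feature) scores."""
    labels = kmeans(list(real_emb) + list(sim_emb), k)
    real_lab, sim_lab = labels[:len(real_emb)], labels[len(real_emb):]

    def occupancy(lab: List[int]) -> List[float]:
        return [lab.count(j) / len(lab) for j in range(k)]

    global_score = js_divergence(occupancy(real_lab), occupancy(sim_lab))
    local: Dict[int, float] = {}
    for j in range(k):
        r = [f for f, lab in zip(real_feat, real_lab) if lab == j]
        s = [f for f, lab in zip(sim_feat, sim_lab) if lab == j]
        if r and s:  # only clusters populated by both sources are comparable
            local[j] = abs(sum(r) / len(r) - sum(s) / len(s))
    return global_score, local


if __name__ == "__main__":
    real = [(0, 0), (0, 1), (10, 10), (10, 11)]
    sim = [(0, 0.5), (10, 10.5), (10, 9.5), (11, 10)]
    print(compare(real, [0.1, 0.2, 0.8, 0.9], sim, [0.1, 0.5, 0.7, 0.6], k=2))
```

In this toy run the simulated items over-populate the second cluster, so the global score is positive, and the local scores show where the toxicity-like feature diverges most. A real framework would need a principled choice of k, an embedding model suited to the domain, and several features rather than one.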