ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-10022025-195809


Thesis type
Master's thesis
Author
MARIANI, VALERIO
URN
etd-10022025-195809
Title
Organizing the news landscape in Encoder Neural Networks’ Latent Spaces.
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE
Supervisors
supervisor Prof. Gervasi, Vincenzo
Keywords
  • LLM news
Defense session start date
17/10/2025
Availability
Full
Abstract
Information overload in digital news consumption has been observed to produce decreased comprehension, news avoidance behaviors, and reduced civic engagement. To address this problem and potentially reduce the cognitive burden of news consumption, this thesis evaluates the use of different Large Language Models as tools to generate citation-based summaries of news articles. In particular, I develop S0NAR: a semantic news search engine indexing 60,000 articles from Italian and U.S. outlets published between February and May 2025, which retrieves semantically relevant articles using embedding-based search and generates summaries through three leading LLMs: claude-3-7-sonnet, gemini-2.0-flash, and gpt-4o. To evaluate this system, I perform a double-blind user study in which 110 satisfaction ratings are collected and analyzed, revealing claude-3-7-sonnet as the highest-scoring model, followed by gemini-2.0-flash and gpt-4o, with statistically significant differences between the first and third. Feature analysis further reveals model-specific patterns where users prefer shorter summaries when faced with claude-3-7-sonnet responses, while the other models benefit from more extensive and citation-dense coverage. Finally, citation analysis demonstrates substantial reliability differences: while claude-3-7-sonnet and gemini-2.0-flash correctly handle the task in nearly all cases, gpt-4o fails to provide citation-based news summaries in almost half of its responses. These findings establish the viability of LLM-generated news summaries while highlighting that model selection and citation capabilities are crucial factors determining user satisfaction and system effectiveness.