ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-02102026-113421


Thesis type
Master's thesis
Author
VELARDITA, MICHELE
URN
etd-02102026-113421
Title
MetaDex: extracting metadata with confidence using generative AI.
Department
INFORMATICA
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Prof. Frangioni, Antonio
Keywords
  • Confidence estimation
  • generative AI
  • IA generativa
  • IE
  • Information extraction
  • Large Language Model
  • LLM
  • Uncertainty quantification
Defense session start date
27/02/2026
Availability
Full
Abstract (English)
This thesis addresses the structured extraction of heterogeneous data from semi-structured documents that come in multiple formats and templates. The work focuses on document understanding and entity extraction tasks, using generative AI models (both black-box systems and models that expose their output logits) combined with optical character recognition (OCR) pipelines. The primary focus is the implementation and experimental validation of uncertainty estimation techniques, with the goal of providing informative confidence metrics to the end user. Different uncertainty measures are evaluated on real-world document datasets. The results highlight the potential of uncertainty-aware extraction systems to improve the reliability and interpretability of their outputs.
Abstract (Italian)
File