ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-03302025-231847


Thesis type
Master's thesis
Author
BERGAMI, GIOVANNI
URN
etd-03302025-231847
Title
Design of method to explain transformers based on similarity-differences and uniqueness for long text classification
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
supervisor Dr. Parola, Marco
supervisor Prof. Sabet Jahromi, Mohammad Naser
Keywords
  • ai
  • artificial intelligence
  • bert
  • deep learning
  • explainability
  • interpretability
  • legal dataset
  • long text
  • sidu
  • text classification
  • transformers
  • xai
Defense session start date
14/04/2025
Availability
Not available for consultation
Release date
14/04/2095
Abstract
Explainable Artificial Intelligence (XAI) plays a crucial role in making deep learning models more interpretable, particularly in high-stakes domains such as the legal field. This thesis explores the development of a novel XAI method, inspired by SIDU, for long-text classification on legal and sentiment analysis datasets. Specifically, it examines how different classification techniques, such as First-512 tokens, Last-512 tokens, Random-512 tokens, and Random-512 with rationale, affect model interpretability. The study focuses on two transformer-based architectures, BERT and RoBERTa, and introduces novel XAI approaches, including Cosine Similarity Masking (thresholded and ranged) and Persistent Homology Masking (based on angular and Euclidean distances), which are compared against SHAP as a baseline. A key aspect of this work is the use of datasets with annotated rationales, which highlight the text segments most relevant for classification. Both qualitative and quantitative analyses show that the proposed XAI methods produce explanations that align with human-annotated rationales, indicating that the models' internal representations capture meaningful semantic information.
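The First-512, Last-512, and Random-512 techniques named in the abstract can be illustrated with a minimal sketch. The record does not describe how the thesis implements them, so everything below is an assumption: the function name `select_window` is hypothetical, the random variant is taken to be a contiguous window at a random offset, special tokens such as `[CLS]` and `[SEP]` are ignored, and the "Random-512 with rationale" variant is left out because its definition is not given here.

```python
import random
from typing import List, Optional, Sequence

MAX_TOKENS = 512  # window size named in the abstract


def select_window(
    token_ids: Sequence[int],
    strategy: str,
    max_tokens: int = MAX_TOKENS,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Reduce a long token sequence to at most `max_tokens` tokens.

    Hypothetical helper: `strategy` is one of "first", "last", "random".
    """
    n = len(token_ids)
    if n <= max_tokens:
        # Short documents already fit and are returned unchanged.
        return list(token_ids)
    if strategy == "first":
        return list(token_ids[:max_tokens])
    if strategy == "last":
        return list(token_ids[-max_tokens:])
    if strategy == "random":
        # Assumption: a contiguous window starting at a uniformly random offset.
        rng = rng or random.Random()
        start = rng.randint(0, n - max_tokens)  # randint is inclusive on both ends
        return list(token_ids[start:start + max_tokens])
    raise ValueError(f"unknown strategy: {strategy!r}")


# Usage: a 1000-token stand-in document reduced three ways.
document = list(range(1000))
first = select_window(document, "first")
last = select_window(document, "last")
rand = select_window(document, "random", rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the random window reproducible, which matters when explanations produced for the same document are compared across runs.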