Thesis etd-03232026-190639

Thesis type

Master's thesis

URN

etd-03232026-190639

Title

An explainable system based on computational linguistics for the identification of cyber scams (Un sistema spiegabile basato su linguistica computazionale per l'identificazione di truffe informatiche)

Department

PHILOLOGY, LITERATURE AND LINGUISTICS

Degree programme

DIGITAL HUMANITIES

Supervisors

supervisor Prof. Guidi, Barbara
supervisor Dr. Michienzi, Andrea

Keywords

anomaly detection
blockchain
embeddings
explainable ai
artificial intelligence
LightGBM
computational linguistics
machine learning
nft
Random Forest
SHAP
wash trading
XGBoost

Graduation session start date

10 April 2026

Availability

Thesis not available for consultation

Abstract (English)

This thesis proposes an explainable automatic system for detecting wash trading practices in the NFT market, a phenomenon that artificially inflates prices and trading volumes. The approach integrates computational linguistics and machine learning techniques, reinterpreting blockchain transactions as semantic sequences and representing them through embeddings. Supervised classification models, including Random Forest, XGBoost, and LightGBM, are trained on these representations to distinguish between legitimate and suspicious transactions. A key contribution of the work is the use of Explainable AI techniques, particularly SHAP, to interpret model decisions and ensure transparency and reliability. The system is structured as a modular pipeline that includes data collection, embedding construction, model training, and interpretative analysis. The results show that combining semantic representations with ensemble models improves detection accuracy while maintaining a high level of interpretability, which is crucial in sensitive domains such as financial security.

Abstract (translated from Italian)

The thesis proposes an automatic, explainable system for identifying wash trading practices in the NFT market, a phenomenon that artificially alters prices and trading volumes. The approach integrates computational linguistics and machine learning techniques, reinterpreting blockchain transactions as semantic sequences and representing them through embeddings. Supervised classification models, including Random Forest, XGBoost, and LightGBM, are trained on these representations in order to distinguish between legitimate and suspicious transactions. A central element of the work is the use of Explainable AI techniques, in particular SHAP, to interpret the models' decisions and ensure transparency and reliability. The system is structured as a modular pipeline comprising data collection, embedding construction, model training, and interpretative analysis. The results show that integrating semantic representations with ensemble models improves detection accuracy while maintaining a high level of interpretability, a crucial aspect in sensitive domains such as financial security.

Files

File name	Size
Thesis not available for consultation. Contact the author.

ETD

Digital archive of theses defended at the Università di Pisa
