Thesis etd-06182025-074117

Thesis type

Master's thesis

Author

FANTASIA, RICCARDO

URN

etd-06182025-074117

Title

On Explaining Federated Learning Models for Classification Tasks

Department

INGEGNERIA DELL'INFORMAZIONE

Degree programme

CYBERSECURITY

Supervisors

advisor Prof. Marcelloni, Francesco
supervisor Prof. Hagras, Hani
tutor Dr. Daole, Mattia

Keywords

explainability
federated learning
fuzzy models
interpretability
privacy

Defense session start date

23/07/2025

Availability

Not available for consultation

Release date

23/07/2065

Abstract

This master's thesis investigates a central conflict in modern AI: the tension between the need for model transparency and the demand for data privacy. Complex models such as deep neural networks are powerful, but their "black-box" nature makes them unsuitable for high-stakes domains such as healthcare and finance, where explainability is crucial. At the same time, Federated Learning (FL) has emerged as the standard approach to privacy-preserving AI, allowing collaborative model training on decentralized data. The core problem this thesis addresses is that applying FL to interpretable models, such as decision trees, typically yields a complex ensemble of models, which sacrifices the very interpretability that motivated their use.

To find a viable solution, the research systematically evaluates several approaches. The first is a Federated Rule Aggregation (FedRA) method: each client trains a local Fuzzy Decision Tree (FDT), extracts a set of human-readable IF-THEN rules, and sends them to a central server, which consolidates them into a single global rule base. Although this method does produce a single rule-based model, it suffers from a "rule explosion": the final model contains an unmanageable number of rules (e.g., 500 rules for a task where a centralized model had only 113), which severely undermines interpretability. Its predictive performance also degrades significantly in more realistic scenarios where client data are not independent and identically distributed (non-IID).
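The server-side consolidation step can be illustrated with a minimal Python sketch. It is not the thesis's FedRA algorithm, whose details the abstract does not give. The `Rule` type, the `aggregate_rules` function, the rule weights, and the conflict policy (keep the highest-weight rule when two clients send the same antecedent) are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical rule representation: an antecedent is a set of
# (feature, fuzzy term) pairs; the consequent is a class label.
Antecedent = Tuple[Tuple[str, str], ...]


class Rule(NamedTuple):
    antecedent: Antecedent
    label: str
    weight: float  # assumed per-rule confidence reported by the client


def aggregate_rules(client_rule_sets: List[List[Rule]]) -> List[Rule]:
    """Merge client rule sets into one global rule base.

    Assumed policy: rules with the same antecedent (ignoring the order
    of conditions) are merged, keeping the one with the highest weight.
    """
    best: Dict[Antecedent, Rule] = {}
    for rules in client_rule_sets:
        for rule in rules:
            key = tuple(sorted(rule.antecedent))
            current = best.get(key)
            if current is None or rule.weight > current.weight:
                best[key] = Rule(key, rule.label, rule.weight)
    return list(best.values())


# Toy rule sets from two clients (made-up values).
client_a = [
    Rule((("age", "high"), ("bmi", "low")), "risk", 0.9),
    Rule((("age", "low"),), "no_risk", 0.8),
]
client_b = [
    Rule((("bmi", "low"), ("age", "high")), "no_risk", 0.6),  # conflicts with client_a
    Rule((("bmi", "high"),), "risk", 0.7),
]

global_rules = aggregate_rules([client_a, client_b])
for rule in global_rules:
    print(rule)
```

Only rules with identical antecedents are merged here, so every distinct antecedent from every client survives into the global rule base. This shows how the rule count can grow as clients are added.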

Given the limitations of rule aggregation, the thesis then establishes a performance baseline by evaluating a standard federated FDT ensemble, in which predictions are made by majority vote among all client models. This approach proved predictively strong and robust, especially in non-IID settings. However, that strength came at the cost of immense structural complexity: ensembles contained up to 805 rules and 1339 total nodes, which made them opaque and, in effect, a "black box".
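Majority voting over client models can be sketched in a few lines of Python. The stand-in models below are crisp threshold functions, not fuzzy decision trees. The feature names and the tie-breaking rule (the earliest vote among tied labels wins) are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Sample = Dict[str, float]
Model = Callable[[Sample], str]


def majority_vote(models: List[Model], sample: Sample) -> str:
    """Return the label predicted by the most client models.

    Assumed tie-break: among labels with the top count, the one voted
    first (in model order) is returned.
    """
    votes = [model(sample) for model in models]
    counts = Counter(votes)
    top = max(counts.values())
    for vote in votes:
        if counts[vote] == top:
            return vote
    raise ValueError("no models supplied")


# Hypothetical client models standing in for locally trained trees.
client_models: List[Model] = [
    lambda x: "pos" if x["f1"] > 0.5 else "neg",
    lambda x: "pos" if x["f2"] > 0.3 else "neg",
    lambda x: "neg",
]

prediction = majority_vote(client_models, {"f1": 0.7, "f2": 0.4})
print(prediction)
```

Explaining a single prediction means inspecting every voting model, so the ensemble's complexity is the sum of the complexities of its members.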

The shortcomings of these preliminary strategies directly motivate the thesis's main contribution: the Artificial Representative Tree (ART) framework. ART is a novel algorithm designed to distill the predictive power of the complex but accurate federated ensemble into a single, compact, and highly interpretable FDT. The generation process is guided by a structural similarity metric known as the Weighted Splitting Variables (WSV) distance, which helps build a new tree that best represents the structure of the entire ensemble. Critically, this construction is performed centrally by the server in a data-agnostic manner, without requiring access to private client data. The results show that the ART framework achieves a dramatic reduction in model complexity, often by over 95% in terms of rules and nodes, thereby restoring the interpretability lost in the federated process. The final step of the framework is a privacy-preserving method that populates the ART's leaves with predictive class distributions, turning the interpretable structure into a fully functional model.

File

File name	Size
The thesis is not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
