Thesis etd-02062025-185113
Thesis type
Master's thesis
Author
MATRELLA, ROBERTA
URN
etd-02062025-185113
Title
Federated Learning with Tree-Based Models: Client Clustering and Representative Model Extraction
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Marcelloni, Francesco
co-supervisor Prof. Ducange, Pietro
co-supervisor Ruffini, Fabrizio
Keywords
- clustering
- data privacy
- decision trees
- federated learning
- personalization
Defense session start date
21/02/2025
Availability
Not available
Release date
21/02/2065
Abstract
This thesis proposes a reliable approach for clustering participants in a federated learning environment, specifically using decision trees as the learning model to ensure explainability in classification tasks. In Federated Learning, participants train models locally on their data without sharing raw information, yet discrepancies in data distribution pose challenges in constructing an effective global model. Grouping participants based on similar decision patterns can enhance performance by enabling models to be trained on more homogeneous data.
To achieve this, three measures of the distance between decision trees were tested on the N-BaIoT dataset. The prediction similarity metric evaluates the distance between trees based on differences in their predictions, while the WSV measures structural differences between trees. Additionally, cosine distance was used to compare tree embeddings generated with a VAE. These distance measures were then used to cluster participants via hierarchical clustering.
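As a rough illustration of the prediction-based measure followed by hierarchical clustering, the sketch below treats the distance between two clients' trees as the fraction of shared reference samples on which their predictions disagree, then merges clients with average linkage. This is only one plausible reading: the abstract does not give the exact formula or the linkage criterion, and the function names, the toy prediction vectors, and the choice of average linkage are all assumptions made for the example.

```python
from itertools import combinations


def prediction_distance(preds_a, preds_b):
    """Fraction of reference samples on which two models disagree (assumed definition)."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction vectors must be non-empty and of equal length")
    disagreements = sum(a != b for a, b in zip(preds_a, preds_b))
    return disagreements / len(preds_a)


def distance_matrix(all_preds):
    """Symmetric matrix of pairwise prediction distances, zero on the diagonal."""
    n = len(all_preds)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = prediction_distance(all_preds[i], all_preds[j])
    return dist


def average_linkage_clusters(dist, n_clusters):
    """Agglomerative clustering on a precomputed matrix (average linkage is an assumption)."""
    if not 1 <= n_clusters <= len(dist):
        raise ValueError("n_clusters must be between 1 and the number of clients")
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical predictions of four clients' trees on six shared reference samples.
client_predictions = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
]

dist = distance_matrix(client_predictions)
clusters = average_linkage_clusters(dist, n_clusters=2)
print(clusters)
```

In this toy setting, clients whose trees mostly agree end up in the same group. Any of the three measures could feed the same clustering step, since it only needs a precomputed distance matrix.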
The three metrics were also compared as methods for selecting a representative tree, both from the entire federation and within individual clusters. In contrast to traditional aggregation methods, this approach explores whether selecting a single representative tree could serve as a viable alternative to constructing a fully aggregated model.
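One way to picture representative selection is as a medoid search: pick the tree whose total distance to the other trees in the group is smallest. The abstract does not state the actual selection rule, so the medoid criterion here is an assumption, as are the function names and the two-dimensional toy embeddings standing in for VAE outputs. Cosine distance is computed as one minus cosine similarity.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v), between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(embeddings):
    """Symmetric matrix of pairwise cosine distances, zero on the diagonal."""
    n = len(embeddings)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = cosine_distance(embeddings[i], embeddings[j])
    return dist


def representative(dist, members):
    """Medoid: the member with the smallest summed distance to the others (assumed rule)."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Hypothetical tree embeddings, one per client.
tree_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.3],
    [0.0, 1.0],
    [0.1, 0.9],
]

emb_dist = pairwise_distances(tree_embeddings)

# Representative for the whole federation and for one hypothetical cluster.
global_rep = representative(emb_dist, range(len(tree_embeddings)))
cluster_rep = representative(emb_dist, [0, 1, 2])
print(global_rep, cluster_rep)
```

Because `representative` only reads a distance matrix, the same function would work with a prediction-based or structural distance in place of the cosine one.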
File
File name | Size |
---|---|
The thesis is not available for consultation. |