Thesis etd-01162023-001732
Thesis type
Master's degree thesis
Author
RIZZO, SIMONE
URN
etd-01162023-001732
Thesis title
Attacking machine learning models and their global explainers
Department
INFORMATICA
Course of study
INFORMATICA
Supervisors
Supervisor: Prof.ssa Monreale, Anna
Co-supervisor: Prof.ssa Naretto, Francesca
Keywords
- adversarial attack
- data privacy
- explainable ai
- global explainer
- machine learning
- privacy
- xai
Graduation session start date
24/02/2023
Availability
Withheld
Release date
24/02/2093
Summary
We designed a new model-agnostic membership inference attack. The attack targets black-box models without access to label confidence scores: it estimates the confidence of a model's prediction by perturbing the input and observing the model's robustness. For a given input point, the attack generates a batch of perturbations and assigns the point a robustness score, which represents how confident the model is about that point. A higher robustness score indicates that the point lies farther from the decision boundary and was likely part of the training set; a lower robustness score indicates that the point lies closer to the decision boundary and that the model is less confident in its prediction. We tested the attack on three datasets (Adult, Bank, and a synthetic one) and applied it to three different models (Decision Tree, Random Forest, and Neural Network). The results show that the attack performs well and reveal a relationship between model overfitting and privacy risk. Additionally, we ran the attack against tree-based global explainers of the models to check whether the explainers also leak private information. The results highlight that explainers trained on overfitted models leak more private information, which represents a privacy threat. To mitigate this threat, we also designed a model selection algorithm for explainers that guarantees a minimum of 85% fidelity while protecting privacy.
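To make the robustness-score idea concrete, the sketch below shows a minimal perturbation-based membership test, assuming a scikit-learn-style black-box classifier whose `predict` method returns only hard labels. The names `robustness_score`, `n_perturbations`, `noise_scale`, and `threshold` are illustrative assumptions, not the thesis's actual implementation; in practice the decision threshold would be calibrated, e.g. on shadow data.

```python
# Minimal sketch of a perturbation-based robustness score for membership
# inference. Assumes a black-box classifier exposing only hard-label
# predictions via `model.predict` (scikit-learn style). Illustrative only.
import numpy as np


def robustness_score(model, x, n_perturbations=100, noise_scale=0.1, seed=None):
    """Fraction of noisy copies of `x` that keep the original predicted label.

    A score near 1 suggests the point lies far from the decision boundary
    (higher model confidence, more likely a training member); a low score
    suggests it lies close to the boundary.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(1, -1)
    original_label = model.predict(x)[0]

    # Generate a batch of Gaussian perturbations around the input point.
    noise = rng.normal(scale=noise_scale, size=(n_perturbations, x.shape[1]))
    perturbed = x + noise

    # Robustness = share of perturbed points still assigned the same label.
    labels = model.predict(perturbed)
    return float(np.mean(labels == original_label))


def membership_attack(model, x, threshold=0.9):
    """Flag `x` as a likely training member when its robustness score is high."""
    return robustness_score(model, x) >= threshold
```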
File
| File name | Size |
|---|---|
| Thesis not available for consultation. | |