
ETD

Digital archive of the theses discussed at the University of Pisa

Thesis etd-01162023-001732


Thesis type
Master's degree thesis
Author
RIZZO, SIMONE
URN
etd-01162023-001732
Title
Attacking machine learning models and their global explainers
Department
INFORMATICA
Degree programme
INFORMATICA
Supervisors
supervisor Prof. Monreale, Anna
co-supervisor Prof. Naretto, Francesca
Keywords
  • adversarial attack
  • machine learning
  • xai
  • explainable ai
  • privacy
  • data privacy
  • global explainer
Defense date
24/02/2023
Availability
Not available for consultation
Release date
24/02/2093
Abstract
We designed a new model-agnostic attack for membership inference. The attack targets black-box models without access to confidence scores for the predicted labels: it estimates the model's confidence in a prediction by perturbing the input and observing the model's robustness. Given an input point, the attack generates a batch of perturbations of that point and assigns it a robustness score, which represents how confident the model is about it. A higher robustness score indicates that the point lies farther from the decision boundary and was likely part of the training set, so the model's confidence in its prediction is higher. Conversely, a lower robustness score indicates that the point lies closer to the decision boundary and the model's confidence in its prediction is lower. We tested the attack on three datasets (Adult, Bank, and a synthetic one) against three different models (Decision Tree, Random Forest, and Neural Network). The results showed that the attack performed well, and they also revealed a relationship between model overfitting and privacy risk. Additionally, we ran the attack against tree-based global explainers of the models to determine whether the explainers themselves also leak private information. The results highlighted that explainers trained on overfitted models leak more private information, which represents a threat to privacy. To mitigate this threat, we also designed a model selection algorithm for explainers that guarantees at least 85% fidelity while protecting privacy.
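
The perturbation-based robustness score at the core of the attack can be sketched as follows. This is an illustrative reconstruction based only on the abstract, not the thesis's actual code: it assumes a scikit-learn-style predict interface and Gaussian input noise, and the function names, the noise scale sigma, and the decision threshold tau are all hypothetical.

import numpy as np

def robustness_score(model, x, n_perturbations=100, sigma=0.1, seed=None):
    """Fraction of Gaussian perturbations of x whose predicted label matches
    the label of x itself (hypothetical reconstruction of the score)."""
    rng = np.random.default_rng(seed)
    base_label = model.predict(x.reshape(1, -1))[0]   # label of the original point
    perturbed = x + rng.normal(0.0, sigma, size=(n_perturbations, x.size))
    labels = model.predict(perturbed)                 # labels of the perturbed batch
    return float(np.mean(labels == base_label))       # share of label-preserving perturbations

def infer_membership(model, x, tau=0.9, **score_kwargs):
    """Flag x as a likely training-set member when its robustness score,
    i.e. its estimated distance from the decision boundary, exceeds tau."""
    return robustness_score(model, x, **score_kwargs) >= tau

A point whose label survives most perturbations is treated as lying far from the decision boundary, and hence as a likely training-set member; in practice the threshold tau would need to be calibrated, for example on points known to be outside the training set.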