ETD

Digital archive of the theses defended at the University of Pisa

Thesis etd-03302023-095933


Thesis type
Master's thesis
Author
ROSSI, ELEONORA
URN
etd-03302023-095933
Title
SafeGen: A Data-Anonymization Fairness-Enhancing Framework
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Supervisors
supervisor Prof. Guidotti, Riccardo
supervisor Dr. Pratesi, Francesca
supervisor Dr. Mazzoni, Federico
Keywords
  • synthetic data
  • fairness
  • discrimination
  • preprocessing
  • k-anonymity
  • privacy
  • privacy risk
  • bias
  • framework
  • generalization
  • suppression
  • genetic algorithm
Defense session start date
13/04/2023
Availability
Full
Abstract
As machine learning systems are increasingly used in decision-making processes based on user-provided data, privacy and data governance must be considered alongside the avoidance of unfair bias. However, existing techniques for mitigating bias and for protecting privacy are not always compatible. For example, masking and generalizing rare information to protect privacy could increase discrimination in the dataset, and could also lead machine learning models to make incorrect predictions. To address these challenges, this thesis introduces SafeGen, an algorithm that improves a dataset's privacy by generating synthetic records and by suppressing and generalizing rare information, without compromising accuracy. Additionally, SafeGen enhances fairness by creating new synthetic data to replace discriminatory records.
In a comparison with state-of-the-art competitors, SafeGen mitigates privacy risks more effectively while maintaining accuracy and fairness in decision-making processes.
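
To make the privacy terms used in the abstract concrete, the following is a minimal sketch of generalization and suppression applied toward k-anonymity. It is not SafeGen itself, whose details are not given in this record. The toy records, the helper names (`generalize_age`, `generalize`, `k_anonymize`, `is_k_anonymous`), the ten-year age bins, the zip-code masking, and k = 2 are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, outcome). Not data from the thesis.
RECORDS = [
    (23, "56121", "approved"),
    (27, "56122", "denied"),
    (34, "56121", "approved"),
    (36, "56127", "approved"),
    (61, "56010", "denied"),
]


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a coarser range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(record):
    """Coarsen the quasi-identifiers (age, zip code) of one record."""
    age, zip_code, outcome = record
    # Keep only the first three digits of the zip code (illustrative choice).
    return (generalize_age(age), zip_code[:3] + "**", outcome)


def quasi_identifier(record):
    """The attributes an attacker could use to re-identify someone."""
    return record[:2]


def k_anonymize(records, k):
    """Generalize all records, then suppress those in groups smaller than k."""
    generalized = [generalize(r) for r in records]
    counts = Counter(quasi_identifier(r) for r in generalized)
    # Suppression: drop records whose quasi-identifier group has fewer than k members.
    return [r for r in generalized if counts[quasi_identifier(r)] >= k]


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(quasi_identifier(r) for r in records)
    return all(c >= k for c in counts.values())


anonymized = k_anonymize(RECORDS, k=2)
```

In this toy dataset the only record in the 60-69 age group is suppressed, because no other record shares its generalized attributes. This illustrates, under the assumptions above, how protecting rare records can remove them from the data, which is the tension with fairness that the abstract describes.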
File