
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-06012022-053415


Thesis type
Master's thesis
Author
MAZZONI, FEDERICO
URN
etd-06012022-053415
Title
Genetic Fairness-Enhancing Data Generation Framework
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Supervisors
supervisor Prof. Guidotti, Riccardo
supervisor Marchiori, Marta
supervisor Cinquini, Martina
Keywords
  • algorithmic bias
  • framework
  • oversampling
  • genetic algorithm
  • ml evaluation
  • data awareness
  • digital discrimination
  • data balancing
  • preprocessing
  • fairness
  • synthetic data
Exam session start date
11 July 2022
Availability
Full
Abstract
The rapid and widespread adoption of machine learning models has exposed an inherent flaw of the paradigm. Because learning depends heavily on the data used in the training phase, any bias present in the training set is inherited by the resulting decision models and propagated through their automated processes. Several techniques have been proposed to balance the training dataset with respect to sensitive attributes such as ethnicity, gender, age, or religion, with the aim of obtaining a discrimination-free model. This thesis presents FairGen, a framework that improves a dataset’s fairness through Genetic Algorithms.
FairGen extends and improves the fairness-enhancing algorithm Preferential Sampling by generating fair and plausible data to be used as input for training machine learning classification models. We compared FairGen against state-of-the-art pre-processing algorithms and against data generation approaches adapted to use the same notions of fairness and plausibility that FairGen implements. The results show that FairGen successfully removes the discrimination in the training dataset and yields fairer models than those trained on datasets produced by state-of-the-art approaches.
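
The abstract does not say how discrimination in the training dataset is measured. As a rough illustration only, the sketch below assumes a binary sensitive attribute and a binary label, and computes the difference in positive-outcome rates between the two groups (often called statistical parity difference). The function name, the 0/1 encodings, and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def discrimination(sensitive: Sequence[int], labels: Sequence[int]) -> float:
    """Return P(y=1 | privileged) - P(y=1 | unprivileged).

    Assumed encoding (not from the thesis): sensitive[i] is 1 for the
    privileged group and 0 for the unprivileged one; labels[i] is 1 for
    the favourable outcome and 0 otherwise.
    """
    priv = [y for s, y in zip(sensitive, labels) if s == 1]
    unpriv = [y for s, y in zip(sensitive, labels) if s == 0]
    if not priv or not unpriv:
        raise ValueError("both groups must be present in the dataset")
    return sum(priv) / len(priv) - sum(unpriv) / len(unpriv)


# Toy data invented for illustration: 3 of 4 privileged records are
# positive, while only 1 of 4 unprivileged records is positive.
sensitive = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

gap = discrimination(sensitive, labels)  # 0.75 - 0.25
```

Under this measure a value of 0 means both groups receive the favourable label at the same rate. A balancing method would aim to move the gap toward 0 by adding or resampling records.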