Thesis etd-06012022-053415

Thesis type

Master's degree thesis

Author

MAZZONI, FEDERICO

URN

etd-06012022-053415

Title

Genetic Fairness-Enhancing Data Generation Framework

Department

PHILOLOGY, LITERATURE AND LINGUISTICS

Degree programme

DIGITAL HUMANITIES

Supervisors

supervisor Prof. Guidotti, Riccardo
supervisor Marchiori, Marta
supervisor Cinquini, Martina

Keywords

algorithmic bias
data awareness
data balancing
digital discrimination
fairness
framework
genetic algorithm
ml evaluation
oversampling
preprocessing
synthetic data

Exam session start date

11/07/2022

Availability

Full

Abstract

The rapid and widespread adoption of machine learning models has exposed an inherent flaw of the paradigm. Because the learning process depends heavily on the data used in the training phase, any bias present in the training collection is inherited by the resulting decision models and propagated through their automated processes. Several techniques have been proposed to balance the training dataset with respect to sensitive attributes such as ethnicity, gender, age, or religion, with the aim of obtaining a discrimination-free model. This thesis presents FairGen, a framework that improves a dataset's fairness through Genetic Algorithms.
FairGen extends and improves the fairness-enhancing algorithm Preferential Sampling by generating fair and plausible data to be used as input for training machine learning classification models. We compared FairGen against state-of-the-art pre-processing algorithms and against data generation approaches adapted to the same fairness and plausibility criteria used by FairGen. Results show that FairGen successfully removes discrimination from the training dataset, yielding models that are fairer than those trained on datasets produced by state-of-the-art approaches.
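To give a rough picture of the rebalancing idea that FairGen builds on, the Python sketch below computes the per-group target sizes used by Preferential Sampling. Each (sensitive value, class) group is resized to the count it would have if the sensitive attribute and the class were statistically independent, that is |s|·|c|/|D|. This is only a minimal sketch under stated assumptions and is not FairGen's implementation. It assumes a single binary sensitive attribute and binary labels. The group names `male`/`female`, the toy data, and all helper functions are hypothetical. The sketch only plans how many records to add to or remove from each group. Preferential Sampling then uses a ranker to choose which borderline records to duplicate or drop. According to the abstract, FairGen instead generates fair and plausible data through genetic algorithms. Neither of those steps is shown here.

```python
from collections import Counter


def positive_rate(counts, group):
    """Share of records in `group` that carry the positive label (1)."""
    total = counts[(group, 0)] + counts[(group, 1)]
    return counts[(group, 1)] / total


def discrimination(counts, favoured, deprived):
    """Positive-rate gap between the favoured and the deprived group."""
    return positive_rate(counts, favoured) - positive_rate(counts, deprived)


def expected_sizes(counts):
    """Size of each (sensitive, class) group if S and C were independent: |s|*|c|/|D|."""
    n = sum(counts.values())
    s_tot, c_tot = Counter(), Counter()
    for (s, c), k in counts.items():
        s_tot[s] += k
        c_tot[c] += k
    return {(s, c): s_tot[s] * c_tot[c] / n for s in s_tot for c in c_tot}


def sampling_plan(counts):
    """Records to add (> 0, oversample) or remove (< 0, undersample) per group."""
    return {g: round(e) - counts[g] for g, e in expected_sizes(counts).items()}


if __name__ == "__main__":
    # Hypothetical toy data: (sensitive value, class label) pairs.
    rows = ([("female", 1)] * 2 + [("female", 0)] * 8
            + [("male", 1)] * 8 + [("male", 0)] * 2)
    counts = Counter(rows)
    plan = sampling_plan(counts)
    # Group counts after applying the plan (which records change is not modelled).
    balanced = Counter({g: counts[g] + d for g, d in plan.items()})
    print(round(discrimination(counts, "male", "female"), 2))
    print(plan)
    print(discrimination(balanced, "male", "female"))
```

In this toy example, 2 of the 10 deprived-group records are positive, compared with 8 of the 10 favoured-group records, so the discrimination is 0.6. The plan adds 3 records to each under-represented group and removes 3 from each over-represented one. After that, both groups have a positive rate of 0.5 and the gap is zero. The expected-size rule makes the positive rate of every group equal to the overall positive rate, which is why the gap closes.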

File

File name	Size
Tesi_Mazzoni.pdf	1.28 Mb

ETD

Digital archive of theses defended at the University of Pisa
