
ETD

Digital archive of the theses discussed at the Università di Pisa

Thesis etd-06232021-133206


Thesis type
Master's degree thesis
Author
MARCHIORI MANERBA, MARTA
URN
etd-06232021-133206
Title
FairShades - Fairness Auditing in Abusive Language Detection Systems
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Supervisors
Supervisor: Dott. Guidotti, Riccardo
Keywords
  • abusive language detection systems
  • XAI
  • NLP
  • fairness in ML
  • algorithmic bias
  • algorithmic auditing
  • digital discrimination
  • hate speech detection
  • intersectionality
Defence session start date
12/07/2021
Availability
Full
Abstract
Current abusive language detection systems have demonstrated unintended bias towards sensitive features such as nationality or gender. This is a crucial issue that may harm minorities and underrepresented groups if such systems are integrated into real-world applications. In this thesis, we present FairShades, a model-agnostic approach for auditing the outcomes of abusive language detection systems. By combining explainability and fairness evaluation, the tool is able to identify spurious correlations, unintended biases, and the sensitive categories towards which a model is most discriminative. This objective is pursued by auditing meaningful counterfactuals generated with the CheckList framework, obtained by perturbing the sensitive identities mentioned in the texts to be classified. A Decision Tree Regressor is trained on the synthetic neighbourhood and used to simulate and analyse the behaviour, predictions, and rationale of the black box under consideration. Our approach performs both local and sub-global analysis by combining the individual interpretations. We conduct several experiments on research BERT-based models to demonstrate the novelty and effectiveness of our proposal in unmasking biases. Although these classifiers achieve high accuracy on a variety of natural language processing tasks, they show severe shortcomings on samples involving implicit stereotypes and protected attributes such as nationality or sexual orientation.
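
The following is a minimal, illustrative sketch of the auditing loop described in the abstract, not the thesis implementation: counterfactuals are generated by swapping protected identity terms into a templated sentence, the black box is queried on the resulting neighbourhood, and an interpretable Decision Tree Regressor surrogate is fitted on its scores. The black_box_predict function, the IDENTITY_TERMS lexicon, and the template are hypothetical placeholders; the actual FairShades pipeline relies on CheckList-style templates and BERT-based classifiers.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.tree import DecisionTreeRegressor

    # Hypothetical stand-in for the abusive-language classifier under audit:
    # the approach only assumes access to a scoring function in [0, 1].
    def black_box_predict(texts):
        # Toy heuristic so the sketch runs end to end; a real audit would
        # call the deployed model here.
        return np.array([0.9 if "immigrants" in t.lower() else 0.1 for t in texts])

    # Illustrative lexicon of protected identity terms used for perturbation.
    IDENTITY_TERMS = ["immigrants", "women", "muslims", "gay people", "italians"]

    def generate_neighbourhood(template, placeholder="{identity}"):
        """Create counterfactuals by swapping the protected identity in a template."""
        return [template.replace(placeholder, term) for term in IDENTITY_TERMS]

    # 1. Build the synthetic neighbourhood around a templated instance.
    template = "I think {identity} should not be allowed here"
    neighbourhood = generate_neighbourhood(template)

    # 2. Query the black box on every counterfactual.
    scores = black_box_predict(neighbourhood)

    # 3. Fit an interpretable surrogate (a Decision Tree Regressor) on the
    #    neighbourhood to approximate the black box's local behaviour.
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    X = vectorizer.fit_transform(neighbourhood)
    surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
    surrogate.fit(X, scores)

    # 4. Inspect the neighbourhood: large score gaps across counterfactuals
    #    that differ only in the protected identity flag a potential bias.
    for text, score in zip(neighbourhood, scores):
        print(f"{score:.2f}  {text}")
    feature_names = vectorizer.get_feature_names_out()
    top = np.argsort(surrogate.feature_importances_)[::-1][:3]
    print("Most influential terms:", [feature_names[i] for i in top])

In this sketch, the surrogate's feature importances play the role of the local interpretation: identity terms that dominate the tree's splits indicate categories towards which the audited model behaves most discriminatively, and aggregating such interpretations over many templates gives the sub-global view mentioned above.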