
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-07072022-115407


Thesis type
Master's thesis
Author
LUSITO, SALVATORE
URN
etd-07072022-115407
Title
Feature Reduction and Outlier Detection for Unbalanced Learning
Department
COMPUTER SCIENCE
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Guidotti, Riccardo
Keywords
  • classification framework
  • features projection
  • features selection
  • imbalanced data learning
  • outlier detection
Examination session start date
22/07/2022
Availability
Not available for consultation
Release date
22/07/2025
Abstract
In many analysis contexts, training effective ML models is difficult because the
data are unbalanced. In cases such as fraud detection, oil spill detection, rare
disease detection and many others, the available data for these uncommon events are
limited. Many techniques commonly used in such situations try to rebalance the
instances belonging to the various classes by removing majority-class instances and
generating synthetic or cloned minority-class ones. Such approaches, however, often
achieve unsatisfactory results. This dissertation presents the FROID framework, which
addresses the problem of unbalanced learning through a change of perspective: instead
of rebalancing the available data, it carries out a feature extraction process based
on Outlier Detection and Feature Reduction techniques, so as to better describe the
available instances and allow the models to generate more accurate hypotheses. The
effectiveness of FROID is demonstrated through a series of experiments on a large set
of benchmark datasets and on two real case studies.
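
The abstract does not state which outlier detectors or feature reduction methods FROID uses. What follows is only a minimal, hypothetical sketch of the general idea, assuming that each instance is augmented with an outlier score and a low-dimensional projection before any classifier is trained. A k-nearest-neighbour distance stands in for outlier detection and a Gaussian random projection stands in for feature reduction, so the example runs with the Python standard library alone. The function names (knn_outlier_scores, random_projection, augment_features) and parameters (k, n_components, seed) are illustrative assumptions, not part of FROID.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def knn_outlier_scores(X: Matrix, k: int = 3) -> List[float]:
    """Outlier score of each row: distance to its k-th nearest other row.

    Stand-in outlier detector (an assumption, not necessarily FROID's):
    isolated points get large scores, points in dense regions small ones.
    """
    scores = []
    for i, xi in enumerate(X):
        # Distances from row i to every other row, in increasing order.
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        scores.append(dists[min(k, len(dists)) - 1])
    return scores


def random_projection(X: Matrix, n_components: int = 2, seed: int = 0) -> Matrix:
    """Project rows onto n_components random Gaussian directions.

    Stand-in feature reduction (an assumption): entries of the projection
    matrix are drawn from N(0, 1 / n_components).
    """
    rng = random.Random(seed)
    d = len(X[0])
    sigma = 1.0 / math.sqrt(n_components)
    R = [[rng.gauss(0.0, sigma) for _ in range(n_components)] for _ in range(d)]
    return [
        [sum(x[f] * R[f][c] for f in range(d)) for c in range(n_components)]
        for x in X
    ]


def augment_features(X: Matrix, k: int = 3, n_components: int = 2, seed: int = 0) -> Matrix:
    """Hypothetical augmentation: original features + outlier score + projection.

    The classes are not rebalanced; only the feature space is enriched.
    """
    scores = knn_outlier_scores(X, k)
    proj = random_projection(X, n_components, seed)
    return [list(x) + [s] + p for x, s, p in zip(X, scores, proj)]


if __name__ == "__main__":
    # Toy data: a tight cluster plus one isolated point.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
    for row in augment_features(X, k=2):
        print([round(v, 3) for v in row])
```

The augmented rows could then be passed to any standard classifier in place of the original features. The isolated point receives the largest outlier score, which is the kind of extra signal the abstract suggests can help a model separate rare instances without resampling the classes.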