Thesis etd-07072022-115407

Thesis type

Master's thesis

Author

LUSITO, SALVATORE

URN

etd-07072022-115407

Title

Feature Reduction and Outlier Detection for Unbalanced Learning

Department

INFORMATICA (Computer Science)

Degree programme

DATA SCIENCE AND BUSINESS INFORMATICS

Supervisors

supervisor Guidotti, Riccardo

Keywords

classification framework
feature projection
feature selection
imbalanced data learning
outlier detection

Graduation session start date

22/07/2022

Availability

Full

Abstract

In many analysis contexts, training effective ML models is difficult because the data are unbalanced. In applications such as fraud detection, oil spill detection, rare disease detection, and many others, only limited data are available for the uncommon events. Techniques commonly used in these situations try to rebalance the classes by removing majority-class instances and by generating synthetic or cloned minority-class instances. Such approaches, however, often achieve unsatisfactory results. This dissertation presents the FROID framework, which addresses unbalanced learning through a change of perspective: instead of rebalancing the available data, it carries out a feature extraction process based on Outlier Detection and Feature Reduction techniques, in order to better characterize the available instances and let the models generate more accurate hypotheses. The effectiveness of FROID is demonstrated through a series of experiments on a large set of benchmark datasets and on two real case studies.
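
The general idea of enriching instances instead of rebalancing them can be illustrated with a minimal sketch. This is not the FROID implementation: the choice of a k-nearest-neighbour distance as outlier score, a single principal-component projection as feature reduction, and all function names (`knn_outlier_score`, `first_principal_component`, `augment`) are assumptions made here for illustration only. The sketch simply appends the two derived values to each instance as extra features.

```python
import math


def knn_outlier_score(X, k=2):
    """Mean Euclidean distance from each row to its k nearest neighbours.

    Illustrative outlier score; larger values mean more isolated points.
    """
    scores = []
    for i, a in enumerate(X):
        dists = sorted(math.dist(a, b) for j, b in enumerate(X) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def first_principal_component(X, iters=100):
    """Project mean-centred rows onto the top principal direction.

    The direction is estimated with power iteration on the covariance matrix.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[c] for row in X) / n for c in range(d)]
    Xc = [[row[c] - means[c] for c in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[c] * v[c] for c in range(d)) for r in Xc]


def augment(X, k=2):
    """Append the outlier score and the 1-D projection to every instance."""
    scores = knn_outlier_score(X, k)
    proj = first_principal_component(X)
    return [list(row) + [s, p] for row, s, p in zip(X, scores, proj)]


# Toy data (hypothetical): four clustered points and one isolated point.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [5.0, 5.0]]
X_aug = augment(X)
```

Each row of `X_aug` keeps its original features and gains two new ones, so a downstream classifier sees the same number of instances but a richer description of each. In this toy data the isolated point receives the largest outlier score.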

File

File name	Size
Thesis_Lusito.pdf	2.43 MB
ETD

Digital archive of the theses defended at the Università di Pisa