
ETD

Digital archive of theses discussed at the University of Pisa

 

Thesis etd-07072022-115407


Thesis type
Master's thesis (Tesi di laurea magistrale)
URN
etd-07072022-115407
Thesis title
Feature Reduction and Outlier Detection for Unbalanced Learning
Department
INFORMATICA (Computer Science)
Course of study
DATA SCIENCE AND BUSINESS INFORMATICS
Keywords
  • classification framework
  • features projection
  • features selection
  • imbalanced data learning
  • outlier detection
Graduation session start date
22/07/2022
Availability
Full
Abstract
In many analysis contexts, training effective ML models is difficult because the data are unbalanced. In applications such as fraud detection, oil spill detection, rare disease detection and many others, the available data on these uncommon events are limited. Techniques commonly used in such situations try to rebalance the classes by removing majority-class instances and by generating synthetic or cloned minority-class instances. These approaches, however, often achieve unsatisfactory results. This dissertation presents FROID, a framework that addresses unbalanced learning through a change of perspective. Instead of rebalancing the available data, it performs feature extraction with Outlier Detection and Feature Reduction techniques, so as to better describe the available instances and allow the models to generate more accurate hypotheses. The effectiveness of FROID is demonstrated through a series of experiments on a large set of benchmark datasets and on two real-world case studies.
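The abstract does not name the specific outlier detectors or reduction methods that FROID uses. The following is therefore only a minimal sketch of the general idea, under stated assumptions: instead of resampling, each instance's feature vector is extended with an outlier score and a lower-dimensional projection.

- Outlier Detection is represented here by a k-nearest-neighbour distance score, chosen only as an illustration.
- Feature Reduction is represented by a Gaussian random projection, also chosen only as an illustration.
- The function names `knn_outlier_scores`, `gaussian_random_projection` and `augment_features` are hypothetical.

None of these are FROID's actual components. The sketch uses only the Python standard library.

```python
import math
import random
from typing import List

# Hypothetical sketch: kNN-distance outlier scores and a Gaussian random
# projection stand in for FROID's (unspecified) Outlier Detection and
# Feature Reduction components.
Matrix = List[List[float]]


def knn_outlier_scores(X: Matrix, k: int) -> List[float]:
    """Mean Euclidean distance from each row to its k nearest other rows.

    Larger scores mean the instance is more isolated (more outlying).
    """
    scores = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def gaussian_random_projection(X: Matrix, n_components: int, seed: int = 0) -> Matrix:
    """Project rows onto n_components random Gaussian directions.

    Entries are drawn from N(0, 1/n_components), the usual scaling for a
    Johnson-Lindenstrauss style random projection.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(n_components)) for _ in range(n_components)]
         for _ in range(n_features)]
    return [[sum(row[f] * R[f][c] for f in range(n_features)) for c in range(n_components)]
            for row in X]


def augment_features(X: Matrix, k: int = 3, n_components: int = 2, seed: int = 0) -> Matrix:
    """Keep the original features and append one outlier score plus a
    reduced projection; no instances are added or removed."""
    scores = knn_outlier_scores(X, k)
    projected = gaussian_random_projection(X, n_components, seed)
    return [list(row) + [s] + p for row, s, p in zip(X, scores, projected)]


if __name__ == "__main__":
    # Five tightly clustered points and one isolated point (index 5).
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]]
    for row in augment_features(X):
        print([round(v, 3) for v in row])
```

The augmented matrix leaves the class distribution untouched and would then be passed to an ordinary classifier. In a real pipeline, the outlier scores and the projection for test instances would be computed against the training data only, so that no information leaks from the test split.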