Thesis etd-09142017-230811

Thesis type

Master's thesis

Author

BAGHERI AGHABABA, AMIR

URN

etd-09142017-230811

Title

On discretization of continuous attributes in Big Data mining

Department

INGEGNERIA DELL'INFORMAZIONE

Degree programme

COMPUTER ENGINEERING

Supervisors

supervisor Prof. Marcelloni, Francesco
supervisor Prof. Bechini, Alessio

Keywords

Apache Spark
Big Data
Discretization
Fuzzy Partitioning
MapReduce

Exam session start date

03/10/2017

Availability

Not available

Release date

03/10/2087

Abstract

Coping with continuous features in datasets is a common issue across the many algorithms and methods of data mining. Discretization is the process of converting continuous attributes into discrete intervals. Most data mining algorithms expect attributes to be categorical or discrete, and those that can handle continuous attributes tend to achieve lower accuracy than those that work with discrete and categorical attributes. Discretization is therefore an important problem to address. It has also been described as a technique for data and noise reduction. Several discretization methods have been proposed, but most of them are designed to work with small datasets. In this thesis, we implement and compare two distributed fuzzy discretizers, fuzzy MDLP and fuzzy ur-CAIM, using the MapReduce programming paradigm and the Apache Spark framework. We analyze the behavior of these discretizers using a distributed fuzzy decision tree on 9 well-known big datasets. These distributed discretizers can be more efficient in handling big datasets. We also compare the two discretizers using different fuzzy membership functions. The results are analyzed, and the reasons behind the discretizers' behavior are discussed.
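
As a rough illustration of the difference between crisp and fuzzy discretization mentioned in the abstract, the sketch below maps a continuous value to a single interval and, alternatively, to membership degrees over a triangular fuzzy partition. It is not the fuzzy MDLP or fuzzy ur-CAIM algorithm from the thesis: the function names, the cut points, and the choice of triangular membership functions are all assumptions made for the example, and no Spark or MapReduce distribution is shown.

```python
from bisect import bisect_right


def crisp_bin(value, cut_points):
    """Return the index of the interval that contains value.

    cut_points is a sorted list of hypothetical interval boundaries;
    n cut points define n + 1 intervals.
    """
    return bisect_right(cut_points, value)


def triangular_memberships(value, cores):
    """Return the membership degree of value in each fuzzy set.

    cores is a strictly increasing list (at least two entries) holding the
    peak of each triangular fuzzy set. Each set rises linearly from the
    previous core and falls linearly to the next, so the degrees sum to 1.
    """
    if len(cores) < 2:
        raise ValueError("at least two cores are required")
    # Values outside the covered range belong fully to the nearest end set.
    if value <= cores[0]:
        return [1.0] + [0.0] * (len(cores) - 1)
    if value >= cores[-1]:
        return [0.0] * (len(cores) - 1) + [1.0]
    degrees = [0.0] * len(cores)
    right = bisect_right(cores, value)  # first core strictly above value
    left = right - 1
    span = cores[right] - cores[left]
    degrees[right] = (value - cores[left]) / span
    degrees[left] = 1.0 - degrees[right]
    return degrees


if __name__ == "__main__":
    # Made-up boundaries and cores, chosen only for illustration.
    print(crisp_bin(2.5, [10.0, 20.0]))
    print(triangular_memberships(2.5, [0.0, 10.0, 20.0]))
```

In the crisp case a value belongs to exactly one interval, while in the fuzzy case a value lying between two cores belongs partially to both neighbouring sets.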

File

File name	Size
Thesis not available. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
