Thesis etd-09142017-230811
Thesis type
Master's thesis
Author
BAGHERI AGHABABA, AMIR
URN
etd-09142017-230811
Thesis title
On discretization of continuous attributes in Big Data mining
Department
INGEGNERIA DELL'INFORMAZIONE
Course of study
COMPUTER ENGINEERING
Supervisors
supervisor Prof. Marcelloni, Francesco
supervisor Prof. Bechini, Alessio
Keywords
- Apache Spark
- Big Data
- Discretization
- Fuzzy Partitioning
- MapReduce
Graduation session start date
03/10/2017
Availability
Withheld
Release date
03/10/2087
Summary
Coping with continuous features in datasets is a common issue across the many algorithms and methods of data mining. Discretization is the process of converting these continuous attributes into discrete intervals. Most data mining algorithms expect attributes to be categorical or discrete, and those that can handle continuous attributes often achieve lower accuracy than those that work with discrete and categorical attributes. Hence, discretization is an important problem to address. Discretization has also been described as a technique for data and noise reduction. Several discretization methods have been proposed, but most of them are designed to work with small datasets. In this thesis, we have implemented and compared two distributed fuzzy discretizers, namely fuzzy MDLP and fuzzy ur-CAIM, using the MapReduce programming paradigm and the Apache Spark framework. We have analyzed the behavior of these discretizers using a distributed fuzzy decision tree on nine well-known big datasets. These distributed discretizers can be more efficient in handling big datasets. We have also compared the two discretizers using different fuzzy membership functions. The results of the discretizers are analyzed, and the reasons behind their behavior are discussed in this thesis.
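As a rough illustration of the two ideas the summary mentions, the Python sketch below maps continuous values to interval indices and evaluates a triangular fuzzy membership function. It is not the fuzzy MDLP or fuzzy ur-CAIM procedure from the thesis: those methods select the cut points themselves, whereas here the cut points, the sample values, and the function names `discretize` and `triangular_membership` are hypothetical and chosen by hand.

```python
from bisect import bisect_right


def discretize(value, cut_points):
    """Return the index of the interval containing value.

    cut_points must be sorted in ascending order. Interval 0 lies below
    the first cut point and interval len(cut_points) lies at or above
    the last one.
    """
    return bisect_right(cut_points, value)


def triangular_membership(value, left, peak, right):
    """Degree (0.0 to 1.0) to which value belongs to a triangular fuzzy set."""
    if value == peak:
        return 1.0
    if value <= left or value >= right:
        return 0.0
    if value < peak:
        return (value - left) / (peak - left)
    return (right - value) / (right - peak)


# Hypothetical, hand-picked cut points for an "age" attribute.
cut_points = [18.0, 40.0, 65.0]
ages = [5.0, 18.0, 33.5, 70.0]

# Crisp discretization: each value falls into exactly one interval.
bins = [discretize(age, cut_points) for age in ages]

# Fuzzy view: a value can belong partially to the set peaking at 40.
degree = triangular_membership(29.0, 18.0, 40.0, 65.0)
```

With crisp intervals a value belongs to exactly one bin, while a fuzzy partition lets a value near a boundary belong partially to neighboring sets.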
File
File name | Size |
---|---|
Thesis not available for consultation. |