| Thesis etd-09142017-230811 |
  
    Thesis type
  
  
    Master's thesis
  
    Author
  
  
    BAGHERI AGHABABA, AMIR  
  
    URN
  
  
    etd-09142017-230811
  
    Title
  
  
    On discretization of continuous attributes in Big Data mining
  
    Department
  
  
    INGEGNERIA DELL'INFORMAZIONE
  
    Degree programme
  
  
    COMPUTER ENGINEERING
  
    Advisors
  
  
    advisor Prof. Marcelloni, Francesco
    advisor Prof. Bechini, Alessio
  
    Keywords
  
- Apache Spark
- Big Data
- Discretization
- Fuzzy Partitioning
- MapReduce
    Exam session start date
  
  
    03/10/2017
  
    Availability
  
  
    Not available for consultation
  
    Release date
  
  
    03/10/2087
  
    Abstract
  
  Handling continuous features is a common issue across the many algorithms and methods of data mining. Discretization is the process of converting continuous attributes into discrete intervals. Most data mining algorithms expect attributes to be categorical or discrete, and those that can handle continuous attributes tend to achieve lower accuracy than algorithms that work with discrete and categorical attributes. Hence, discretization is an important issue to address. It has also been described as a technique for data and noise reduction. Many discretization methods have been proposed, but most of them are designed to work with small datasets. In this thesis, we implement and compare two distributed fuzzy discretizers, fuzzy MDLP and fuzzy ur-CAIM, using the MapReduce programming paradigm and the Apache Spark framework. We analyze the behavior of these discretizers using a distributed fuzzy decision tree on 9 well-known big datasets. Being distributed, these discretizers can handle big datasets more efficiently. We also compare the two discretizers under different fuzzy membership functions. The results are analyzed, and the reasons behind the discretizers' behavior are discussed.
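
  As a rough illustration of the two ideas named in the abstract, crisp discretization into intervals and fuzzy partitioning, the sketch below uses equal-width bins and triangular membership functions. These choices and all function names are assumptions made for illustration only; this is neither fuzzy MDLP nor fuzzy ur-CAIM, and it is not distributed.

```python
from bisect import bisect_right

def equal_width_cut_points(lo, hi, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width partition of [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def crisp_bin(x, cut_points):
    """Index of the interval containing x (intervals are closed on the left)."""
    return bisect_right(cut_points, x)

def triangular_memberships(x, cores):
    """Membership degrees of x in a triangular fuzzy partition (illustrative assumption).

    `cores` are the sorted peak positions of the fuzzy sets. Each set rises
    linearly from the previous core and falls linearly to the next one, so a
    value between two cores belongs partly to both sets and the degrees sum to 1.
    """
    degrees = [0.0] * len(cores)
    if x <= cores[0]:
        degrees[0] = 1.0
        return degrees
    if x >= cores[-1]:
        degrees[-1] = 1.0
        return degrees
    i = bisect_right(cores, x) - 1            # cores[i] <= x < cores[i + 1]
    right = (x - cores[i]) / (cores[i + 1] - cores[i])
    degrees[i] = 1.0 - right
    degrees[i + 1] = right
    return degrees

cuts = equal_width_cut_points(0.0, 10.0, 4)   # cut points at 2.5, 5.0, 7.5
print(crisp_bin(6.0, cuts))                   # 2: the value falls in the third interval
print(triangular_memberships(6.0, [0.0, 5.0, 10.0]))  # mostly the middle set, partly the upper one
```

  With a crisp partition the value 6.0 belongs to exactly one interval, whereas with the fuzzy partition it belongs mostly to the middle set and partly to the upper one.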
    File
  
  | File name | Size |
|---|---|
| Thesis not available for consultation. | |
 
		