ETD

Digital archive of the theses defended at the Università di Pisa

Thesis etd-06232016-162204


Thesis type
Master's thesis
Author
MANNARA, GIUSEPPE
URN
etd-06232016-162204
Title
Multi-Objective Evolutionary Optimization of Type-2-Fuzzy-Rule-Based System for Big Data Classification
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
COMPUTER ENGINEERING
Supervisors
supervisor Prof. Marcelloni, Francesco
co-supervisor Bechini, Alessio
Keywords
  • Fuzzy Rule-Based Classifier
  • Data Mining
  • Big Data
  • Apache Spark
  • Multi-Objective Evolutionary Algorithm
Start date of defence session
22/07/2016
Availability
Not available for consultation
Release date
22/07/2086
Abstract
Fuzzy rule-based classifiers (FRBCs) have been widely exploited in several engineering applications, such as intrusion detection, thermography-based breast cancer diagnosis, cardiac arrhythmia classification, and ground vehicle classification from acoustic features. This success is mainly due to the intrinsic nature of FRBCs, which allows them to deal with vague and noisy data and to explain how the classification task is performed.
In recent years, multi-objective evolutionary algorithms (MOEAs) have been extensively used to generate FRBCs with different trade-offs between accuracy and interpretability. Although this approach has proven very effective, it has a drawback: evaluating the accuracy of each individual requires a scan of the entire dataset, and a single execution of an MOEA typically generates thousands of individuals. Thus, if the dataset is very large, this approach becomes very demanding in terms of both computation and storage.
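The accuracy/interpretability trade-off described above can be illustrated with a minimal Python sketch of Pareto dominance. The two objectives (error rate and a complexity score, both to be minimised), the function names, and the sample values are assumptions chosen for illustration; they are not taken from the thesis.

```python
from typing import List, Tuple

# Each candidate classifier is summarised by two objectives to minimise:
# (error_rate, complexity). Both measures are illustrative assumptions.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error_rate, complexity) pairs for five candidate classifiers.
candidates = [(0.10, 40.0), (0.12, 25.0), (0.12, 30.0), (0.20, 10.0), (0.25, 12.0)]
front = pareto_front(candidates)
```

Here `front` keeps only the candidates for which no alternative is at least as accurate and at least as simple, which is the kind of solution set an MOEA aims to approximate.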
Recently, however, some open-source cluster computing frameworks, such as Hadoop and Spark, have been proposed and have gained popularity in a short time. These frameworks allow both computation and data to be distributed over a computer cluster.
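To make the idea of distributing an accuracy evaluation concrete, the following sketch mimics a map and reduce pattern in plain Python: each data partition reports its own count of correct predictions, and the partial counts are then summed. This is only a single-machine stand-in under stated assumptions; it does not use Spark, and the helper names and the toy classifier are hypothetical rather than part of the thesis implementation.

```python
from functools import reduce
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)
Classifier = Callable[[Sequence[float]], int]


def count_correct(partition: List[Sample], classify: Classifier) -> Tuple[int, int]:
    """Map step: return (correct predictions, samples seen) for one partition."""
    correct = sum(1 for features, label in partition if classify(features) == label)
    return correct, len(partition)


def partitioned_accuracy(partitions: List[List[Sample]], classify: Classifier) -> float:
    """Reduce step: sum the per-partition counts and compute the global accuracy."""
    # On a cluster, each partition would be processed by a different worker.
    partials = [count_correct(p, classify) for p in partitions]
    correct, total = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials, (0, 0))
    return correct / total if total else 0.0


def toy_classifier(features: Sequence[float]) -> int:
    """Hypothetical classifier: class 1 when the first feature exceeds 0.5."""
    return int(features[0] > 0.5)


# Two hypothetical partitions of a tiny labelled dataset.
partitions = [
    [([0.1], 0), ([0.9], 1)],
    [([0.6], 0), ([0.7], 1), ([0.2], 0)],
]
accuracy = partitioned_accuracy(partitions, toy_classifier)
```

Only the small `(correct, total)` pairs travel between the map and reduce steps, so the dataset itself never has to be gathered on one machine.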
In this thesis, we aim to develop an MOEA for generating FRBCs from Big Data on Apache Spark. To this end, we exploit PAES-RCS, an MOEA-based approach that concurrently learns the rule base and the data base of FRBCs. In PAES-RCS, the rule bases are generated through a rule and condition selection (RCS) strategy, which, during the evolutionary process, selects a reduced number of rules from a heuristically generated set of candidate rules and a reduced number of conditions for each selected rule. RCS can be regarded as rule learning in a constrained search space. As regards data base learning, the membership function parameters of each linguistic term used in the rules are learned concurrently with the application of RCS. We have implemented both the original version of PAES-RCS and three extensions. First, we handle interval type-2 fuzzy sets in addition to type-1 fuzzy sets. Second, we learn the granularities of the fuzzy partitions during the evolutionary process, as an alternative to fixing them at the beginning. Finally, we adopt a fuzzy decision tree for generating the candidate rules, as an alternative to a classical decision tree.
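As an illustration of the difference between type-1 and interval type-2 fuzzy sets, the sketch below computes a membership interval rather than a single membership degree. The triangular shape, the `shrink` parameter used to derive the lower membership function, and all function names are assumptions made for this example; the abstract does not state how the thesis parameterises its fuzzy sets.

```python
from typing import Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Type-1 triangular membership with feet a and c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def interval_type2(x: float, a: float, b: float, c: float,
                   shrink: float = 0.5) -> Tuple[float, float]:
    """Return the (lower, upper) membership degrees of x.

    The upper membership function is the triangle (a, b, c). The lower one is
    a narrower triangle with the same peak, whose feet are moved toward b so
    that each side keeps only the fraction `shrink` of its width (assumption).
    """
    upper = triangular(x, a, b, c)
    lower_a = b - (b - a) * shrink
    lower_c = b + (c - b) * shrink
    lower = triangular(x, lower_a, b, lower_c)
    return lower, upper


# Hypothetical linguistic term "medium" on a variable normalised to [0, 1].
lower, upper = interval_type2(0.375, 0.0, 0.5, 1.0)
```

The gap between `lower` and `upper` represents uncertainty about the membership degree itself; when `shrink` is 1.0 the two functions coincide and the set reduces to an ordinary type-1 fuzzy set.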
We have tested the original PAES-RCS and the different possible combinations of the extensions on twenty-seven small and large datasets and on six big datasets. On these datasets, we have also discussed the scalability of our implementation on Spark.
File