Thesis etd-06232016-162204

Thesis type

Master's thesis

Author

MANNARA, GIUSEPPE

URN

etd-06232016-162204

Title

Multi-Objective Evolutionary Optimization of Type-2-Fuzzy-Rule-Based System for Big Data Classification

Department

INGEGNERIA DELL'INFORMAZIONE

Degree programme

COMPUTER ENGINEERING

Supervisors

supervisor Prof. Marcelloni, Francesco
co-supervisor Bechini, Alessio

Keywords

Fuzzy Rule-Based Classifier
Data Mining
Big Data
Apache Spark
Multi-Objective Evolutionary Algorithm

Defense session start date

22/07/2016

Availability

Not available

Release date

22/07/2086

Abstract

Fuzzy rule-based classifiers (FRBCs) have been widely used in engineering applications such as intrusion detection, thermography-based breast cancer diagnosis, cardiac arrhythmia classification, and ground vehicle classification from acoustic features. This success is mainly due to the intrinsic nature of FRBCs, which can handle vague and noisy data and explain how a classification is reached.
In recent years, multi-objective evolutionary algorithms (MOEAs) have been extensively used to generate FRBCs with different trade-offs between accuracy and interpretability. Although this approach has proven very effective, it has a drawback: evaluating the accuracy of each individual requires a scan of the entire dataset, and a single MOEA run typically generates thousands of individuals. Thus, when the dataset is very large, the approach becomes critical in terms of both computation and storage.
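To make the cost of the accuracy objective concrete, here is a minimal sketch of how one individual (a rule base) might be scored. It is an illustration under stated assumptions, not the thesis implementation: the names `triangular`, `Rule`, `classify`, and `accuracy` are hypothetical, the fuzzy sets are type-1 triangles, the product is assumed as t-norm, and the class of the single best-matching rule is returned.

```python
from typing import List, Optional, Sequence, Tuple

# A triangular fuzzy set described by its breakpoints (a, b, c), with a <= b <= c.
Triangle = Tuple[float, float, float]


def triangular(x: float, tri: Triangle) -> float:
    """Membership degree of x in the triangular fuzzy set (a, b, c)."""
    a, b, c = tri
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


class Rule:
    """IF x_0 is A_0 AND ... THEN class; a None entry means "don't care"."""

    def __init__(self, antecedent: Sequence[Optional[Triangle]], label: int):
        self.antecedent = antecedent
        self.label = label

    def matching_degree(self, x: Sequence[float]) -> float:
        degree = 1.0
        for value, tri in zip(x, self.antecedent):
            if tri is not None:
                degree *= triangular(value, tri)  # product t-norm (assumption)
        return degree


def classify(rules: List[Rule], x: Sequence[float], default: int = 0) -> int:
    """Return the class of the best-matching rule, or `default` if none fires."""
    best = max(rules, key=lambda r: r.matching_degree(x))
    return best.label if best.matching_degree(x) > 0.0 else default


def accuracy(rules: List[Rule], data: List[Tuple[Sequence[float], int]]) -> float:
    """Fraction of correctly classified examples; touches every example once."""
    correct = sum(1 for x, y in data if classify(rules, x) == y)
    return correct / len(data)


# Toy usage: one feature, two linguistic terms, four labelled examples.
LOW: Triangle = (0.0, 0.0, 0.6)
HIGH: Triangle = (0.4, 1.0, 1.0)
toy_rules = [Rule([LOW], 0), Rule([HIGH], 1)]
toy_data = [([0.1], 0), ([0.9], 1), ([0.2], 0), ([0.8], 0)]
toy_accuracy = accuracy(toy_rules, toy_data)
```

Each call to `accuracy` costs one pass over the data times the number of rules, and an evolutionary run repeats it for every generated individual, which is why the dataset size dominates the running time.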
Recently, however, open-source cluster computing frameworks such as Hadoop and Spark have been proposed and have quickly gained popularity. These frameworks distribute both computation and data across a computer cluster.
In this thesis, we develop an MOEA for generating FRBCs from Big Data on Apache Spark. To this end, we exploit PAES-RCS, an MOEA-based approach that learns the rule base and the data base of an FRBC concurrently. In PAES-RCS, rule bases are generated through a rule and condition selection (RCS) strategy: during the evolutionary process, it selects a reduced number of rules from a heuristically generated set of candidate rules and a reduced number of conditions for each selected rule. RCS can thus be regarded as rule learning in a constrained search space. As for data base learning, the membership function parameters of each linguistic term used in the rules are learned concurrently with the application of RCS. We have implemented both the original version of PAES-RCS and three extensions. First, we handle interval type-2 fuzzy sets in addition to type-1 fuzzy sets. Second, we can learn the granularities of the fuzzy partitions during the evolutionary process instead of only fixing them at the beginning. Finally, we can use a fuzzy decision tree to generate the candidate rules as an alternative to a classical decision tree.
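The rule and condition selection step can be pictured as decoding a chromosome into a rule base. The sketch below assumes one plausible encoding, which the abstract does not specify: each gene is a pair of a 1-based index into the candidate rule set (0 meaning "no rule in this slot") and a Boolean mask over that rule's conditions. The function name `decode_rule_base`, the type aliases, and the toy rules are all hypothetical.

```python
from typing import Dict, List, Tuple

# A candidate rule: (conditions as feature -> linguistic term, class label).
CandidateRule = Tuple[Dict[str, str], str]
# A gene: (1-based candidate index, or 0 for "no rule"; keep-mask over conditions).
Gene = Tuple[int, List[bool]]


def decode_rule_base(
    chromosome: List[Gene], candidates: List[CandidateRule]
) -> List[CandidateRule]:
    """Build the rule base selected by a chromosome (assumed encoding)."""
    rule_base: List[CandidateRule] = []
    for index, mask in chromosome:
        if index == 0:
            continue  # empty slot: no rule selected
        conditions, label = candidates[index - 1]
        # Keep only the conditions whose mask bit is set.
        kept = {
            feature: term
            for (feature, term), keep in zip(conditions.items(), mask)
            if keep
        }
        # Drop rules left with no condition, and skip exact duplicates.
        if kept and (kept, label) not in rule_base:
            rule_base.append((kept, label))
    return rule_base


# Toy usage with two hypothetical candidate rules.
toy_candidates: List[CandidateRule] = [
    ({"temperature": "high", "humidity": "low"}, "A"),
    ({"temperature": "low"}, "B"),
]
toy_chromosome: List[Gene] = [
    (1, [True, False]),  # rule 1, keeping only its first condition
    (0, [True]),         # empty slot
    (2, [True]),         # rule 2, unchanged
    (2, [False]),        # rule 2 with every condition removed: discarded
]
toy_rule_base = decode_rule_base(toy_chromosome, toy_candidates)
```

Under this encoding the search can only pick and prune rules that already exist in the candidate set, which is the sense in which the search space is constrained.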
We have tested the original PAES-RCS and the possible combinations of the extensions on twenty-seven small and large datasets and on six big datasets. On these datasets, we also discuss the scalability of our Spark implementation.

File

File name	Size
Thesis not available. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
