Thesis etd-05252015-105805

Thesis type

Master's thesis

Author

FERRANTI, ANDREA

URN

etd-05252015-105805

Title

Multi-objective evolutionary fuzzy systems for Big Data on Apache Spark

Department

INFORMATION ENGINEERING

Degree programme

COMPUTER ENGINEERING

Supervisors

supervisor Antonelli, Michela
supervisor Prof. Marcelloni, Francesco
supervisor Prof. Lazzerini, Beatrice

Keywords

Apache Spark
Big Data
FRBS
Multi-objective evolutionary algorithms
Multi-objective optimization

Start date of the graduation session

19/06/2015

Availability

Not available for consultation

Release date

19/06/2085

Abstract

Over the last few years, the generation of fuzzy rule-based systems (FRBSs) from data has been tackled by using a multi-objective optimization approach, with accuracy and interpretability as the objectives to be optimized. Multi-objective evolutionary algorithms (MOEAs) have been used so often in this context that the FRBSs generated by exploiting these algorithms have been denoted as multi-objective evolutionary fuzzy systems (MOEFSs). In this thesis, we adopt an MOEA-based approach to concurrently learn the rule and data bases of fuzzy rule-based classifiers (FRBCs) and Mamdani fuzzy rule-based systems (MFRBSs). In particular, the rule bases are generated by exploiting a rule and condition selection (RCS) strategy, which, during the evolutionary process, selects a reduced number of rules from a heuristically generated set of candidate rules and a reduced number of conditions for each selected rule. As regards data base learning, the membership function parameters of each linguistic term used in the rules are learned concurrently with the application of RCS.
One of the most critical aspects limiting the use of MOEFSs is the computational effort needed for their execution. This effort is strongly affected by the computation of the fitness, especially when the dataset is large.
To address this limitation, we have exploited Apache Spark, a fast and general cluster computing system for Big Data applications. Spark extends the popular MapReduce model to efficiently support more types of computations, including iterative programs. It revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel.
In the experimental part, we first test our approach on twelve very large datasets, eight for classification and four for regression. Then, we compare the results obtained in classification and regression with the ones obtained by the well-known Decision Tree algorithm. Moreover, in classification, we compare our results with the ones obtained by the popular Random Forest ensemble method. The results show that our approach generates FRBCs and MFRBSs with accuracy comparable to, and sometimes better than, the other algorithms but with a significantly lower complexity.
Finally, we show the scalability of our approach by carrying out a number of experiments on a real-world big dataset. In particular, we evaluate the achievable speedup on a small computer cluster, highlighting the fact that the proposed approach allows handling big datasets even with modest hardware support.
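As a rough illustration of why the fitness computation lends itself to a MapReduce-style system, the sketch below evaluates a classification error rate as a map step over data partitions followed by a reduce step that merges the partial counts. It is plain Python and does not use Spark; the function names (`partition_errors`, `merge_counts`, `error_rate`), the toy classifier, and the toy data are all assumptions made for this example and are not taken from the thesis.

```python
from functools import reduce


def partition_errors(classify, partition):
    """Map step: return (misclassified, total) for one data partition."""
    errors = sum(1 for features, label in partition if classify(features) != label)
    return errors, len(partition)


def merge_counts(a, b):
    """Reduce step: add two (misclassified, total) pairs component-wise."""
    return a[0] + b[0], a[1] + b[1]


def error_rate(classify, partitions):
    """Combine per-partition counts into a single error rate."""
    partial = (partition_errors(classify, p) for p in partitions)
    errors, total = reduce(merge_counts, partial, (0, 0))
    return errors / total if total else 0.0


# Hypothetical stand-in for a fuzzy rule-based classifier.
def toy_classifier(features):
    return "high" if features[0] > 0.5 else "low"


# Hypothetical dataset already split into two partitions.
toy_partitions = [
    [((0.9,), "high"), ((0.2,), "low")],
    [((0.7,), "low"), ((0.1,), "low")],
]

print(error_rate(toy_classifier, toy_partitions))
```

Because each partition is processed independently and the partial counts are merged with an associative operation, the same structure could in principle be distributed across cluster nodes, which is the property the abstract relies on.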

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of theses discussed at the University of Pisa
