Thesis etd-07082015-180313

Thesis type

Master's degree thesis

Author

DEL PIZZO, BENIAMINO

URN

etd-07082015-180313

Title

An evolutionary approach on Apache Spark to learn TSK-fuzzy systems from big data

Department

INFORMATION ENGINEERING (INGEGNERIA DELL'INFORMAZIONE)

Degree program

COMPUTER ENGINEERING (INGEGNERIA INFORMATICA)

Supervisors

supervisor Dr. Bechini, Alessio
supervisor Prof. Marcelloni, Francesco

Keywords

spark
genetic algorithm
fuzzy
clustering
big data
tsk

Defense session start date

24/07/2015

Availability

Not available for consultation

Release date

24/07/2085

Abstract

The computerization of society and the growth in data volume have given rise to data mining, a discipline that turns large collections of data into knowledge.
In this thesis we adopt a fuzzy logic-based approach that provides a regression model for very large data sets (big data).
Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.
Regression analysis is a statistical methodology that is very often used for numeric prediction, although other methods exist as well. Regression also encompasses the identification of distribution trends, based on the available data.
In our approach, the relation between the data is modeled by a set of fuzzy rules of the Takagi-Sugeno-Kang (TSK) type, extracted automatically from the data itself through a two-step procedure. First, a compact rule base is generated by projecting onto the input variables the clusters produced by a fuzzy clustering algorithm; then a genetic algorithm is applied to optimize the rules. Appropriate constraints maintain the semantic properties of the initial model during the genetic evolution.
To work on big data, we used the Apache Spark framework. Spark is a fast, general engine for large-scale data processing. It is built in Scala, a general-purpose programming language that combines the object-oriented and functional programming paradigms. Spark is poised to replace the classic Hadoop MapReduce thanks to its better performance, especially on iterative problems.
To implement the genetic algorithm we took advantage of the jMetal framework.
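As an illustration of the TSK rule model mentioned in the abstract, the following is a minimal sketch of first-order TSK inference for regression. The abstract does not specify the membership function shape, the conjunction operator, or any data layout, so the Gaussian memberships, the product conjunction, the rule dictionary format and the names `gaussian`, `tsk_predict` and `RULES` are all assumptions made for this example, not the thesis implementation.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian membership degree of x (assumed shape, not from the thesis)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def tsk_predict(x, rules):
    """First-order TSK output: firing-strength-weighted average of linear consequents."""
    num = 0.0
    den = 0.0
    for rule in rules:
        # Firing strength: product of the membership degrees of each input.
        w = 1.0
        for xi, (center, sigma) in zip(x, rule["antecedent"]):
            w *= gaussian(xi, center, sigma)
        # Linear consequent: bias + sum_i coeff_i * x_i.
        y = rule["bias"] + sum(a * xi for a, xi in zip(rule["coeffs"], x))
        num += w * y
        den += w
    if den == 0.0:
        raise ValueError("no rule fires for this input")
    return num / den

# Hypothetical rule base with one input variable and two rules.
RULES = [
    {"antecedent": [(0.0, 1.0)], "bias": 0.0, "coeffs": [1.0]},   # y = x
    {"antecedent": [(2.0, 1.0)], "bias": 4.0, "coeffs": [-1.0]},  # y = 4 - x
]

if __name__ == "__main__":
    print(tsk_predict([1.0], RULES))
```

In a pipeline like the one the abstract describes, the antecedent parameters would come from projecting fuzzy clusters onto each input variable, and a genetic algorithm would then tune the rule parameters; neither step is shown here.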

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
