Thesis etd-07082015-180313
Thesis type
Master's degree thesis
Author
DEL PIZZO, BENIAMINO
URN
etd-07082015-180313
Title
An evolutionary approach on Apache Spark to learn TSK-fuzzy systems from big data
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
INGEGNERIA INFORMATICA
Supervisors
supervisor Dr. Bechini, Alessio
supervisor Prof. Marcelloni, Francesco
Keywords
- big data
- clustering
- fuzzy
- genetic algorithm
- spark
- tsk
Examination session start date
24/07/2015
Availability
Not available
Release date
24/07/2085
Abstract
As a result of the computerization of our society and the growth of data volumes, we have witnessed the birth of data mining, a new discipline that transforms large collections of data into knowledge.
In this thesis we adopt a fuzzy-logic-based approach that provides a regression analysis model for large data sets such as big data.
Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.
Regression analysis is a statistical methodology most often used for numeric prediction, although other methods exist as well. Regression also encompasses the identification of distribution trends in the available data.
In our approach, the relationship between input and output variables is modeled by a set of fuzzy rules of the Takagi-Sugeno-Kang (TSK) type, extracted automatically from the data themselves through a two-step procedure. First, a compact rule base is generated by projecting onto the input variables the clusters produced by a fuzzy clustering algorithm; then, a genetic algorithm is applied to optimize the rules. Appropriate constraints preserve the semantic properties of the initial model during the genetic evolution.
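The first step of the procedure, and how the resulting TSK rules produce a prediction, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's Spark implementation: it assumes standard fuzzy c-means as the clustering algorithm, clustering in the joint input-output space, Gaussian membership functions obtained by projecting each cluster onto each input axis, the product t-norm, and first-order consequents initialized to the output coordinate of each cluster center (linear coefficients set to zero, leaving them to later optimization). The names `fuzzy_c_means`, `project_rules` and `tsk_predict` are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means; returns (centers, u), u[k][i] = membership of point k in cluster i."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * points[k][j] for k in range(n)) / sw for j in range(d)])
        # Membership update: u_ki = 1 / sum_l (d_ki / d_kl)^(2 / (m - 1)).
        for k in range(n):
            dist = [max(math.dist(points[k], ctr), 1e-12) for ctr in centers]
            for i in range(c):
                u[k][i] = 1.0 / sum((dist[i] / dist[l]) ** (2 / (m - 1)) for l in range(c))
    return centers, u


def project_rules(points, centers, u, n_inputs, m=2.0):
    """Project each cluster onto every input axis as a Gaussian MF (center, sigma).
    Assumption: points are [x_1, ..., x_n, y]; the consequent starts as the constant y of the center."""
    antecedents, consequents = [], []
    for i, ctr in enumerate(centers):
        w = [u[k][i] ** m for k in range(len(points))]
        sw = sum(w)
        mfs = []
        for j in range(n_inputs):
            var = sum(w[k] * (points[k][j] - ctr[j]) ** 2 for k in range(len(points))) / sw
            mfs.append((ctr[j], max(math.sqrt(var), 1e-6)))
        antecedents.append(mfs)
        consequents.append([ctr[n_inputs]] + [0.0] * n_inputs)
    return antecedents, consequents


def tsk_predict(x, antecedents, consequents):
    """First-order TSK inference: weighted average of the linear consequents a0 + a . x."""
    num = den = 0.0
    for mfs, coeffs in zip(antecedents, consequents):
        # Firing strength: product of the Gaussian memberships of each input.
        w = 1.0
        for xj, (c, s) in zip(x, mfs):
            w *= math.exp(-0.5 * ((xj - c) / s) ** 2)
        y_i = coeffs[0] + sum(a * xj for a, xj in zip(coeffs[1:], x))
        num += w * y_i
        den += w
    return num / den if den > 0 else 0.0
```

For example, clustering a handful of `[x, y]` samples with `fuzzy_c_means(data, c=2)`, then calling `project_rules(data, centers, u, n_inputs=1)`, yields one rule per cluster whose antecedent is readable as a fuzzy set on `x`.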
To work on big data, we used the Apache Spark framework. Spark is a fast, general-purpose engine for large-scale data processing, written in Scala, a general-purpose programming language that combines the object-oriented and functional programming paradigms. Spark is expected to supersede classic Hadoop MapReduce thanks to its better performance, especially on iterative problems, since it can keep intermediate data in memory across iterations instead of writing them to disk.
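A genetic algorithm evaluates many candidate rule bases per generation, and each evaluation needs a full pass over the training set, so that pass is a natural candidate for distribution. The sketch below shows one plausible pattern, assuming the fitness is the mean squared error (the abstract does not state how the work is split). It is plain Python that mimics Spark's per-partition map followed by an associative reduce; `partition_stats`, `merge` and `distributed_mse` are hypothetical names, and only `mapPartitions` and `reduce` in the comment are real Spark RDD methods.

```python
from functools import reduce


def partition_stats(predict, rows):
    """Partial result for one partition: (sum of squared errors, sample count)."""
    sse, n = 0.0, 0
    for x, y in rows:
        err = predict(x) - y
        sse += err * err
        n += 1
    return (sse, n)


def merge(a, b):
    """Associative and commutative merge of partial results, as Spark's reduce requires."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mse(predict, partitions):
    """Plain-Python stand-in for the PySpark expression
    rdd.mapPartitions(lambda it: [partition_stats(predict, it)]).reduce(merge)."""
    partials = map(lambda p: partition_stats(predict, p), partitions)
    sse, n = reduce(merge, partials)
    return sse / n
```

Because the same training data are scanned at every generation, caching the RDD in memory (`rdd.cache()`) is what lets Spark avoid re-reading them from disk on each iteration, which is where it gains over MapReduce.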
To implement the genetic algorithm, we took advantage of the jMetal framework.
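jMetal is a Java framework, and its API is not reproduced here. As a language-neutral illustration of a genetic optimization that keeps the semantics of the initial rules, the following Python sketch implements a generic real-coded GA over the membership-function centers of one variable. The constraint is an assumption chosen for illustration: each center may move only up to the midpoint with its neighbors, so the fuzzy sets never swap order and keep their linguistic meaning. `semantic_bounds`, `evolve` and the toy fitness are hypothetical.

```python
import random


def semantic_bounds(centers, lo, hi):
    """Hypothetical constraint: each (sorted) MF center may move only up to the
    midpoint with its neighbors, so the order of the fuzzy sets is preserved."""
    bounds = []
    for i, c in enumerate(centers):
        left = lo if i == 0 else (centers[i - 1] + c) / 2
        right = hi if i == len(centers) - 1 else (c + centers[i + 1]) / 2
        bounds.append((left, right))
    return bounds


def evolve(centers, fitness, lo, hi, pop_size=30, generations=50, seed=1):
    """Minimal real-coded GA (minimization): elitism, binary tournament,
    arithmetic crossover and Gaussian mutation clipped to the semantic bounds."""
    rng = random.Random(seed)
    bounds = semantic_bounds(centers, lo, hi)

    def clip(genes):
        return [min(max(v, b0), b1) for v, (b0, b1) in zip(genes, bounds)]

    def mutate(genes):
        genes = list(genes)
        j = rng.randrange(len(genes))
        b0, b1 = bounds[j]
        genes[j] += rng.gauss(0.0, 0.1 * (b1 - b0))
        return clip(genes)

    # Initial population: the cluster-derived model plus mutated copies of it.
    pop = [list(centers)] + [mutate(centers) for _ in range(pop_size - 1)]
    for _ in range(generations):
        next_pop = [min(pop, key=fitness)]  # elitism: the best survives unchanged
        while len(next_pop) < pop_size:
            p1 = min(rng.sample(pop, 2), key=fitness)
            p2 = min(rng.sample(pop, 2), key=fitness)
            a = rng.random()
            # A convex combination of two feasible parents stays within the bounds.
            child = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
            next_pop.append(mutate(child))
        pop = next_pop
    return min(pop, key=fitness)
```

Since crossover is a convex combination and mutation is clipped, every individual satisfies the constraints, so no repair step or penalty term is needed. In a full pipeline the fitness would be the model error on the training data rather than a toy function.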
File
File name | Size |
---|---|
Thesis not available. |