ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-05282018-153312


Thesis type
Master's thesis
Author
GAGLIANO, GIUSEPPE ANTONIO CRISTIAN
URN
etd-05282018-153312
Title
Distributed Outlier Detection Over Big Data Streams
Department
INFORMATION ENGINEERING
Degree programme
COMPUTER ENGINEERING
Supervisors
supervisor Prof. Marcelloni, Francesco
Keywords
  • outlier detection
  • machine learning
  • distributed system
  • big data
  • spark
Graduation session start date
22/06/2018
Availability
Not available for consultation
Release date
22/06/2088
Abstract

The rapid growth of data observed in recent years shows no sign of slowing down. Extracting information from Big Data Streams is attracting increasing interest in the field of knowledge discovery and data mining. Several large companies have developed and invested in platforms and methods for making predictions on data streams.

Analysis of these data often focuses on detecting significant deviations from standard behaviour through outlier detection techniques. Outlier detection has many applications, ranging from monitoring systems to data preprocessing ahead of other machine learning techniques.

The objective of this thesis was to detect anomalies in data streams using the most recent methodologies in the field of incremental outlier detection. After a study of methods and tools, density-based outlier detectors were developed using the Spark framework. Experimental results are also presented and discussed.
File