ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-04132019-153129


Thesis type
Master's degree thesis
Author
TEGLIA, LUCA
URN
etd-04132019-153129
Title
Anonymization and frequent pattern analysis of real SIEM traffic data
Department
INGEGNERIA DELL'INFORMAZIONE (Information Engineering)
Degree programme
COMPUTER ENGINEERING
Supervisors
supervisor Prof. Dini, Gianluca
supervisor Prof. Marcelloni, Francesco
supervisor Dott. Cinesi, Massimo
Keywords
  • cm-spade
  • frequent patterns
  • sequential pattern mining
  • anonymization
  • kibana
  • elasticsearch
  • siem
  • security logs
Graduation session start date
03/05/2019
Availability
Not available for consultation
Release date
03/05/2089
Abstract
This thesis analyses the characteristics of security event logs in Syslog format produced by five types of computer security systems: web servers, domain controllers, reverse proxies, a Next Generation Firewall (NGFW) and an e-mail antivirus/antispam system. The first part of the thesis describes the design and implementation of software that extracts and anonymizes company security logs, while the second part focuses on the analysis carried out on the anonymized data. The analysis aimed to identify recurrent attack patterns and other features useful for improving IT security.
To achieve this goal, the security logs were first extracted from the company SIEM using the Elasticdump tool and the ELK Stack (Elasticsearch, Logstash, Kibana). An anonymization program was then implemented to process the data, stored in JSON format, with a pseudonymization technique that preserves the characteristics and interdependencies of the data. Finally, data analysis strategies were defined and implemented using Sequential Pattern Mining algorithms, in particular CM-SPADE, a version of the SPADE algorithm (Sequential PAttern Discovery using Equivalence classes) that uses a Co-occurrence Map to reduce the number of join operations between sequences. Given the amount of data to be processed, this algorithm was considered the most suitable for this work.
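As an illustration of how pseudonymization can preserve interdependencies, the sketch below replaces sensitive values with keyed-hash tokens, so that equal inputs always map to the same token. It is a minimal sketch and not the program described in the thesis: the field names (`src_ip`, `dst_ip`, `user`), the secret key, the flat record layout and the one-JSON-object-per-line input are all assumptions made for this example.

```python
import hashlib
import hmac
import json

# Hypothetical secret; in practice it would be generated randomly and kept private.
SECRET_KEY = b"replace-with-a-random-secret"

# Hypothetical names of the fields that carry sensitive values.
SENSITIVE_FIELDS = {"src_ip", "dst_ip", "user"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a value to a stable token: equal inputs always give equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def anonymize_event(event: dict) -> dict:
    """Return a copy of a flat event with sensitive string fields replaced."""
    result = {}
    for name, value in event.items():
        if name in SENSITIVE_FIELDS and isinstance(value, str):
            result[name] = pseudonymize(value)
        else:
            result[name] = value
    return result


def anonymize_lines(lines):
    """Anonymize an iterable of JSON lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(anonymize_event(json.loads(line)))
```

Because the mapping is deterministic for a given key, two events that share an IP address before anonymization still share a token afterwards, which is what allows events to be grouped and mined without exposing the original values.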
To design the experiments for detecting recurring patterns in the data, two approaches were used to construct the sequence database: in the first, each sequence consists of all the events falling within a time window whose size is set in the configuration phase; in the second, each sequence consists of all the events associated with the same IP address. The results depend strongly on the characteristics of the data and on the number of occurrences of the individual event types. Many patterns contain multiple occurrences of the same event type; a good number of patterns consisting of events of different types were also found, although these are generally short.
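The two ways of building the sequence database can be sketched as follows. This is only an illustration under stated assumptions: events are represented as hypothetical `(timestamp_seconds, ip, event_type)` tuples, and the time windows are taken to be consecutive and non-overlapping, which the abstract does not specify.

```python
from collections import defaultdict


def sequences_by_window(events, window_seconds):
    """First approach: one sequence per fixed-size time window.

    `events` is an iterable of (timestamp_seconds, ip, event_type) tuples.
    Windows are assumed to be consecutive and non-overlapping.
    """
    buckets = defaultdict(list)
    for timestamp, _ip, event_type in sorted(events, key=lambda e: e[0]):
        buckets[timestamp // window_seconds].append(event_type)
    return [buckets[index] for index in sorted(buckets)]


def sequences_by_ip(events):
    """Second approach: one sequence per IP address, in time order."""
    buckets = defaultdict(list)
    for _timestamp, ip, event_type in sorted(events, key=lambda e: e[0]):
        buckets[ip].append(event_type)
    return dict(buckets)
```

For example, with the events `(0, "a", "scan")`, `(5, "b", "login_failed")`, `(12, "a", "scan")` and `(61, "a", "login_failed")` and a 60-second window, the first function yields two sequences (three events, then one), while the second yields one sequence for `"a"` and one for `"b"`. Either result could then be passed to a sequential pattern mining algorithm such as CM-SPADE.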
The results clearly indicate a tendency of attackers to repeat the same type of attack several times, both over short periods and over longer time intervals. In some cases, attackers cyclically repeat sequences of a few different event types, probably as part of a more articulated attack strategy.
Some event types occur far more often than the others, which prevented more interesting results from being obtained. In the future, an analysis of security logs with a more uniform distribution of events, using more sophisticated sequential pattern mining techniques, is expected to yield more significant results.
File