Analytical summary
The idea of using a method based on Principal Components Analysis (PCA) to detect anomalies in network traffic was first introduced by A. Lakhina, M. Crovella and C. Diot in the 2004 article “Diagnosing Network-Wide Traffic Anomalies” [1]. They proposed a general method for diagnosing traffic anomalies that uses PCA to separate the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. The algorithm was tested in subsequent works on different characteristics of IP traffic, such as byte counts, packet counts and IP-flow counts [2]. Using entropy as a summarization tool inside the algorithm brought significant advances in the ability to analyze massive data sources [3], but this type of Anomaly Detection (AD) method still could not identify the users responsible for the detected anomalies. This last step was achieved through random aggregations of the IP flows by means of sketches [4], which improved detection performance and made it possible to identify the responsible IP flows. This version of the algorithm was implemented by C. Callegari and L. Gazzarini, at the Università di Pisa, in the AD software described in [5], which analyzes IP traffic traces and detects anomalies in them. Our work consisted of adapting this software, designed for IP traffic traces, to VoIP Call Data Records, in order to test its applicability as an Anomaly Detection system for voice traffic. We then used our modified version to scan a real VoIP traffic trace, obtained from a telephone operator, in order to analyze the software's performance in a real environment.
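The subspace separation described above can be illustrated with a minimal sketch. It is not the implementation described in [5]: it omits entropy and sketches, the function name and parameters are hypothetical, and the threshold (mean plus a multiple of the standard deviation of the residual energy) is a simple stand-in assumed here for illustration, not the statistical test used in [1].

```python
import numpy as np

def pca_subspace_anomalies(X, k, n_sigma=3.0):
    """Flag rows (timebins) of X whose residual energy is unusually large.

    X       : (timebins x features) matrix of traffic measurements
    k       : number of principal components spanning the "normal" subspace
    n_sigma : assumed threshold rule, mean + n_sigma * std of the residual energy
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                           # basis of the normal subspace
    residual = Xc - Xc @ P @ P.T           # component in the anomalous subspace
    spe = np.sum(residual ** 2, axis=1)    # squared prediction error per timebin
    threshold = spe.mean() + n_sigma * spe.std()
    return np.flatnonzero(spe > threshold), spe
```

The function returns the indices of the flagged timebins together with the residual energy of every timebin, so the effect of `k` and `n_sigma` on the detections can be inspected directly.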
We applied two different types of analysis to the same traffic trace, in order to understand the software's features and limits as well as its applicability to AD problems. Since we found that the software's performance depends heavily on the input parameters used in the analysis, we concluded with several tests on artificially created anomalies, in order to understand the relationship between each input parameter's value and the software's ability to detect different types of anomalies. Together, these analyses led us to some considerations on the possibility of using this PCA-based software as an Anomaly Detector in VoIP environments.

To the best of our knowledge, this is the first time a technique based on Principal Components Analysis has been used to detect anomalous users in VoIP traffic. In more detail, our contribution consisted of:

• Creating a version of a PCA-based AD software that can be used on VoIP traffic traces
• Testing the software's performance on a real traffic trace, obtained from a telephone operator
• Analyzing, from the first tests, the parameter values that yield results useful for detecting anomalous users in a VoIP environment
• Observing the types of users detected by the software on this trace and classifying them according to their behavior over the whole duration of the trace
• Analyzing how the choice of parameters affects the type of detections obtained, and testing which choices are best for detecting each type of anomalous user
• Proposing a new way of applying the software that avoids the biggest limitation of the first type of analysis (the impossibility of detecting more than one anomalous user per timebin)
• Testing the software's performance with this new type of analysis, and observing how it affects the dependence of the results on the input parameters
• Comparing the software's ability to detect anomalous users with that of another AD software that works on the same type of trace (VoIP SEAL)
• Modifying the real trace to obtain a version cleaned of all detectable anomalies, to which artificial anomalies can be added
• Testing the software's performance in detecting different types of artificial anomalies
• Analyzing in more detail the software's sensitivity to the input parameters when it is used to detect artificially created anomalies
• Comparing the results and observations from these different types of analysis to derive a global assessment of an Anomaly Detector based on Principal Components Analysis, with its strengths and weaknesses when applied to a VoIP trace

The structure of our work is the following:

1. We start by reviewing PCA theory, describing the structure of the algorithm used in our software, its features, and the type of data it needs in order to be used as an Anomaly Detection system for VoIP traffic.
2. Then, after briefly describing the trace we used to test our software, we introduce the first type of analysis performed, the single round analysis, pointing out the results obtained and their dependence on the parameter values.
3. In the following section we focus on a different type of analysis, the multiple round analysis, which we introduced to test the software's performance once its biggest limitation (the impossibility of detecting more than one user per timebin) is removed. We describe the results obtained, compare them with those of the single round analysis, check their dependence on the parameters, and compare the performance with that of another AD software (VoIP SEAL) on the same trace.
4. We then consider the results and observations obtained by testing our software on artificial anomalies added to a “cleaned” version of our original trace (from which we removed all the anomalous users detectable with our software), comparing the software's performance in detecting different types of anomalies and analyzing in detail its dependence on the parameter values.
5. Finally, we present our conclusions, drawn from the observations gathered in the different types of analysis, on the applicability of PCA-based software as an Anomaly Detector in a VoIP environment.