The idea of using a method based on Principal Components Analysis (PCA) to detect anomalies in network
traffic was first introduced by A. Lakhina, M. Crovella and C. Diot in an article published in 2004 called
“Diagnosing Network-Wide Traffic Anomalies”.
They proposed a general method to diagnose traffic anomalies, using PCA to separate the
high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces
corresponding to normal and anomalous network conditions.
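The subspace separation described above can be illustrated with a minimal sketch. It assumes a traffic matrix with one row per timebin and one column per measured feature, and takes the first k principal components as the normal subspace; the function name, the synthetic data and the value of k are illustrative assumptions, not the settings of the software discussed in this work.

```python
import numpy as np

def pca_subspace_spe(X, k):
    """Squared prediction error (SPE) of each row of X.

    X: (timebins x features) traffic matrix (hypothetical layout).
    k: number of principal components assumed to span the normal subspace.
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                             # basis of the normal subspace
    residual = Xc - Xc @ P @ P.T             # component in the anomalous subspace
    return np.sum(residual ** 2, axis=1)

# Synthetic example: 200 timebins, 10 features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))
X[120, 3] += 5.0                             # inject an artificial spike

spe = pca_subspace_spe(X, k=2)
suspect = int(np.argmax(spe))                # timebin with the largest residual
```

In a detector, the SPE of each timebin would be compared against a threshold rather than simply taking the maximum; the choice of that threshold and of k are among the input parameters that such a method depends on.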
This algorithm was tested in subsequent works, taking into consideration different characteristics of IP
traffic over a network (such as byte counts, packet counts and IP-flow counts).
The proposal of using entropy as a summarization tool inside the algorithm led to significant advances
in the ability to analyze massive data sources; however, this type of Anomaly Detection (AD) method
still could not identify the users responsible for the detected anomalies.
This last step was achieved using random aggregations of the IP flows, by means of sketches, leading
to better performance in the detection of anomalies and to the possibility of identifying those
responsible for them.
This version of the algorithm has been implemented by C. Callegari and L. Gazzarini, at the Università di
Pisa, in an AD software, described in , for analyzing IP traffic traces and detecting anomalies in them.
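The two ingredients mentioned above, entropy as a summary of a traffic feature and sketches as random aggregations of flows, can be illustrated with the following sketch. It is only an assumption about how such a step might look: the record layout (source, destination), the CRC32-based hash and the helper names are hypothetical and are not taken from the software described here.

```python
import math
import zlib
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (in bits) of an empirical distribution of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def sketch_bucket(key, seed, width):
    """Map a key to one of `width` buckets using a seeded CRC32 hash
    (a stand-in for whatever hash family a real sketch would use)."""
    return zlib.crc32(f"{seed}:{key}".encode()) % width

def sketch_entropies(records, seed, width):
    """Aggregate (source, destination) records into buckets by source,
    then summarize each bucket by the entropy of its destinations."""
    buckets = defaultdict(Counter)
    for src, dst in records:
        buckets[sketch_bucket(src, seed, width)][dst] += 1
    return {b: entropy(c.values()) for b, c in buckets.items()}

# Hypothetical records observed in one timebin.
records = [("alice", "d1"), ("alice", "d2"), ("bob", "d1"), ("bob", "d1")]
per_bucket = sketch_entropies(records, seed=1, width=4)
```

Repeating the aggregation with several seeds gives several independent sketches; a source that falls into an anomalous bucket in every sketch becomes a candidate for being responsible, which is the kind of identification that plain aggregate counts do not allow.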
Our work consisted in adapting this software (designed to work with IP traffic traces) for use
with VoIP Call Data Records, in order to test its applicability as an Anomaly Detection system for voice
traffic.
We then used our modified version of the software to scan a real VoIP traffic trace, obtained from a
telephone operator, in order to analyze the software's performance in a real environment. We
performed two different types of analysis on the same traffic trace, in order to understand the software's
features and limits, as well as its applicability to AD problems.
As we discovered that the software's performance depends heavily on the input parameters used
in the analysis, we concluded with several tests performed using artificially created anomalies, in order
to understand the relationship between each input parameter's value and the software's ability to
detect different types of anomalies.
Finally, the different analyses performed led us to some considerations on the possibility of
applying this PCA-based software as an Anomaly Detector in VoIP environments.
To the best of our knowledge, this is the first time a technique based on Principal Components Analysis
has been used to detect anomalous users in VoIP traffic; in more detail, our contribution consisted in:
• Creating a version of a PCA-based AD software that can be used on VoIP traffic traces
• Testing the software's performance on a real traffic trace, obtained from a telephone operator
• From the first tests, determining the parameter values that yield results useful for detecting
anomalous users in a VoIP environment
• Observing the types of users detected by the software on this trace and classifying them
according to their behavior over the whole duration of the trace
• Analyzing how the choice of parameters impacts the type of detections obtained from the analysis
and testing which choices are best for detecting each type of anomalous user
• Proposing a new way of applying the software that avoids the biggest limitation of the first
type of analysis (which, as we will see, is the impossibility of detecting more than one anomalous
user per timebin)
• Testing the software's performance with this new type of analysis, observing also how this
different type of application affects the dependence of the results on the input parameters
• Comparing the software's ability to detect anomalous users with that of another AD
software that works on the same type of trace (VoIP SEAL)
• Modifying the real trace to obtain a version cleaned of all the detectable anomalies, to which
artificial anomalies can then be added
• Testing the software's performance in detecting different types of artificial anomalies
• Analyzing in more detail the software's sensitivity to the input parameters when used for
detecting artificially created anomalies
• Comparing the results and observations obtained from these different types of analysis to derive a
global analysis of the characteristics of an Anomaly Detector based on Principal Components
Analysis, its strengths and its weaknesses when applied to a VoIP trace
Our work is structured as follows:
1. We start by analyzing the PCA theory, describing the structure of the algorithm used in our
software, its features and the type of data it needs in order to be used as an Anomaly Detection
system for VoIP traffic.
2. Then, after briefly describing the trace we used to test our software, we introduce
the first type of analysis performed, the single round analysis, pointing out the results obtained
and their dependence on the parameter values.
3. In the following section we focus on a different type of analysis, the multiple round
analysis, which we introduced to test the software's performance once its biggest limitation
(the impossibility of detecting more than one user per timebin) is removed; we describe the results
obtained, compare them with those of the single round analysis, check their
dependence on the parameters and compare the performance with that obtained using
another AD software (VoIP SEAL) on the same trace.
4. We then consider the results and observations obtained by testing our software with artificial
anomalies added to a “cleaned” version of our original trace (from which we removed all the
anomalous users detectable with our software), comparing the software's performance in
detecting different types of anomalies and analyzing in detail their dependence on the
input parameters.
5. Finally, we describe our conclusions, derived from all the observations obtained with the
different types of analysis, about the applicability of PCA-based software as an Anomaly
Detector in a VoIP environment.