

Digital archive of theses discussed at the University of Pisa


Thesis etd-02252015-214915

Thesis type
PhD thesis
Thesis title
Data Mining for Economics
Academic discipline
Course of study
tutor Dr. Sodini, Mauro
  • Data mining
  • fraud detection
  • text mining
Graduation session start date
The first chapter, A Fraud Detection Algorithm for Online Banking, is the result of a research project titled "Real time methods for online banking fraud and money laundering detection", conducted within the Department of Mathematics of the University of Genoa.

The goal of the research project was to analyze a real-world dataset of online banking transactions in order to develop an ad hoc algorithm able to detect fraudulent transactions. To reach this objective, a two-layer statistical classifier was implemented using two classification algorithms: Support Vector Machines and AdaBoost.

The main hurdles were:

data skewness,
the low number of fraudulent operations on which to build a fraud profile,
the asymmetry of the cost matrix, which required achieving a high true positive rate rather than a high overall performance ratio,
the need to work in real time.
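
As a minimal sketch of why an asymmetric cost matrix favours a high true positive rate over overall accuracy, the following Python example picks the decision threshold that minimizes total misclassification cost. The scores, labels, and cost values are invented toy numbers, and the function names are hypothetical; none of them come from the thesis dataset or its algorithm.

```python
# Toy illustration only: labels (1 = fraud) and classifier scores are invented.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.60, 0.35, 0.80]


def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, fn, tn) when every score >= threshold is flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(labels, scores, threshold, cost_fn, cost_fp):
    """Total cost of errors: missed frauds weigh cost_fn, false alarms cost_fp."""
    _, fp, fn, _ = confusion_counts(labels, scores, threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(labels, scores, cost_fn, cost_fp):
    """Pick the observed score that minimizes total cost when used as threshold."""
    candidates = sorted(set(scores))
    return min(candidates,
               key=lambda t: total_cost(labels, scores, t, cost_fn, cost_fp))


def true_positive_rate(labels, scores, threshold):
    """Share of actual frauds that are flagged at the given threshold."""
    tp, _, fn, _ = confusion_counts(labels, scores, threshold)
    return tp / (tp + fn)


def accuracy(labels, scores, threshold):
    """Share of all operations classified correctly at the given threshold."""
    tp, _, _, tn = confusion_counts(labels, scores, threshold)
    return (tp + tn) / len(labels)


symmetric = best_threshold(labels, scores, cost_fn=1, cost_fp=1)
asymmetric = best_threshold(labels, scores, cost_fn=10, cost_fp=1)
```

On these toy numbers, equal costs select the threshold 0.80, which gives the higher accuracy but catches only one of the two frauds. A missed fraud costing ten times a false alarm moves the threshold to 0.35, which catches both frauds at the price of two false alarms.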

The meta-algorithm presented in the work proved able to reach the set goals and, even though it was built specifically for this problem, it also performed well on different classification tasks.

The second chapter, titled A Statistical Analysis of Reliability of Audit Opinions as Bankruptcy Predictor, was produced within a research group of the Department of Economics of the University of Pisa.

The aim of the work is to measure whether, and to what extent, audit opinions (in particular going-concern opinions) issued by auditing companies are a reliable source of information on which to base forecasts of firms' financial conditions.
The research is based on a sample of US listed firms.
The analysis was carried out using classical statistical tools such as logit regression models and more recent classification tools such as support vector machines and the AdaBoost classifier.
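
To make the logit approach concrete, here is a minimal sketch in Python of a logit model with a single binary regressor: a going-concern flag used to explain bankruptcy. For this special case the maximum-likelihood estimates have a closed form based on the bankruptcy rates in the two groups. The firm counts below are invented for illustration and are not the thesis sample; the function names are hypothetical.

```python
import math

# Invented toy sample: 10 firms with a going-concern opinion (4 later bankrupt)
# and 90 firms without one (6 later bankrupt). Not data from the thesis.
gc_flags = [1] * 10 + [0] * 90
bankrupt = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 84


def log_odds(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_logit_binary(flags, outcomes):
    """Closed-form MLE of P(y=1) = sigmoid(intercept + slope * flag).

    Valid only for one binary regressor, and only when both groups contain
    at least one positive and one negative outcome.
    """
    n1 = sum(flags)
    n0 = len(flags) - n1
    y1 = sum(y for f, y in zip(flags, outcomes) if f == 1)
    y0 = sum(y for f, y in zip(flags, outcomes) if f == 0)
    p1 = y1 / n1  # bankruptcy rate among flagged firms
    p0 = y0 / n0  # bankruptcy rate among unflagged firms
    intercept = log_odds(p0)
    slope = log_odds(p1) - log_odds(p0)
    return intercept, slope


def sensitivity(flags, outcomes):
    """Share of bankrupt firms that had received a going-concern opinion."""
    flagged_bankrupt = sum(f for f, y in zip(flags, outcomes) if y == 1)
    return flagged_bankrupt / sum(outcomes)


intercept, slope = fit_logit_binary(gc_flags, bankrupt)
odds_ratio = math.exp(slope)
gc_sensitivity = sensitivity(gc_flags, bankrupt)
```

In this toy sample the opinion is strongly associated with bankruptcy (odds ratio of about 9.3), yet it flags only 40% of the firms that actually fail. This shows how a statistically significant signal can still be a weak predictor on its own.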

Results show that the ability of auditors to forecast bankruptcy is quite poor, even when compared with the performance achieved by statistical classifiers.

The third chapter is titled Management Discussion & Analysis in the US financial companies: a data mining analysis and, like the previous one, was produced within a research group of the Department of Economics of the University of Pisa.

The aim of the work is to analyze how management reacts to a firm's financial distress in the MD&A, and whether this document could be useful for forecasting future financial conditions.

To do so, we rely on text mining tools, which were used to retrieve the information needed to build the variables used in classical regression models.
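
One simple way to turn text into a regression variable is a dictionary-based tone score: the share of words in a document that belong to a list of negative terms. The Python sketch below illustrates the idea. The word list, the sample sentence, and the function names are hypothetical, and the thesis may have used different text mining techniques.

```python
import re

# Hypothetical mini-lexicon chosen only for illustration; a real study would
# use a much larger, validated word list.
NEGATIVE_TERMS = {
    "loss", "losses", "decline", "impairment",
    "default", "adverse", "uncertainty",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def negative_tone(text, lexicon=NEGATIVE_TERMS):
    """Fraction of tokens that appear in the negative lexicon (0.0 if empty)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


# Invented sample sentence in the style of an MD&A section.
sample = ("Net losses increased due to adverse market conditions "
          "and impairment charges.")
tone = negative_tone(sample)
```

Here three of the eleven tokens ("losses", "adverse", "impairment") are in the lexicon, so the score is 3/11. Computed for each firm and year, a score of this kind could enter a regression alongside accounting variables.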

Results show that, apart from some exceptions, the information contained in the MD&A is consistent with the firm's financial conditions and can therefore represent an additional source of information on which to base financial forecasts.

All the algorithms and data mining tasks mentioned above were implemented in Python, whereas the regression analyses were conducted with Gretl.