ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-05262006-112952


Thesis type
PhD thesis
Author
Atzori, Maurizio
URN
etd-05262006-112952
Title
Abduction and Anonymity in Data Mining
Scientific-disciplinary sector
INF/01
Degree program
INFORMATICA
Supervisors
supervisor Prof. Turini, Franco
supervisor Prof. Mancarella, Paolo
Keywords
  • PRIVACY
  • DATA MINING
  • ANONYMITY
  • ABDUCTIVE REASONING
  • ABDUCTION
Defense session start date
26/06/2006
Availability
Full
Abstract
This thesis investigates two new research problems that arise in modern data mining: reasoning on data mining results, and the privacy implications of data mining results.
Most data mining algorithms rely on inductive techniques, which try to infer generalized information from the input data. Very often, however, this inductive step on raw data is not enough to answer the user's questions, and the data must be processed again using other inference methods. To address high-level user needs such as the explanation of results, we describe an environment able to perform abductive (hypothetical) reasoning, since the solutions to such queries can often be seen as the set of hypotheses that satisfy some requirements. Using cost-based abduction, we show how classification algorithms can be boosted by performing abductive reasoning over the data mining results, improving the quality of the output.
Another growing research area in data mining is privacy-preserving data mining. The availability of large amounts of data, easily collected and stored via computer systems, is giving rise to new applications, but privacy concerns often make data mining unsuitable. We study the privacy implications of data mining in a mathematical and logical context, focusing on the anonymity of the people whose data are analyzed. A formal theory of anonymity-preserving data mining is given, together with a number of anonymity-preserving algorithms for pattern mining.
Improving data mining results through post-processing, with respect to both utility and privacy, is the central focus of the problems investigated in this thesis.
File