logo SBA


Digital archive of theses discussed at the University of Pisa


Thesis etd-06022011-235149

Thesis type
Tesi di dottorato di ricerca
Thesis title
Privacy by Design in Data Mining
Academic discipline
Course of study
tutor Pedreschi, Dino
tutor Giannotti, Fosca
  • Data Mining
  • Data Protection
  • Privacy
Graduation session start date
Privacy is ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices is the basis of a diffusion that is often more limited than expected. Moreover, people feel reluctant to provide true personal data, unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Many recent research works have focused on the study of privacy protection: some of these studies aim at individual privacy, i.e., the protection of sensitive individual data, while others aim at corporate privacy, i.e., the protection of strategic information at organization level. Unfortunately, it is in- creasingly hard to transform the data in a way that it protects sensitive information: we live in the era of big data characterized by unprecedented opportunities to sense, store and analyze complex data which describes human activities in great detail and resolution. As a result anonymization simply cannot be accomplished by de-identification. In the last few years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed, which essentially aim to find an acceptable trade-off between data privacy on the one hand and data utility on the other. So far, the common result obtained is that no general method exists which is capable of both dealing with “generic personal data” and preserving “generic analytical results”.
In this thesis we propose the design of technological frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of data mining technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technol- ogy by design, so that the analysis incorporates the relevant privacy requirements from the start. Therefore, we propose the privacy-by-design paradigm that sheds a new light on the study of privacy protection: once specific assumptions are made about the sensitive data and the target mining queries that are to be answered with the data, it is conceivable to design a framework to: a) transform the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantee that the target mining queries can be answered correctly using the transformed data instead of the original ones.
This thesis investigates on two new research issues which arise in modern Data Mining and Data Privacy: individual privacy protection in data publishing while preserving specific data mining analysis, and corporate privacy protection in data mining outsourcing.