Thesis etd-06262015-102832
Thesis type
Master's thesis
Author
RICCI, SARA
URN
etd-06262015-102832
Title
Disclosure Risk Assessment via Record Linkage by a Maximum-Knowledge Attacker
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
supervisor Prof. Domingo-Ferrer, Josep
co-supervisor Dr. Caboara, Massimo
co-supervisor Dr. Soria-Comas, Jordi
Keywords
- Data anonymization
- intruder model
- permutation paradigm
- record linkage
- statistical disclosure control
Defense session start date
17/07/2015
Availability
Not available for consultation
Release date
17/07/2085
Abstract
Before releasing an anonymized data set, the data protector must know how safe the data set is, that is, how much disclosure risk is incurred by the release.
If no privacy model is used to select specific privacy guarantees prior to anonymization, posterior disclosure risk assessment must be performed based on the anonymized data set and, if the result is not satisfactory, anonymization must be repeated with stricter privacy parameters. Even if a privacy model is used, it may still be advisable to empirically evaluate disclosure on the anonymized data set, especially if the privacy model parameters have been relaxed to improve data utility.
Record linkage is a general methodology for posterior disclosure risk assessment, whereby the data protector attempts to recreate the attacker's re-identification scenario.
An important limitation of record linkage is that it usually requires the data protector to make restrictive assumptions on the attacker's background knowledge.
To overcome this limitation, we present a maximum-knowledge attacker model and then we specify and compare several record linkage tests for such a worst-case attacker.
Our tests compare the distribution of linkage distances between the original and the anonymized data sets with the distribution of distances between one of those two data sets and a random data set.
The more similar the distributions, the more plausibly deniable are record linkages claimed by an attacker. Because attaining zero disclosure risk for all records is too costly in terms of utility, a less demanding alternative is presented whose goal is to reduce the maximum per-record disclosure risk.
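The comparison of distance distributions described above can be illustrated with a minimal sketch. The full thesis is not available for consultation, so everything here is an assumption made for illustration: the linkage distance is taken to be the Euclidean distance to the nearest record, the two distributions are compared with a two-sample Kolmogorov-Smirnov statistic, and the data, the noise level, and the helper names (`nearest_distances`, `ks_statistic`, `gaussian_records`, `add_noise`) are hypothetical. It should not be read as the author's actual tests.

```python
import math
import random


def nearest_distances(source, target):
    """Distance from each source record to its closest record in target."""
    return [min(math.dist(s, t) for t in target) for s in source]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical distribution functions of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare ECDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def gaussian_records(n, rng):
    """Hypothetical data: n records with two standard-normal attributes."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def add_noise(records, sigma, rng):
    """Assumed masking method: independent Gaussian noise on each attribute."""
    return [tuple(v + rng.gauss(0, sigma) for v in r) for r in records]


rng = random.Random(0)
original = gaussian_records(200, rng)
random_set = gaussian_records(200, rng)      # unrelated random data set
weak = add_noise(original, 0.01, rng)        # lightly masked release
strong = gaussian_records(200, rng)          # release with no record-level link

# Reference distribution: distances from the original to the random data set.
baseline = nearest_distances(original, random_set)
ks_weak = ks_statistic(nearest_distances(original, weak), baseline)
ks_strong = ks_statistic(nearest_distances(original, strong), baseline)

print(f"KS, lightly masked vs. random: {ks_weak:.3f}")
print(f"KS, unlinked release vs. random: {ks_strong:.3f}")
```

Under these assumptions, a smaller statistic means the linkage distances are harder to tell apart from those obtained against random data, which corresponds to linkages that are more plausibly deniable.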
File
File name | Size |
---|---|
Thesis not available for consultation. |