Disclosure Risk

An Evolutionary Optimization Approach for Categorical Data Protection

Publication Type:

Conference Paper

Source:

Privacy and Anonymity in the Information Society 2012, Berlin (2012)

Keywords:

genetic algorithms; data privacy; categorical data; data mining; information loss; disclosure risk

Abstract:

The continuously growing amount of public sensitive data has increased the risk of breaching the privacy of the people or institutions in those datasets. Many protection methods have been developed to address this problem by either distorting or generalizing data, while taking into account the difficult trade-off between data utility (information loss) and protection against disclosure (disclosure risk).
In this paper we present an optimization approach for data protection based on an evolutionary algorithm which is guided by a combination of information loss and disclosure risk measures. In this way, state-of-the-art protection methods are combined to obtain new data protections with a better trade-off between these two measures. The paper presents several experimental results that assess the performance of our approach.

Supervised learning using Mahalanobis distance for record linkage

Publication Type:

Conference Proceedings

Source:

6th International Summer School on Aggregation Operators-AGOP2011, Lulu.com, Univ. of Sannio, Benevento, Italy, p.223--228 (2011)

ISBN:

978-1-4477-7019-0

URL:

http://agop2011.ciselab.org/proceedings

Keywords:

data privacy; record linkage; disclosure risk; Mahalanobis distance; fuzzy measure; Choquet integral

Abstract:

In data privacy, record linkage is a well-known technique used to evaluate the disclosure risk of protected data. The basic idea is to link records from different databases that refer to the same individuals. In this paper we introduce a new parametrized variation of record linkage relying on the Mahalanobis distance, and a supervised learning method to determine the optimum simulated covariance matrix for the linkage process. We evaluate and compare our proposal with other previously studied parametrized and non-parametrized variations of record linkage, such as the weighted mean or the Choquet integral, which determines the optimal fuzzy measure.
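
As a rough illustration of the distance named in this abstract, the sketch below computes the Mahalanobis distance between two numeric records. It is not the paper's implementation: the function name and the `inv_cov` argument (an inverse covariance matrix supplied by the caller) are assumptions made for the example, and the supervised learning step that would choose the matrix is not shown.

```python
from math import sqrt

def mahalanobis(a, b, inv_cov):
    """Distance sqrt((a-b)^T * inv_cov * (a-b)) between two numeric records.

    inv_cov is a hypothetical, caller-supplied inverse covariance matrix
    given as a list of rows; learning it is outside this sketch.
    """
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    # Quadratic form over all pairs of variables.
    q = sum(diff[i] * inv_cov[i][j] * diff[j]
            for i in range(n) for j in range(n))
    return sqrt(q)

# With the identity matrix the distance reduces to the Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)

# A non-identity matrix rescales the contribution of each variable.
scaled = [[2.0, 0.0], [0.0, 0.5]]
d_scaled = mahalanobis([0.0, 0.0], [1.0, 2.0], scaled)
```

Changing the matrix changes which records count as close, which is why the choice of matrix matters for the linkage result.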

Clustering-based Information Loss for Data Protection Methods of Categorical Data

Publication Type:

Thesis

Source:

Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain, p.24 (2011)

Keywords:

Data Privacy; Information Loss; Disclosure Risk; Clustering

Abstract:

Data privacy has always been an important issue, but it has become much more so with the expansion of the Internet: the number of public datasets available for statistical studies keeps growing, and with it the amount of sensitive data available online. This makes it very important to assess the performance of the methods used to mask those datasets. Two kinds of measures exist for this purpose: information loss and disclosure risk. This assessment becomes even more important when protecting categorical data, which can be manipulated only in very limited ways.
In this thesis I present an information loss analysis of categorical data protection methods based on cluster-specific measures, that is, measures specifically defined for the case in which the user will cluster the data. I also compare the results with those obtained using general information loss analysis.

Supervised Learning Methods on Distance Based Record Linkage

Publication Type:

Thesis

Source:

Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain, p.25 (2010)

Keywords:

record linkage; data privacy; disclosure risk; optimization; fuzzy measure; Choquet integral

Abstract:

Record linkage is the task of identifying records that correspond to the same entity in one or more data sources. In the data privacy context, it can be used to evaluate the disclosure risk of protected data by counting the records linked between a data set and its protected version. In this project we introduce two parametrized variations of distance-based record linkage: one uses a weighted mean and the other the Choquet integral to compute the distance between records. These methods allow us to improve on standard record linkage and provide insight into the re-identification risk of each variable and, in the second method, of their interactions. To do so, we apply a supervised learning approach to both methods, which determines the optimal weights and fuzzy measure, respectively, for maximizing the linkage between two data files.
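
The weighted-mean variant described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the function names, the toy data, and the convention that record i of the protected file derives from record i of the original file are all hypothetical, and the weights are fixed by hand rather than learned.

```python
def weighted_distance(a, b, weights):
    """Weighted mean of squared per-variable differences (weights sum to 1)."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

def count_links(original, protected, weights):
    """Count protected records whose nearest original record is their source.

    Assumes protected[i] was derived from original[i] (hypothetical setup).
    """
    links = 0
    for i, p in enumerate(protected):
        nearest = min(range(len(original)),
                      key=lambda k: weighted_distance(original[k], p, weights))
        if nearest == i:
            links += 1
    return links

# Toy data: the first variable is barely perturbed, the second heavily.
original = [[1.0, 50.0], [2.0, 10.0], [3.0, 30.0]]
protected = [[1.1, 12.0], [2.1, 48.0], [2.9, 31.0]]

equal_links = count_links(original, protected, [0.5, 0.5])
first_var_links = count_links(original, protected, [1.0, 0.0])
```

On this toy data, putting all the weight on the lightly perturbed variable links more records than equal weights do, which illustrates how the weights can point to the variables that carry the most re-identification risk.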

PRAM Optimization Using an Evolutionary Algorithm

Publication Type:

Book Chapter

Source:

Privacy in Statistical Databases, Springer, Number LNCS 6344, Corfú, Greece, p.97 - 106 (2010)

ISBN:

978-3-642-15837-7

Keywords:

Information Privacy and Security; Evolutionary Algorithms; Post Randomization Method; Information Loss; Disclosure Risk

Abstract:

PRAM (Post Randomization Method) was introduced in 1997, but it is still one of the least used methods in statistical categorical data protection, because it is difficult to obtain a transition matrix that gives good protection. In this paper, we describe how to obtain better protection by using an evolutionary algorithm with integrated information loss and disclosure risk measures to find the best matrix. We also provide experiments on a real dataset of 1000 records to empirically evaluate the technique.
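
To make the role of the transition matrix concrete, the sketch below applies a PRAM-style randomization to one categorical variable: each value is replaced by a category drawn from the row of the matrix for that value. The function name, the dictionary-of-dictionaries matrix layout, and the example probabilities are assumptions made for illustration; the evolutionary search for a good matrix described in the abstract is not shown.

```python
import random

def pram(values, transition, rng):
    """Replace each categorical value using its row of transition probabilities.

    transition[a][b] is the probability that category a is published as b;
    each row is assumed to sum to 1 (hypothetical layout for this sketch).
    """
    protected = []
    for v in values:
        row = transition[v]
        categories = list(row)
        weights = [row[c] for c in categories]
        protected.append(rng.choices(categories, weights=weights, k=1)[0])
    return protected

values = ["A", "B", "A", "C", "B"]

# Identity matrix: every value is kept (no protection, no information loss).
identity = {
    "A": {"A": 1.0, "B": 0.0, "C": 0.0},
    "B": {"A": 0.0, "B": 1.0, "C": 0.0},
    "C": {"A": 0.0, "B": 0.0, "C": 1.0},
}
unchanged = pram(values, identity, random.Random(0))

# Example matrix that keeps a value with probability 0.8.
noisy = {
    "A": {"A": 0.8, "B": 0.1, "C": 0.1},
    "B": {"A": 0.1, "B": 0.8, "C": 0.1},
    "C": {"A": 0.1, "B": 0.1, "C": 0.8},
}
randomized = pram(values, noisy, random.Random(0))
```

Moving probability mass off the diagonal raises protection at the cost of more information loss, which is the trade-off a search over matrices has to balance.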
