Choquet integral

Heuristic Supervised Approach for Record Linkage

Publication Type:

Conference Paper

Source:

Modeling Decisions for Artificial Intelligence (MDAI), Springer Berlin / Heidelberg, Volume 7647, Girona, Catalonia, p.210-221 (2012)

ISBN:

978-3-642-34619-4

URL:

http://www.springerlink.com/content/2373816u384j86m8/

Abstract:

Record linkage is a well-known technique for linking records in one database to records in another database that refer to the same individuals. Although it is usually used in database integration, it is also used in the data privacy field to evaluate the disclosure risk of protected datasets. In this paper we compare two supervised algorithms that rely on distance-based record linkage, specifically using the Choquet integral, a fuzzy integral, to compute the distance between records. The first approach solves a linear optimization problem that determines the optimal fuzzy measure for the linkage. The second approach, by contrast, is a gradient-type algorithm with constraints for identifying the fuzzy measure. We show the advantages and drawbacks of both algorithms and the situations in which each works better.
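Whichever of the two approaches is used, the identified fuzzy measure has to satisfy the usual constraints: the empty set has measure 0, the full set of variables has measure 1, and the measure is monotone with respect to set inclusion. The Python sketch below is not code from the paper; it only checks these constraints, and the representation (a dictionary keyed by `frozenset`s of variable indices) and the names `all_subsets` and `is_fuzzy_measure` are our own assumptions.

```python
from itertools import combinations


def all_subsets(n):
    """All subsets of the variable indices {0, ..., n-1} as frozensets."""
    return [frozenset(s) for k in range(n + 1) for s in combinations(range(n), k)]


def is_fuzzy_measure(mu, n, tol=1e-9):
    """Check that mu (frozenset -> float) is a fuzzy measure on n variables."""
    full = frozenset(range(n))
    # Boundary conditions: mu(empty set) = 0 and mu(all variables) = 1.
    if abs(mu[frozenset()]) > tol or abs(mu[full] - 1.0) > tol:
        return False
    # Monotonicity: checking A <= A | {j} for every A and every j not in A is
    # enough, because any inclusion A <= B is a chain of single-element additions.
    for a in all_subsets(n):
        for j in full - a:
            if mu[a] > mu[a | {j}] + tol:
                return False
    return True
```

For example, the additive measure `{s: len(s) / 3 for s in all_subsets(3)}` passes the check. Note that a fuzzy measure on n variables is specified by 2^n values, so the number of unknowns either identification algorithm must handle grows exponentially with the number of variables.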

Choquet integral for record linkage

Publication Type:

Journal Article

Source:

Annals of Operations Research, Springer US, Volume 195, Issue 1, p.97-110 (2012)

URL:

http://www.springerlink.com/index/10.1007/s10479-011-0989-x

Abstract:

Record linkage is used in data privacy to evaluate the disclosure risk of protected data. It models potential attacks in which an intruder attempts to link records from the protected data to the original data. In this paper we introduce a novel distance-based record linkage method that uses the Choquet integral to compute the distance between records. We use a fuzzy measure to weight each subset of variables from each record. This allows us to improve standard record linkage and to provide insightful information about the re-identification risk of each variable and of their interactions. To do so, we use a supervised learning approach that determines the optimal fuzzy measure for the linkage.
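To make the idea of a Choquet-integral distance concrete, here is a minimal Python sketch, not the authors' code. It assumes that the per-variable squared differences between two (standardized) records are aggregated with the discrete Choquet integral, that the fuzzy measure is given as a dictionary keyed by `frozenset`s of variable indices, and that each protected record is linked to its nearest original record. The supervised learning of the measure is omitted, and all function names are hypothetical.

```python
from itertools import combinations


def choquet(values, mu):
    """Discrete Choquet integral of non-negative values with respect to mu.

    mu maps frozenset(variable indices) -> measure value and must contain
    every subset that can appear as an "upper" set below.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])  # ascending values
    total, prev = 0.0, 0.0
    for pos, i in enumerate(order):
        upper = frozenset(order[pos:])  # current variable and all sorted after it
        total += (values[i] - prev) * mu[upper]
        prev = values[i]
    return total


def choquet_distance(a, b, mu):
    # Assumption: aggregate per-variable squared differences of (standardized) values.
    return choquet([(x - y) ** 2 for x, y in zip(a, b)], mu)


def link_choquet(original, protected, mu):
    """For each protected record, return the index of the nearest original record."""
    return [
        min(range(len(original)), key=lambda j: choquet_distance(original[j], p, mu))
        for p in protected
    ]


def additive_mean_measure(n):
    """mu(A) = |A| / n; with this measure the Choquet integral is the arithmetic mean."""
    return {frozenset(s): k / n for k in range(n + 1) for s in combinations(range(n), k)}
```

With `additive_mean_measure`, the distance reduces to the plain mean of squared differences, i.e. standard distance-based record linkage. A non-additive measure such as mu({0}) = mu({1}) = mu({0, 1}) = 1 turns the aggregation into a maximum, which illustrates how a fuzzy measure can express interactions between variables that a weighted mean cannot.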

Supervised learning using Mahalanobis distance for record linkage

Publication Type:

Conference Proceedings

Source:

6th International Summer School on Aggregation Operators-AGOP2011, Lulu.com, Univ. of Sannio, Benevento, Italy, p.223-228 (2011)

ISBN:

978-1-4477-7019-0

URL:

http://agop2011.ciselab.org/proceedings

Keywords:

data privacy; record linkage; disclosure risk; Mahalanobis distance; fuzzy measure; Choquet integral

Abstract:

In data privacy, record linkage is a well-known technique for evaluating the disclosure risk of protected data. The main idea is to link records from different databases that refer to the same individuals. In this paper we introduce a new parametrized variation of record linkage relying on the Mahalanobis distance, together with a supervised learning method that determines the optimal simulated covariance matrix for the linkage process. We evaluate our proposal and compare it with other previously studied parametrized and non-parametrized variations of record linkage, such as the weighted mean or the Choquet integral, for which the optimal fuzzy measure is determined.
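As a rough illustration of the distance this variation relies on, the Python sketch below links records with the Mahalanobis distance. It is not the authors' implementation: the supervised learning of the covariance matrix is omitted, and we simply assume that its inverse is supplied directly as `sigma_inv`. The function names are hypothetical.

```python
import math


def mahalanobis(a, b, sigma_inv):
    """Mahalanobis distance sqrt((a - b)^T sigma_inv (a - b)).

    sigma_inv plays the role of the inverse covariance matrix and should be
    symmetric positive definite; here it is passed in directly (e.g. as learned).
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    q = sum(d[i] * sigma_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


def link_mahalanobis(original, protected, sigma_inv):
    """For each protected record, return the index of the nearest original record."""
    return [
        min(range(len(original)), key=lambda j: mahalanobis(original[j], p, sigma_inv))
        for p in protected
    ]
```

With the identity matrix this reduces to the Euclidean distance. Down-weighting a noisy variable (a small diagonal entry in `sigma_inv`) can change which original record is chosen, which is the kind of effect a learned matrix can exploit.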

Supervised Learning Methods on Distance Based Record Linkage

Publication Type:

Thesis

Source:

Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain, p.25 (2010)

Keywords:

record linkage; data privacy; disclosure risk; optimization; fuzzy measure; Choquet integral

Abstract:

Record linkage is the task of identifying records that correspond to the same entity in one or more data sources. Building on this idea, record linkage can also be used in the data privacy context to evaluate the disclosure risk of protected data, by counting the records linked between a data set and its protected version. In this project we introduce two parametrized variations of distance-based record linkage: one uses a weighted mean and the other the Choquet integral to compute the distance between records. These methods allow us, for example, to improve standard record linkage and to provide insightful information about the re-identification risk of each variable and, in the second method, about their interactions. To do so, we apply a supervised learning approach to both methods that determines the optimal weights and the optimal fuzzy measure, respectively, to maximize the linkage between two data files.
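A minimal Python sketch of the first variation, distance-based record linkage with a weighted mean, follows. It assumes that record i of the protected file is the masked version of record i of the original file, so that correct links can be counted. The thesis determines the weights with a supervised learning approach; here we substitute a plain exhaustive search over a coarse grid of weight vectors, which is only practical for a handful of variables. All function names are hypothetical.

```python
from itertools import product


def weighted_distance(a, b, w):
    # Weighted mean of per-variable squared differences (weights sum to 1).
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b))


def correct_links(original, protected, w):
    """Count protected records whose nearest original record is their true match.

    Assumption: protected[i] is the masked version of original[i].
    """
    hits = 0
    for i, p in enumerate(protected):
        best = min(range(len(original)), key=lambda j: weighted_distance(original[j], p, w))
        hits += int(best == i)
    return hits


def grid_search_weights(original, protected, steps=10):
    """Exhaustive search over weight vectors on a simplex grid with spacing 1/steps.

    A crude stand-in for the supervised optimization; returns (weights, hits).
    """
    n = len(original[0])
    best_w, best_hits = None, -1
    for ks in product(range(steps + 1), repeat=n):
        if sum(ks) != steps:
            continue  # keep only weight vectors that sum to 1
        w = [k / steps for k in ks]
        hits = correct_links(original, protected, w)
        if hits > best_hits:
            best_w, best_hits = w, hits
    return best_w, best_hits
```

On a toy file where variable 0 is barely perturbed and variable 1 is scrambled (for instance `original = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]` and `protected = [[0.1, 2.0], [1.1, 0.0], [1.9, 1.0]]`), the search links all three records only when variable 0 receives most of the weight, which is the kind of per-variable re-identification information the weights are meant to reveal.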
