Heuristic Supervised Approach for Record Linkage
Publication Type:
Conference Paper
Source:
Modeling Decisions for Artificial Intelligence (MDAI), Springer Berlin / Heidelberg, Volume 7647, Girona, Catalonia, p.210-221 (2012)
ISBN:
978-3-642-34619-4
URL:
http://www.springerlink.com/content/2373816u384j86m8/
Abstract:
Record linkage is a well-known technique used to link records from one database to records from another database that refer to the same individuals. Although it is usually used in database integration, it is also used in the data privacy field to evaluate the disclosure risk of protected datasets. In this paper we compare two supervised algorithms that rely on distance-based record linkage, specifically using the Choquet integral, a fuzzy integral, to compute the distance between records. The first approach solves a linear optimization problem that determines the optimal fuzzy measure for the linkage. The second approach is a gradient-type algorithm with constraints for identifying the fuzzy measure. We show the advantages and drawbacks of both algorithms and the situations in which each works better.
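As an illustration of the distance described in this abstract, the following is a minimal sketch of a discrete Choquet integral applied to per-attribute squared differences. The function names, the dictionary representation of the fuzzy measure, and the use of squared differences are assumptions made for this example and are not taken from the paper; the fuzzy measure here is fixed by hand rather than learned.

```python
def choquet_integral(values, mu):
    """Discrete Choquet integral of `values` with respect to fuzzy measure `mu`.

    `mu` maps frozensets of attribute indices to [0, 1], with
    mu[all attributes] == 1 (hypothetical representation).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, previous = 0.0, 0.0
    for pos, i in enumerate(order):
        # Attributes whose value is at least values[i].
        coalition = frozenset(order[pos:])
        total += (values[i] - previous) * mu[coalition]
        previous = values[i]
    return total


def choquet_distance(a, b, mu):
    """Aggregate per-attribute squared differences with the Choquet integral."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b)]
    return choquet_integral(diffs, mu)


def link(record, candidates, mu):
    """Return the index of the candidate closest to `record`."""
    return min(range(len(candidates)),
               key=lambda j: choquet_distance(record, candidates[j], mu))


# An additive measure on two attributes; in this special case the
# Choquet integral coincides with a weighted mean (weights 0.3 and 0.7).
mu = {
    frozenset({0}): 0.3,
    frozenset({1}): 0.7,
    frozenset({0, 1}): 1.0,
}
print(choquet_integral([4.0, 1.0], mu))
print(link((0.0, 0.0), [(5.0, 5.0), (0.1, 0.2)], mu))
```

A non-additive measure, for instance one where `mu[{0, 1}]` exceeds `mu[{0}] + mu[{1}]`, would let the distance reflect interactions between attributes, which a weighted mean cannot express.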
Supervised learning using Mahalanobis distance for record linkage
Publication Type:
Conference Proceedings
Source:
6th International Summer School on Aggregation Operators-AGOP2011, Lulu.com, Univ. of Sannio, Benevento, Italy, p.223-228 (2011)
ISBN:
978-1-4477-7019-0
URL:
http://agop2011.ciselab.org/proceedings
Keywords:
data privacy; record linkage; disclosure risk; Mahalanobis distance; fuzzy measure; Choquet integral
Abstract:
In data privacy, record linkage is a well-known technique used to evaluate the disclosure risk of protected data. The idea is to link records from different databases that refer to the same individuals. In this paper we introduce a new parametrized variation of record linkage relying on the Mahalanobis distance, and a supervised learning method to determine the optimum simulated covariance matrix for the linkage process. We evaluate and compare our proposal with other previously studied parametrized and non-parametrized variations of record linkage, such as the weighted mean or the Choquet integral, for which the optimal fuzzy measure is determined.
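To make the parametrized distance concrete, here is a small sketch of a squared Mahalanobis-style distance in which the matrix is passed in as a parameter. The function name and the example matrices are hypothetical; the supervised procedure that the abstract uses to learn the matrix is not reproduced here.

```python
def mahalanobis_sq(a, b, inv_cov):
    """Squared Mahalanobis-style distance (a - b)^T M (a - b).

    `inv_cov` is a square matrix given as nested lists. In this sketch it is
    supplied by the caller; it stands in for the inverse of the covariance
    matrix that a learning step would determine.
    """
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(n) for j in range(n))


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to squared Euclidean distance
rescaled = [[2.0, 0.0], [0.0, 0.5]]   # emphasises the first attribute

print(mahalanobis_sq([1.0, 2.0], [0.0, 0.0], identity))
print(mahalanobis_sq([1.0, 2.0], [0.0, 0.0], rescaled))
```

With the identity matrix the distance is the plain squared Euclidean distance, so the parametrized form contains standard distance-based record linkage as a special case.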
Non-reversible betting games on fuzzy events: Complexity and algebra
Supervised Learning Methods on Distance Based Record Linkage
Publication Type:
Thesis
Source:
Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain, p.25 (2010)
Keywords:
record linkage; data privacy; disclosure risk; optimization; fuzzy measure; Choquet integral
Abstract:
Record linkage is the task of identifying records corresponding to the same entity from one or more data sources. It can therefore be used in the data privacy context to evaluate the disclosure risk of protected data by counting the records linked between a data set and its protected version. In this project we introduce two parametrized variations of distance-based record linkage. One uses a weighted mean and the other the Choquet integral to compute the distance between records. These methods allow us to improve on standard record linkage and provide insight into the re-identification risk of each variable and, in the second method, of their interactions. To do so, we apply a supervised learning approach to both methods, which determines the optimal weights and fuzzy measure, respectively, for maximizing the linkage between two data files.
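The weighted-mean variant and the idea of maximizing the number of linked records can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, record `i` of the original file is assumed to correspond to record `i` of the protected file, and a brute-force grid search over two weights stands in for the optimization method used in the thesis.

```python
def weighted_sq_distance(a, b, weights):
    """Weighted mean of per-attribute squared differences."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def correct_links(original, protected, weights):
    """Count records whose nearest protected record is their own protected version.

    Assumes original[i] and protected[i] describe the same individual.
    """
    hits = 0
    for i, record in enumerate(original):
        nearest = min(range(len(protected)),
                      key=lambda j: weighted_sq_distance(record, protected[j], weights))
        if nearest == i:
            hits += 1
    return hits


def grid_search_weights(original, protected, steps=10):
    """Brute-force search over two-attribute weights (w, 1 - w).

    A stand-in for a proper optimization procedure; only for two attributes.
    """
    candidates = [(k / steps, 1 - k / steps) for k in range(steps + 1)]
    best = max(candidates, key=lambda w: correct_links(original, protected, w))
    return best, correct_links(original, protected, best)


# Toy data: attribute 0 is lightly perturbed, attribute 1 is heavily distorted.
original = [(0.0, 0.0), (1.0, 1.0)]
protected = [(0.1, 1.0), (0.9, 0.0)]

weights, hits = grid_search_weights(original, protected)
print(weights, hits)
```

In this toy case the search favours the first attribute, which matches the abstract's point that the learned weights indicate how much each variable contributes to re-identification risk.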
