Record linkage is the task of identifying records that correspond to the same entity across one or more data sources. In data privacy, this makes it a natural tool for evaluating the disclosure risk of protected data: the risk is estimated by counting the records that can be correctly linked between a data set and its protected version. In this project we introduce two parametrized variations of distance-based record linkage. One uses a weighted mean and the other the Choquet integral to compute the distance between records. These methods allow us to improve on standard record linkage and provide insight into the re-identification risk contributed by each variable and, in the case of the Choquet integral, by interactions between variables. To do so, we apply a supervised learning approach to both methods, which determines the optimal weights and the optimal fuzzy measure, respectively, for maximizing the number of correct links between two data files.
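To make the idea concrete, the following is a minimal Python sketch of distance-based record linkage with the two aggregation functions named above. It is an illustration under stated assumptions, not the project's implementation: records are assumed to be numeric tuples, per-variable distances are assumed to be squared differences, record i of the original file is assumed to correspond to record i of the protected file, and the fuzzy measure is assumed to be given as a dictionary from sets of variable indices to values in [0, 1]. All function names and the toy data are hypothetical. The grid search at the end is a brute-force stand-in for the supervised learning step, whose actual optimization method is not described here.

```python
def squared_diffs(a, b):
    """Per-variable squared differences between two records."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


def weighted_mean(values, weights):
    """Weighted mean of per-variable distances (weights sum to 1)."""
    return sum(w * v for w, v in zip(weights, values))


def choquet(values, mu):
    """Choquet integral of non-negative values w.r.t. a fuzzy measure.

    mu maps frozenset(variable indices) -> [0, 1], is monotone, and
    assigns 1 to the full set (assumed, not checked here).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for pos, i in enumerate(order):
        # Variables whose value is at least as large as values[i].
        coalition = frozenset(order[pos:])
        total += (values[i] - prev) * mu[coalition]
        prev = values[i]
    return total


def link(original, protected, aggregate):
    """Link each protected record to its nearest original record."""
    links = []
    for p in protected:
        dists = [aggregate(squared_diffs(o, p)) for o in original]
        links.append(min(range(len(original)), key=dists.__getitem__))
    return links


def correct_links(links):
    """Count links that point to the true counterpart (same index)."""
    return sum(1 for i, j in enumerate(links) if i == j)


def best_weights(original, protected, steps=10):
    """Brute-force search over two-variable weight vectors (stand-in only)."""
    best_score, best_w = -1, None
    for k in range(steps + 1):
        w = (k / steps, 1 - k / steps)
        score = correct_links(
            link(original, protected, lambda d: weighted_mean(d, w))
        )
        if score > best_score:
            best_score, best_w = score, w
    return best_score, best_w


# Made-up toy data: variable 0 is barely perturbed, variable 1 heavily.
original = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
protected = [(1.1, 22.0), (2.1, 31.0), (2.9, 9.0)]

equal = link(original, protected, lambda d: weighted_mean(d, (0.5, 0.5)))
score, weights = best_weights(original, protected)
print(correct_links(equal), score, weights)
```

On this toy data, equal weights link none of the three records correctly, while the search puts all weight on the first variable and links all three. That is the sense in which learned weights indicate which variables carry re-identification risk. With an additive fuzzy measure the Choquet integral reduces to the weighted mean; a non-additive measure can additionally express interactions between variables.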