heuristics.

Heuristic Supervised Approach for Record Linkage

Publication Type:

Conference Paper

Source:

Modeling Decisions for Artificial Intelligence (MDAI), Springer Berlin / Heidelber, Volume 7647, Girona, Catalonia, p.210-221 (2012)

ISBN:

978-3-642-34619-4

URL:

http://www.springerlink.com/content/2373816u384j86m8/

Abstract:

Record linkage is a well known technique used to link records from one database to records from another database which make reference to the same individuals. Although it is usually used in database integration, it is also used in the data privacy field for the disclosure risk evaluation of protected datasets. In this paper we compare two different supervised algorithms which rely on distance-based record linkage techniques, specifically using the Choquet integral’s fuzzy integral to compute the distance between records. The first approach uses a linear optimization problem which determines the optimal fuzzy measure for the linkage. While, the second approach is a kind of gradient algorithm with constraints for the fuzzy measures’ identification. We show the advantages and drawbacks of both algorithms and also in which situations they will work better.

Data privacy for simply anonymized network logs represented as graphs - considerations for graph alteration operations

Publication Type:

Journal Article

Source:

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (2011)

Keywords:

data privacy; graphs; operators; heuristics.

Abstract:

In this paper we review the state of the art on graph privacy with special emphasis on applications to online social networks, and we consider some novel aspects which have not been greatly covered in the specialized literature on graph privacy. The following key considerations are covered: (i) choice of different operators to modify the graph; (ii) information loss based on the cost of graph operations in terms of statistical characteristics (degree, clustering coefficient and path length) in the original graph; (iii) computational cost of the operations; (iv) in the case of the aggregation of two nodes, the choice of similar adjacent nodes rather than isomorphic topologies, in order to maintain the overall structure of the graph; (v) a statistically knowledgeable attacker who is able to search for regions of a simply anonymized graph based on statistical characteristics and map those onto a given node and its immediate neighborhood.

Syndicate content