Towards a private vector space model for confidential documents
Publication Type: Conference Paper
Source: Symposium On Applied Computing, ACM, Coimbra, Portugal, p.944-945 (2013)
ISBN: 978-1-4503-1656-9
URL: http://doi.acm.org/10.1145/2480362.2480543
Keywords: anonymization; document vector space; indexes; privacy
Abstract: We introduce in this paper a method to anonymize document vector spaces. These vector spaces can be used to analyze confidential documents without disclosing private information. The method is inspired by microaggregation, a popular technique used in statistical disclosure control.
Privacy in Data Mining
Semantic Microaggregation for the Anonymization of Query Logs
Using Classification Methods to Evaluate Attribute Disclosure Risks
Distributed Privacy-Preserving Methods for Statistical Disclosure Control
Classifying data from protected statistical datasets
Information loss for synthetic data through fuzzy clustering
Privacy-preserving data-mining through micro-aggregation for web-based e-commerce
Publication Type: Journal Article
Source: Internet Research, Emerald Group Publishing Limited, Volume 20, Issue 3, p.366-384 (2010)
URL: http://dx.doi.org/10.1108/10662241011050759
Towards Semantic Microaggregation of Categorical Data for Confidential Documents
Publication Type: Conference Proceedings
Source: Modeling Decisions for Artificial Intelligence, MDAI 2010, Springer, Volume 6408, Perpignan, France, p.266-276 (2010)
URL: http://www.springerlink.com/content/f41402862155w6t4/
Keywords: Web indexing task; classification task; frequency term vector; k-anonymity preservation; privacy preserving information retrieval; semantic microaggregation; Internet; data privacy; information retrieval; pattern classification; vectors
Abstract: In the data privacy context, and specifically in statistical disclosure control, microaggregation is a well-known microdata protection method that ensures the confidentiality of each individual. In this paper, we propose a new microaggregation approach for semantic sets of categorical data, such as text documents. This method relies on the WordNet framework, which provides a complete taxonomy of semantic relationships between words. This extension therefore aims to ensure the confidentiality of text documents while preserving their general meaning. We apply some measures based on information loss to evaluate the quality of the protection method.
