privacy preserving information retrieval

On the declassification of confidential documents

Publication Type:

Conference Proceedings

Source:

Modeling Decisions for Artificial Intelligence, MDAI 2011, Springer, Volume 6820, Changsha, China, p.235-246 (2011)

URL:

http://www.springerlink.com/content/tg81j807q42x8837/

Keywords:

declassification; anonymity; privacy preserving information retrieval; semantic; data privacy; information retrieval; pattern classification; named-entity recognition

Abstract:

We introduce the anonymization of unstructured documents to lay the groundwork for the automatic declassification of confidential documents. Starting from known ideas and methods in data privacy, we present the main issues of unstructured document anonymization and propose using named-entity recognition techniques from natural language processing and information extraction to identify the entities in a document that need to be protected.

Towards privacy preserving information retrieval through semantic microaggregation

Publication Type:

Conference Proceedings

Source:

Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, IEEE, Volume 3, Toronto, Canada, p.296-299 (2010)

ISBN:

978-1-4244-8482-9

URL:

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5614556

Keywords:

Web indexing task; classification task; frequency term vector; k-anonymity preservation; privacy preserving information retrieval; semantic microaggregation; Internet; data privacy; information retrieval; pattern classification; vectors

Abstract:

In this paper we introduce the problem of providing privacy-preserving information for Web indexing, classification, and other information retrieval tasks. Web pages are represented by a term frequency vector that preserves k-anonymity for all the Web pages. This vector can then be used, for example, to build indexes or classifiers. Our proposal makes use of semantic microaggregation.
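
As an illustration only, the idea of k-anonymous term frequency vectors can be sketched in a few lines of Python. This is not the paper's method: it swaps the semantic microaggregation for a plain fixed-size numeric microaggregation (sort the vectors, cut them into groups of at least k, replace each vector by its group centroid). The function names, the sample pages, and the vocabulary are all hypothetical.

```python
from collections import Counter


def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def microaggregate(vectors, k):
    """Replace each vector by the centroid of a group of at least k vectors.

    Assumption: groups are formed by sorting the vectors lexicographically
    and cutting the sorted list into consecutive blocks of size k. This is
    a simple heuristic, not an optimal or semantic partition.
    """
    n = len(vectors)
    if k < 1 or n < k:
        raise ValueError("need at least k vectors and k >= 1")
    order = sorted(range(n), key=lambda i: vectors[i])
    groups = [order[i:i + k] for i in range(0, n, k)]
    # A short trailing group is merged into the previous one so that
    # every group keeps at least k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    dim = len(vectors[0])
    protected = [None] * n
    for group in groups:
        centroid = tuple(
            sum(vectors[i][d] for i in group) / len(group) for d in range(dim)
        )
        for i in group:
            protected[i] = centroid
    return protected


# Hypothetical sample data.
vocabulary = ["data", "privacy", "retrieval"]
pages = [
    "privacy data privacy",
    "data retrieval",
    "privacy retrieval retrieval",
    "data data",
    "retrieval",
]
vectors = [term_vector(page, vocabulary) for page in pages]
protected = microaggregate(vectors, k=2)
for page, vec in zip(pages, protected):
    print(page, "->", vec)
```

After this step every published vector is shared by at least k pages, so a single page cannot be singled out by its vector alone. The price is information loss, which grows with k.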

Towards Semantic Microaggregation of Categorical Data for Confidential Documents

Publication Type:

Conference Proceedings

Source:

Modeling Decisions for Artificial Intelligence, MDAI 2010, Springer, Volume 6408, Perpignan, France, p.266-276 (2010)

URL:

http://www.springerlink.com/content/f41402862155w6t4/

Keywords:

Web indexing task; classification task; frequency term vector; k-anonymity preservation; privacy preserving information retrieval; semantic microaggregation; Internet; data privacy; information retrieval; pattern classification; vectors

Abstract:

In the data privacy context, and specifically in statistical disclosure control, microaggregation is a well-known microdata protection method that ensures the confidentiality of each individual. In this paper, we propose a new approach to microaggregation that deals with semantic sets of categorical data, such as text documents. This method relies on the WordNet framework, which provides a complete taxonomy of semantic relationships between words. This extension therefore aims to ensure the confidentiality of text documents while preserving their general meaning. We apply some measures to evaluate the quality of the protection method in terms of information loss.
