Clustering-based Information Loss for Data Protection Methods of Categorical Data
Publication Type: Thesis
Source: Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain, p.24 (2011)
Keywords: Data Privacy; Information Loss; Disclosure Risk; Clustering
Data privacy has always been an important issue, but it has become much more so with the expansion of the Internet: the number of public datasets available for statistical studies keeps growing, and with it the amount of sensitive data available online. This makes it very important to assess the performance of the methods used to mask those datasets. Two kinds of measures exist for this purpose: information loss and disclosure risk. This assessment becomes even more important when protecting categorical data, which allows only very limited manipulation.
In this thesis I present an information loss analysis of categorical data protection methods based on cluster-specific measures, that is, measures defined specifically for the case in which the user will perform clustering on the data. I also compare the results obtained with those known from general information loss analysis.
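
As a rough illustration of what a cluster-specific measure can look like, the sketch below compares the partition obtained by clustering the original data with the partition obtained by clustering the masked data. It uses the Rand index, which is an assumption made here for illustration; the abstract does not state which measures the thesis defines. The function names and the example label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of record pairs on which two partitions agree.

    A pair agrees when both partitions put the two records in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same records")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one record: the partitions trivially agree
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def clustering_information_loss(original_labels, masked_labels):
    """Illustrative loss (an assumption, not the thesis's definition):
    1 minus the Rand index between the two partitions."""
    return 1.0 - rand_index(original_labels, masked_labels)


# Hypothetical cluster labels for four records, before and after masking.
original = [0, 0, 1, 1]
masked_same = [1, 1, 0, 0]       # same grouping, clusters merely renamed
masked_different = [0, 1, 0, 1]  # grouping changed by the masking

print(clustering_information_loss(original, masked_same))
print(clustering_information_loss(original, masked_different))
```

Under this sketch, a loss of 0 means masking left the cluster structure intact, and larger values mean that more pairs of records are grouped differently after masking. Because only pair co-membership is compared, renaming the clusters does not affect the result.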