Publishing Microdata with Privacy Preservation through Mutual Cover

Core Concepts
MuCo preserves more information utility than generalization while still providing strong protection against identity disclosure and attribute disclosure.
The article proposes a novel technique called Mutual Cover (MuCo) to anonymize microdata for privacy preservation. Key highlights: MuCo partitions the microdata into groups and assigns similar records to the same group. It calculates random output tables so that similar tuples cover for each other by randomizing their quasi-identifier (QI) values. MuCo generates the anonymized microdata by replacing the original QI values with random values drawn according to the random output tables. MuCo satisfies a proposed δ-probability principle, which limits the probability of re-identifying a target person by matching a QI value. Compared to generalization techniques like Mondrian, MuCo better maintains the distributions of the original QI values and provides more accurate query answering, while achieving similar levels of privacy protection. The anonymization process of MuCo is hidden from the adversary, making it harder for them to determine which QI values were altered.
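The three steps above (grouping, random output tables, replacement) can be illustrated with a toy sketch. This is not the paper's algorithm: the sort-based grouping, the uniform spreading of probability across group members, and every name here (`partition`, `random_output_table`, `anonymize`, the `qi` and `sensitive` keys) are assumptions made for illustration. In particular, treating δ as the probability that a record keeps its own QI value is one reading of the principle, not a definition taken from the source.

```python
import random


def partition(records, k):
    """Group similar records: sort by QI values, then cut into runs of k.

    Assumption: sorting stands in for the paper's similarity-based grouping.
    """
    ordered = sorted(records, key=lambda r: r["qi"])
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge an undersized trailing group into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def random_output_table(group, delta):
    """Row i holds the probabilities that record i emits record j's QI value.

    Assumption: a record keeps its own value with probability delta and
    spreads the remainder uniformly over the other group members.
    """
    n = len(group)
    if n == 1:
        return [[1.0]]
    return [[delta if i == j else (1.0 - delta) / (n - 1) for j in range(n)]
            for i in range(n)]


def anonymize(records, k, delta, rng):
    """Replace each record's QI value with one drawn from its group's table."""
    released = []
    for group in partition(records, k):
        table = random_output_table(group, delta)
        qi_values = [r["qi"] for r in group]
        for row, rec in zip(table, group):
            new_qi = rng.choices(qi_values, weights=row, k=1)[0]
            # The sensitive value is published unchanged.
            released.append({"qi": new_qi, "sensitive": rec["sensitive"]})
    return released
```

Because every released QI value is a real value from the same group, the overall value distribution stays close to the original, which matches the utility argument made above. An adversary who does not see the tables cannot tell which values were swapped.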
The original microdata table has 40,152 tuples with 8 attributes, including 7 quasi-identifiers (gender, age, relationship, marital status, race, education, hours per week) and 1 sensitive attribute (salary).
"MuCo can prevent both identity disclosure and attribute disclosure while retaining the information utility more effectively than generalization." "The anonymization process of MuCo is hidden for the adversary." "MuCo provides impressive privacy protection, little information loss, and accurate query answering."

Key Insights Distilled From

by Boyu Li,Jian... at 04-01-2024

Deeper Inquiries

How can MuCo be extended to handle high-dimensional microdata with more quasi-identifiers?

To extend MuCo to handle high-dimensional microdata with more quasi-identifiers, several strategies can be implemented. One approach is to optimize the partitioning of tuples into groups by considering the relationships between a larger number of quasi-identifiers. This can involve developing more sophisticated algorithms that can efficiently group similar records together based on multiple quasi-identifiers. Additionally, the calculation of random output tables can be enhanced to accommodate the increased dimensionality of the data. This may involve refining the distance functions used to measure similarity between records and adjusting the probabilities in the random output tables accordingly. Overall, the extension of MuCo to high-dimensional microdata would require a more complex and scalable implementation to handle the additional quasi-identifiers effectively.
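As one hedged illustration of what a distance function over many mixed-type quasi-identifiers might look like, the sketch below averages per-attribute distances: range-normalized absolute difference for numeric attributes and a 0/1 mismatch for categorical ones. The function name `qi_distance`, the `numeric_ranges` parameter, and the equal weighting of attributes are assumptions, not something the article specifies.

```python
def qi_distance(a, b, numeric_ranges):
    """Average per-attribute distance in [0, 1] between two QI records.

    a, b: dicts mapping attribute name -> value (same keys in both).
    numeric_ranges: dict mapping numeric attribute name -> (low, high).
    Attributes not listed in numeric_ranges are treated as categorical.
    """
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            low, high = numeric_ranges[name]
            # Normalize so every attribute contributes on the same scale.
            total += abs(a[name] - b[name]) / (high - low) if high > low else 0.0
        else:
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)


# Usage with attribute names taken from the dataset described above.
ranges = {"age": (20, 70), "hours_per_week": (0, 100)}
x = {"age": 30, "hours_per_week": 40, "gender": "F"}
y = {"age": 40, "hours_per_week": 40, "gender": "M"}
d = qi_distance(x, y, ranges)  # (10/50 + 0 + 1) / 3
```

Averaging keeps the distance bounded as attributes are added, but with many quasi-identifiers the distances between records tend to become similar, so attribute weighting or selection would likely be needed in practice.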

What are the potential limitations of the δ-probability principle, and how can it be further improved?

The δ-probability principle in MuCo, while effective in controlling the probability of re-identifying individuals based on quasi-identifiers, may have some limitations. One potential limitation is the sensitivity of the δ value, as setting it too low may result in increased information loss, while setting it too high may reduce the effectiveness of privacy protection. To address this limitation, a dynamic δ value that adapts to the specific characteristics of the data could be explored. This adaptive approach could adjust the δ value based on the distribution of quasi-identifiers in the microdata, ensuring a balance between privacy protection and information utility. Additionally, incorporating feedback mechanisms to fine-tune the δ value based on the performance of the anonymization process could further enhance the robustness of the δ-probability principle in MuCo.
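To make the trade-off concrete, the following sketch checks a random output table against a candidate δ. It assumes, as an interpretation rather than the paper's formal definition, that entry `table[i][i]` is the probability that tuple `i` keeps its own QI value and that the principle bounds this probability by δ. The function names are hypothetical.

```python
def max_retention_probability(table):
    """Largest probability that any tuple keeps its own QI value.

    Assumption: table[i][j] is the probability that tuple i is released
    with tuple j's QI value, so the diagonal holds retention probabilities.
    """
    return max(row[i] for i, row in enumerate(table))


def satisfies_delta(table, delta, tolerance=1e-12):
    """True if no tuple retains its own QI value with probability above delta."""
    return max_retention_probability(table) <= delta + tolerance


# Hypothetical 3-tuple group; each row sums to 1.
example_table = [
    [0.5, 0.25, 0.25],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]
```

Under this reading, `example_table` passes for δ = 0.6 but fails for δ = 0.5, because the third tuple keeps its own value too often. Lowering δ forces more probability mass onto other tuples' values, which is where the additional information loss comes from.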

How can the ideas behind MuCo be applied to other data publishing scenarios beyond microdata, such as graph data or time series data?

The concepts and principles behind MuCo can be applied to other data publishing scenarios beyond microdata, such as graph data or time series data, by adapting the anonymization techniques to suit the specific characteristics of these data types. For graph data, the idea of grouping similar nodes or edges together and perturbing their attributes to prevent re-identification can be employed. This could involve developing algorithms that consider the connectivity and structure of the graph to ensure effective privacy protection. In the case of time series data, the notion of randomizing values based on time intervals or patterns could be utilized to obfuscate sensitive information while maintaining data utility. By customizing the MuCo framework to these different data structures, it is possible to achieve privacy preservation in diverse data publishing scenarios.