Anuket Project
Data Anonymization Tool
Contributors
Requirements Collection
Anonymization Requirements | Requirements Description | Owner | Contributor |
---|---|---|---|
Attributes that need to be anonymized:
Node names
Customers
Locations:
- "Anonymization of Location Data Does Not Work: A Large-Scale Measurement Study" (https://dl.acm.org/doi/10.1145/2030613.2030630): Reports that anonymizing location data is unreliable:
- Quote:
Our study shows that sharing anonymized location data will likely lead to privacy risks and that, at a minimum, the data needs to be coarse in either the time domain (meaning the data is collected over short periods of time, in which case inferring the top N locations reliably is difficult) or the space domain (meaning the data granularity is strictly higher than the cell level). In both cases, the utility of the anonymized location data will be decreased, potentially by a significant amount.
Dates
IP and MAC addresses
- Google Analytics documentation (https://support.google.com/analytics/answer/2763052?hl=en)
- Two basic approaches: generalization (k-anonymity) and adding noise/randomness (differential privacy)
Switch names (Gateway)
Domain names
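As an illustration of the generalization approach for IP and MAC addresses, the sketch below zeroes the host part of an address. It uses only the Python standard library. The function names `mask_ip` and `mask_mac` are hypothetical, and the prefix lengths (24 bits for IPv4, 48 bits for IPv6, the first three octets for MAC) are assumptions chosen for the example, not project requirements.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Generalize an IP address by zeroing its host bits."""
    ip = ipaddress.ip_address(address)
    # Assumption: keep 24 bits of an IPv4 address and 48 bits of an IPv6 address.
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


def mask_mac(mac: str) -> str:
    """Generalize a MAC address by keeping only the first three octets."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(octets[:3] + ["00"] * 3)


print(mask_ip("192.168.10.57"))                 # 192.168.10.0
print(mask_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8:85a3::
print(mask_mac("00-1A-2B-3C-4D-5E"))            # 00:1a:2b:00:00:00
```

Masking of this kind is deterministic, so identical inputs always map to the same output. That keeps records joinable, but it also means the result is only as private as the size of the group that shares the masked value.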
Tools Research
Data anonymization can be done in many ways. The four most popular methods are listed below (based on a review of the data anonymity and privacy documents published by Google and Apple):
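K-anonymity: Based on data generalization. Each record must be indistinguishable from at least k-1 other records on its quasi-identifying attributes, so every combination of those attribute values appears at least k times. The method contains no randomness, so a deterministic mapping from original to generalized values exists.
L-diversity: Each group of records that share the same quasi-identifier values must contain at least L distinct values of the sensitive attribute (for example, name or address).
T-closeness: Within each group, the sensitive attribute must have nearly the same distribution as in the entire data set (the distance between the two distributions is at most a threshold t), so that attackers cannot learn much about an individual's sensitive value from knowing which group the individual belongs to.
Differential Privacy: A query returns statistically similar results on two datasets that differ in only one record (like 100 vs. 99 records). This is typically achieved by adding calibrated random noise to the query result.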
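The group-based definitions of K-anonymity and L-diversity can be checked mechanically. The following is a minimal sketch using only the Python standard library, not the API of any of the packages referenced on this page. The function names, field names, and sample records are hypothetical.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already generalized records.
records = [
    {"location": "Region-A", "node": "node-1x", "alarm": "link-down"},
    {"location": "Region-A", "node": "node-1x", "alarm": "high-cpu"},
    {"location": "Region-B", "node": "node-2x", "alarm": "link-down"},
    {"location": "Region-B", "node": "node-2x", "alarm": "link-down"},
]
quasi_identifiers = ["location", "node"]

print(is_k_anonymous(records, quasi_identifiers, 2))         # True
print(is_l_diverse(records, quasi_identifiers, "alarm", 2))  # False
```

The second check fails because both Region-B records carry the same alarm value, which shows how a data set can satisfy K-anonymity and still leak the sensitive attribute of a whole group.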
- Resources:
- K-anonymity, L-diversity, and T-closeness: Python package (anonypy): https://pypi.org/project/anonypy/, which covers all three methods.
- Fundamental implementation of K-anonymity: https://github.com/kaylode/k-anonymity
- Differential Privacy: Popular package (PyDP) implemented by OpenMined: https://github.com/OpenMined/PyDP; PyPI page (python-dp): https://pypi.org/project/python-dp/
- Some other anonymity tools: Faker: https://faker.readthedocs.io/en/master/; pynonymizer: https://pypi.org/project/pynonymizer/
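To make the noise-based approach of differential privacy concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It uses only the Python standard library and is not the PyDP API. The names `laplace_noise` and `private_count`, the `epsilon` values, and the sample data are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match the predicate."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one record changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 100 node records, 30 of them in "Region-A".
nodes = [{"region": "Region-A"}] * 30 + [{"region": "Region-B"}] * 70
noisy = private_count(nodes, lambda n: n["region"] == "Region-A", epsilon=0.5)
print(f"noisy count: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy. A larger `epsilon` returns values closer to the true count of 30.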