Anuket Project

Data Anonymization Tool

Contributors 

Yichen Li 

Sridhar Rao 

Lei Huang 

Requirements Collection

Anonymization Requirements

Owner: Yichen Li
Contributor: (none listed)

Requirements Description: Attributes that need to be anonymized:

Node names

Customers

Locations:

  • "Anonymization of Location Data Does Not Work: A Large-Scale Measurement Study" (https://dl.acm.org/doi/10.1145/2030613.2030630) indicates that anonymized location data remains a privacy risk:
    • Quote:

      Our study shows that sharing anonymized location data will likely lead to privacy risks and that, at a minimum, the data needs to be coarse in either the time domain (meaning the data is collected over short periods of time, in which case inferring the top N locations reliably is difficult) or the space domain (meaning the data granularity is strictly higher than the cell level). In both cases, the utility of the anonymized location data will be decreased, potentially by a significant amount.

Dates

IP addresses, MAC addresses

Switch names (Gateway)

Domain names
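As a rough illustration of how the attributes listed above might be transformed, the following Python sketch pseudonymizes identifiers such as node names and MAC addresses with a keyed hash (HMAC-SHA256), truncates IP addresses to a network prefix, and coarsens locations and dates, in line with the study's advice that location data be made coarse in space or time. The secret key, token prefixes, and generalization levels (/24 for IPv4, /48 for IPv6, one decimal place for coordinates, month-level dates) are hypothetical choices, not requirements of this project. Keyed pseudonymization on its own is not full anonymization: the mapping is consistent, so linkage attacks remain possible.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical secret key; a real tool would load it from a secure store
# and never ship it together with the anonymized data.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(value, prefix, key=SECRET_KEY):
    """Replace an identifier (node name, MAC, domain, ...) with a stable keyed token.

    Input is normalized (trimmed, lowercased) so that "AB:CD" and "ab:cd" map
    to the same token. The token cannot be reversed without the key.
    """
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def truncate_ip(address, v4_prefix=24, v6_prefix=48):
    """Generalize an IP address by zeroing its host bits (10.1.2.3 -> 10.1.2.0)."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def coarsen_location(lat, lon, decimals=1):
    """Round coordinates so that many nearby points share one cell.

    One decimal place of latitude is roughly 11 km.
    """
    return (round(lat, decimals), round(lon, decimals))


def generalize_date(iso_date):
    """Keep only year and month of an ISO date (YYYY-MM-DD -> YYYY-MM)."""
    return iso_date[:7]


# Hypothetical input record.
record = {
    "node": "compute-node-07",
    "mac": "52:54:00:AB:CD:EF",
    "ip": "10.20.30.40",
    "location": (48.85837, 2.29448),
    "date": "2021-03-14",
}

anonymized = {
    "node": pseudonymize(record["node"], "node"),
    "mac": pseudonymize(record["mac"], "mac"),
    "ip": truncate_ip(record["ip"]),
    "location": coarsen_location(*record["location"]),
    "date": generalize_date(record["date"]),
}
print(anonymized)
```

Using HMAC rather than a plain hash means an attacker cannot simply hash a list of known node names or MAC addresses and match the tokens, as long as the key stays secret.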


Tools Research

Yichen Li 

Data anonymization can be performed in many ways; based on a review of data anonymity and privacy documentation from Google and Apple, the four most popular methods are the following:

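K-anonymity: Based on generalizing (and, where needed, suppressing) quasi-identifiers, meaning attributes such as location or IP prefix that could be linked with outside data. It requires that every record be indistinguishable from at least k - 1 other records on those attributes, so every group of records with identical quasi-identifier values contains at least k records. However, the process involves no randomness: the generalization is a deterministic mapping, so an attacker with background knowledge may still infer sensitive values.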
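A minimal sketch of checking k-anonymity, assuming a toy table of infrastructure records whose field names (site, ip, fault) are hypothetical. Generalizing each site to its city and each IP address to a /16 subnet raises k from 1 to 2. The generalization is deterministic, matching the point above that k-anonymity involves no randomness.

```python
from collections import Counter

# Hypothetical raw records; "site" and "ip" act as quasi-identifiers.
raw = [
    {"site": "paris-1", "ip": "10.1.2.3", "fault": "disk"},
    {"site": "paris-2", "ip": "10.1.7.9", "fault": "fan"},
    {"site": "lyon-1", "ip": "10.2.4.5", "fault": "disk"},
    {"site": "lyon-3", "ip": "10.2.8.1", "fault": "power"},
]


def generalize(record):
    """Generalize quasi-identifiers: site -> city, IP -> its /16 subnet."""
    return {
        "city": record["site"].split("-")[0],
        "subnet": ".".join(record["ip"].split(".")[:2]) + ".0.0/16",
        "fault": record["fault"],
    }


def k_of(records, quasi_identifiers):
    """Largest k for which the records are k-anonymous: the size of the
    smallest group sharing identical quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


generalized = [generalize(r) for r in raw]
print(k_of(raw, ["site", "ip"]))              # 1: every raw record is unique
print(k_of(generalized, ["city", "subnet"]))  # 2: each (city, subnet) group has 2 records
```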

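L-diversity: Extends k-anonymity by requiring that every group of records sharing the same quasi-identifier values contain at least l distinct values of the sensitive attribute (for example, a fault type or a medical diagnosis). Otherwise, an attacker who can place a person or node in a group whose members all share one sensitive value learns that value directly.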
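The sketch below computes the "distinct" variant of l-diversity for a hypothetical table (same illustrative field names: city, subnet, fault) that is already 2-anonymous. The Paris group contains only one fault type, so the table is merely 1-diverse: anyone who knows a node belongs to that group learns its fault, even though k = 2.

```python
from collections import defaultdict


def distinct_l(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any group of records
    that share the same quasi-identifier values (distinct l-diversity)."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return min(len(values) for values in groups.values())


# Hypothetical, already generalized table: both groups have 2 records.
table = [
    {"city": "paris", "subnet": "10.1.0.0/16", "fault": "disk"},
    {"city": "paris", "subnet": "10.1.0.0/16", "fault": "disk"},
    {"city": "lyon", "subnet": "10.2.0.0/16", "fault": "disk"},
    {"city": "lyon", "subnet": "10.2.0.0/16", "fault": "power"},
]

print(distinct_l(table, ["city", "subnet"], "fault"))  # 1: the Paris group is homogeneous
```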

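T-Closeness: Requires that the distribution of the sensitive attribute within every group of records be close to (within a distance t of) its distribution in the entire dataset, so that attackers cannot learn much more about an individual's sensitive value than they could from the dataset as a whole.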
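T-closeness was originally defined using the Earth Mover's Distance; for categorical values that are all treated as equally far apart, that distance reduces to the variational distance (half the sum of absolute probability differences). The following sketch assumes that simplification and a hypothetical toy table with the same illustrative fields as before; it returns the smallest t the table satisfies.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Empirical probability distribution of a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two distributions
    (equal to EMD when every pair of categories is at distance 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def closeness(records, quasi_identifiers, sensitive):
    """Smallest t for which the table is t-close: the largest distance between
    any group's sensitive-value distribution and the overall distribution."""
    overall = distribution([r[sensitive] for r in records])
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    return max(variational_distance(distribution(v), overall) for v in groups.values())


# Hypothetical table: overall fault distribution is 75% disk, 25% power.
table = [
    {"city": "paris", "subnet": "10.1.0.0/16", "fault": "disk"},
    {"city": "paris", "subnet": "10.1.0.0/16", "fault": "disk"},
    {"city": "lyon", "subnet": "10.2.0.0/16", "fault": "disk"},
    {"city": "lyon", "subnet": "10.2.0.0/16", "fault": "power"},
]

print(closeness(table, ["city", "subnet"], "fault"))  # 0.25
```

A smaller t forces each group to look more like the whole dataset, at the cost of coarser generalization.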

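Differential Privacy: A randomized query mechanism is differentially private if it returns statistically similar results on any two datasets that differ in a single record (for example, a dataset of 100 records and the same dataset with one record removed, leaving 99). Formally, an ε-differentially private mechanism M satisfies Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] for every such pair of neighboring datasets D, D' and every set of outputs S, so the presence or absence of any one record has only a bounded effect on what a query reveals.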
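One standard way to make a counting query differentially private is the Laplace mechanism: adding or removing one record changes a count by at most 1, so adding Laplace noise with scale 1/ε gives ε-differential privacy. The sketch below is illustrative only; the dataset and its "faulty" flag are hypothetical, and it samples Laplace noise as the difference of two exponential samples because Python's standard random module has no Laplace sampler. Naive floating-point noise like this is known to have subtle weaknesses, so a production tool would more likely rely on a vetted library such as Google's open-source differential privacy library.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Laplace mechanism for a counting query (sensitivity 1).

    Returns the true count plus Laplace noise with scale 1 / epsilon;
    a smaller epsilon means more noise and stronger privacy.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def is_faulty(record):
    return record["faulty"]


# Hypothetical dataset D (100 records, 25 faulty) and a neighboring
# dataset D' with the first record removed (99 records, 24 faulty).
d = [{"id": i, "faulty": i % 4 == 0} for i in range(100)]
d_prime = d[1:]

rng = random.Random(0)
print(private_count(d, is_faulty, epsilon=0.5, rng=rng))
print(private_count(d_prime, is_faulty, epsilon=0.5, rng=rng))
```

With ε = 0.5, the output distributions for D and D' differ by at most a factor of e^0.5 (about 1.65), so a single noisy answer does not reliably reveal whether the removed record was present.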