Table of Contents

Introduction

Anonymization is the process of protecting private or sensitive information by erasing or encrypting identifiers that connect an ‘entity’ to stored data. If the anonymization is fool-proof, the process of Deanonymization should not reveal personal identifiable information. Typically, anonymization could be just pattern anonymization, or just the value anonymization, or both. In this work, we want to do just the value anonymization, so as to preserve the predictive/detective power. Just like any other data, even in Telco data, we will have to deal with both the categorical Variables and the numerical variables. There are various approaches under anonymization:

...

Names (Systems, Domain, Individuals, Organizations, Places, etc.)
Address (IP and MAC)
Telco Fields - IMSI, IMEI, MSIN, MSISDN, MCC+MNC
Location Data (Cell-ID, Count, etc.). GPS Data on its own is not a sensitive information. The context around that, such as 'names', are sensitive.

PII Type	Dataset (links)
Names (Systems, Domain, Individuals, Organizations, Places, etc.)	ServerLOG1, ServerLog2, LogHub, Campus
Address (IP and MAC)	Internet Traffic Dataset: EX1, EX2
Telco Fields - IMSI, IMEI, MSIN, MSISDN, MCC+MNC	Adult Dataset enhanced with Telco-Fields Adult Dataset: Generate random IMEI/IMSI* fields and add it to this dataset
Location Data (GPS, Cell-ID, Count, etc.)	OpenCellID, GPS, IEEE-dataport (crawdad)

Phase-2

In this phase we would want to:

...

Classic (and its variations): K-Anonymity, L-Diversity, T-Closeness, Differential Privacy
Data Anonymization with Autoencoders
NLP approaches for data anonymization
Generative AI (GANs)

Version	Old Version 2	New Version 3
Changes made by	Sridhar Rao	Sridhar Rao
Saved on	Mar 24, 2024	Mar 24, 2024

Versions Compared

Key

Introduction

Phase-2

Phase-3

Phase-4

Page Comparison

Versions Compared

Key

Introduction

Phase-2

Phase-3

Phase-4