...
- Classic techniques (and their variations): K-Anonymity, L-Diversity, T-Closeness, Differential Privacy
- Data Anonymization with Autoencoders
- NLP approaches for data anonymization
- Generative AI (GANs)
Anonymizing Names and Telco-Fields
We have found that the classic techniques work well for anonymizing both names and telco fields (nouns and numbers) when the data is in a structured (columnar) format.
In this repo, you can find the techniques that we have tried for these fields: https://github.com/sknrao/anonymization
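As an illustration of how a classic technique applies to structured columns, here is a minimal k-anonymity sketch. It is not taken from the repo above: the sample rows, the `generalize` rule (keep the name initial, mask the last four digits of the number) and the choice of quasi-identifiers are all assumptions made for this example.

```python
from collections import Counter

def generalize(record):
    """Coarsen the quasi-identifiers: keep the name initial, mask the last 4 digits."""
    name, msisdn, plan = record
    return (name[0] + "*", msisdn[:-4] + "XXXX", plan)

def is_k_anonymous(records, k):
    """True if every (name, number) combination occurs at least k times."""
    groups = Counter((name, msisdn) for name, msisdn, _ in records)
    return all(count >= k for count in groups.values())

# Hypothetical sample rows: (subscriber name, MSISDN, plan)
rows = [
    ("Alice", "919800011234", "prepaid"),
    ("Anita", "919800015678", "postpaid"),
    ("Bob", "919811122222", "prepaid"),
    ("Bala", "919811123333", "prepaid"),
]
anonymized = [generalize(r) for r in rows]

print(is_k_anonymous(rows, 2))        # raw rows are all unique
print(is_k_anonymous(anonymized, 2))  # generalized rows form groups of two
```

In practice the generalization level would be tuned per column until the required k is reached, trading off utility against privacy.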
Anonymizing Packet Fields
Anonymizing packet fields is a well-researched area, with published work dating back to the early 2000s. The most recent approaches use condensation-based differential privacy.
References:
- RFC 6235: IP Flow Anonymization Support. https://www.rfc-editor.org/rfc/rfc6235.txt
- PCAPLIB: Y.-D. Lin, P.-C. Lin, S.-H. Wang, I.-W. Chen, and Y.-C. Lai, "Pcaplib: A System of Extracting, Classifying, and Anonymizing Real Packet Traces," IEEE Systems Journal, vol. 10, no. 2, pp. 520-531, 2014.
- CRYPTOPAN: J. Fan, J. Xu, M. H. Ammar, and S. B. Moon, "Prefix-Preserving IP Address Anonymization: Measurement-Based Security Evaluation and a New Cryptography-Based Scheme," Computer Networks, vol. 46, no. 2, pp. 253-272, 2004.
  - Newer version: https://ant.isi.edu/software/cryptopANT/index.html
  - Using with Python: https://github.com/certtools/cryptopanlib
- TCPANON: F. Gringoli. (2009, 11/10/2020). Tcpanon. Available: http://netweb.ing.unibs.it/~ntw/tools/tcpanon/
- SCRUB-TCPDUMP: D. Koukis, S. Antonatos, D. Antoniades, E. P. Markatos, and P. Trimintzios, "A Generic Anonymization Framework for Network Traffic," in 2006 IEEE International Conference on Communications, 2006, pp. 2302-2309.
- TRACEWRANGLER: J. Bongertz. (2013). Sec-4 Trace File Sanitization, the Sharkfest Challenge. Available: https://sharkfestus.wireshark.org/sharkfest.13/presentations/SEC-04_Trace-File-Sanitization-NG_Jasper-Bongertz.pdf
- PKTANON: https://github.com/KIT-Telematics/pktanon
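To make the prefix-preserving idea behind the Crypto-PAn reference concrete, here is a minimal sketch. It is not the Crypto-PAn implementation and does not use the API of any tool listed above: it swaps in HMAC-SHA256 as the keyed pseudorandom function (Crypto-PAn itself uses a block cipher), and the function names `anonymize_ipv4` and `common_prefix_len` are hypothetical. Each output bit is the input bit XORed with a keyed function of the preceding input bits, so two addresses that share a k-bit prefix also share a k-bit prefix after anonymization.

```python
import hashlib
import hmac
import ipaddress

def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of one IPv4 address (illustrative only)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        # The flip bit depends only on the first i input bits and the key.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()

key = b"example-secret-key"  # assumption: in practice, a randomly generated secret
a = anonymize_ipv4("192.168.1.10", key)
b = anonymize_ipv4("192.168.1.200", key)
print(a, b, common_prefix_len(a, b))  # the shared /24 prefix is preserved
```

The mapping is deterministic for a given key, so flows from the same host stay correlated across a trace while the real addresses remain hidden. The key must be kept secret, since anyone holding it can reverse the mapping.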
The team is currently working on:
(a) Implementing condensation-based differential privacy.
(b) Developing containers to test and evaluate the above techniques.
Anonymizing Location Information (cell-ID, count, etc.)
We are currently working on this and exploring different techniques.
Anonymizing Log Data
The team is currently exploring the use of NLP for this. We will update this section once there is progress.
Phase-3
The team is currently building a tool that automatically detects PII in the data and picks the best technique to apply to it.
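The detect-then-select flow could look roughly like the sketch below. This is only an illustration of the idea, not the tool under development: the `DETECTORS` table, the regular expressions, the `suggest_technique` function and the mapping from field type to technique are all assumptions made for this example.

```python
import re

# Hypothetical detectors: field type -> (pattern, suggested technique)
DETECTORS = {
    "ipv4": (re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"), "prefix-preserving anonymization"),
    "msisdn": (re.compile(r"\+?\d{10,15}"), "k-anonymity (generalization)"),
}

def suggest_technique(values):
    """Return (field_type, technique) for a column of string values.

    A detector is chosen only if every non-empty value matches it fully;
    anything else is treated as free text.
    """
    samples = [v for v in values if v]
    for field_type, (pattern, technique) in DETECTORS.items():
        if samples and all(pattern.fullmatch(v) for v in samples):
            return field_type, technique
    return "free_text", "NLP-based anonymization"

print(suggest_technique(["10.0.0.1", "192.168.1.5"]))
print(suggest_technique(["919800011234", "+919811122222"]))
print(suggest_technique(["link down on port 3"]))
```

A real detector would likely need more than pattern matching, for example column names, value distributions, or a trained classifier, but the table-driven dispatch keeps detection and technique selection separate.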
Phase-4
The team is currently building a container-based architecture for a unified tool.