Detailed workflow

| Week | Task | Status | Comments |
|---|---|---|---|
| 20-May | Study work: state of the art on the models, optimization and evaluation | Done | Look for optimization techniques and how anonymization models are evaluated. |
| 27-May | Finalizing the dataset and the libraries to use (suppression, rename, etc.) | Done | Kubernetes logs/metrics, OpenStack logs/metrics, or any data that contains PII |
| 3-June | Anonymization impact on the model's utility | Done | |
| 10-June | | Done | |
| 17-June | Containerization and the APIs | Done | |
| 24-June | Automation using Python | Done | |
| 1-July | Testing of the containerized architecture | Done | |
| 8-July | NLP model for anonymizing Telco data | | |
| 15-July | | | |
| 22-July | | | |
| 29-July | | | |
| 5-Aug | Evaluation of the model | | |
| 12-Aug | Integration of the developed model with the architecture | | |
| 19-Aug | Documentation and release of the code | | |
| 26-Aug | [BUFFER] | | |

...

  1. Precision and Recall: These metrics are commonly used to assess NLP models for text anonymization. Precision is the proportion of the items the model labeled as sensitive that really are sensitive, while recall is the proportion of the sensitive items present in the text that the model detected and anonymized.
  2. F1 Score: The F1 score is the harmonic mean of precision and recall. Because it accounts for both false positives and false negatives, it gives a single balanced measure of the model's anonymization performance.
  3. All of the metrics above require ground-truth annotations of the sensitive information in the test data.
  4. To measure the loss in utility of the text, one approach is to train a model on the data before anonymization, train it again on the anonymized data, and compare the performance. The smaller the difference, the better the anonymization preserves utility.
  5. Human Evaluations: Human evaluations involve experts assessing the anonymized documents for re-identification risks and data utility preservation.
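
The metrics in points 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation code used in this project: it assumes sensitive items are represented as `(start, end)` character spans, that a prediction counts only on an exact match with the ground truth, and that the names `anonymization_scores` and `utility_drop` are hypothetical helpers introduced here.

```python
from typing import NamedTuple, Set, Tuple

# Assumption: a sensitive item is a (start_offset, end_offset) character span.
Span = Tuple[int, int]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def anonymization_scores(gold: Set[Span], predicted: Set[Span]) -> Scores:
    """Score predicted sensitive spans against ground truth (exact match only)."""
    true_positives = len(gold & predicted)
    # Precision: share of flagged spans that are truly sensitive.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly sensitive spans that were flagged.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f1)


def utility_drop(score_before: float, score_after: float) -> float:
    """Downstream model score before minus after anonymization (smaller is better)."""
    return score_before - score_after


# Made-up example spans, for illustration only.
gold = {(0, 5), (10, 18), (30, 42)}
predicted = {(0, 5), (10, 18), (50, 55), (60, 66)}
scores = anonymization_scores(gold, predicted)
print(scores)
```

Exact span matching is the strictest choice; a partial-overlap or token-level match would give more lenient scores and may suit log data better. For `utility_drop`, the two inputs would be the same downstream metric (for example accuracy) measured on a model trained on the original data and on the anonymized data.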

Reference Research papers:

  1. https://aclanthology.org/2021.acl-long.323.pdf (Discusses the problems with, and the evaluation methodology for, anonymization models)
  2. https://www.researchgate.net/publication/347730431_Anonymization_Techniques_for_Privacy_Preserving_Data_Publishing_A_Comprehensive_Survey (A survey of the different types of anonymization techniques)

...