Public comments invited by MeitY on draft Guidelines for Anonymisation of Data by 21st September, 2022

The Ministry of Electronics & Information Technology (“MeitY”) has released draft Guidelines for Anonymisation of Data (“Draft Guidelines”) for public consultation. Comments and feedback are required to be sent by 21st September, 2022, addressed to Mr. Shubhanshu Gupta, Principal Technical Officer, CDAC Pune & Member Convener of the WG, at the email address shubhanshug@cdac.in, with a copy to headits@stqc.gov.in. These guidelines have been issued with a view to enhancing privacy protection through case-dependent data anonymisation while processing, publishing, storing or sharing data with other entities.
What is Data Anonymisation?
Data Anonymisation is a processing technique that removes or modifies direct and indirect personally identifiable attributes to eliminate or significantly reduce identifiability. It typically results in “anonymised data sets” that cannot be associated with an individual.
The draft Personal Data Protection Bill (PDPB) and the JPC draft Data Protection Act 2021 also define anonymisation, in relation to personal data, as the “irreversible process of transforming or converting personal data to a form in which a data principal cannot be identified”, meeting the standards of irreversibility specified by the Authority (the proposed Data Protection Authority). Data which has undergone the process of anonymisation is referred to as anonymised data. Reducing the risk of identifying individuals to a sufficiently remote level may be deemed anonymisation. When done effectively, anonymisation may help protect the privacy rights of the “data principal”, balanced against reasonable purposes/ legitimate interests, and acts as a layer of defence that reduces data abuse after unsanctioned access.
Processing Purposes:
Organisational data processing may be categorised as:
1. Purpose-based processing, which requires the organisation to clearly define all such purposes and obtain clear, affirmative, and explicit consent from the data principals for each processing purpose;
2. Processing to fulfil a lawful disclosure request;
3. Sharing data with data-processors/third parties and other entities for processing purposes;
4. Processing to integrate products and services with other data-tech ecosystems for the benefit of consumers;
5. Any additional processing that the organisation carries out to improve services, cross-sell, collaborate or maintain its competitive edge. In some cases where processing is experimental or short-lived, organisations do not typically declare it as a formal purpose or collect consent against it.
However, the Draft Guidelines do not delve into processing purposes; they focus on identifying data anonymisation approaches that are in line with the processing principles of e-governance projects.
Process of Data Anonymisation:
Step 1 – Identify PII and Sensitive Data: Identification of Personally Identifiable Information (PII), Sensitive personal data and critical data in the department or project or software application. [Example – Biometrics, Health records, financial records, authenticated services, addresses, unique identifiers etc.]
Step 2 – Determining data sources: This step is about identifying the data sources where PII and sensitive data are potentially stored, used or referred to.
Step 3 – Data discovery: Identification of the fields where the data is stored or used, and identification of PII fields in the application. Discovery of PII data can be manual or automated; an automated approach is preferred, as it improves the accuracy of discovery.
Step 4 – Determining Anonymisation Technique: This step includes determining the technique of data anonymisation, such as data pseudonymisation or data redaction, and defining data redaction rules based on user roles. [For instance, the last few digits of the PAN information would be made available to high privilege users.]
Step 5 – Anonymize Data: This step includes anonymisation of the data identified.
The Draft Guidelines also set out detailed techniques for data anonymisation, along with their advantages, limitations and suitability for various datasets/ scenarios.
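Steps 3 to 5 can be illustrated with a minimal sketch, assuming PAN is the only identifier of interest. The regular expression, the role names and the function names below are hypothetical and are not taken from the Draft Guidelines; the sketch keeps the last four characters of a PAN visible for high-privilege roles and masks it fully for everyone else.

```python
import re

# Assumed PAN format: five letters, four digits, one letter.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

# Hypothetical roles; a real system would take these from its access-control layer.
HIGH_PRIVILEGE_ROLES = {"auditor", "admin"}


def discover_pan_fields(record):
    """Step 3 (data discovery): names of fields whose values contain a PAN-like string."""
    return [
        field
        for field, value in record.items()
        if isinstance(value, str) and PAN_PATTERN.search(value)
    ]


def redact_pan(value, role, visible=4):
    """Steps 4-5: mask every PAN; high-privilege roles keep the last `visible` characters."""

    def mask(match):
        pan = match.group(0)
        if role in HIGH_PRIVILEGE_ROLES:
            return "X" * (len(pan) - visible) + pan[-visible:]
        return "X" * len(pan)

    return PAN_PATTERN.sub(mask, value)


def anonymise_record(record, role):
    """Apply role-based redaction to every field flagged by discovery."""
    pan_fields = set(discover_pan_fields(record))
    return {
        field: redact_pan(value, role) if field in pan_fields else value
        for field, value in record.items()
    }


record = {"name": "A. Kumar", "pan": "ABCDE1234F", "note": "PAN ABCDE1234F on file"}
admin_view = anonymise_record(record, "admin")
clerk_view = anonymise_record(record, "clerk")
```

The sketch only handles PAN; other identifiers in the record (such as the name) would need their own discovery rules and techniques.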
Criteria for Selection of Anonymization techniques:
Organisations can select anonymisation techniques according to factors such as business requirements, security considerations and regulatory requirements. They may even choose a hybrid model, applying more than one anonymisation technique to the data, to achieve the objective of eliminating identifiers.
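A hybrid model could look like the following sketch, which combines pseudonymisation of a direct identifier with generalisation of an age field. The field names, the secret key and the 10-year band width are assumptions made for illustration. Note that a keyed hash stays linkable by whoever holds the key, so on its own it is pseudonymisation rather than irreversible anonymisation.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymise(identifier, key=SECRET_KEY):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256), truncated for readability."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def generalise_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(record):
    """Hybrid approach: pseudonymise the identifier, generalise the age, keep the district."""
    return {
        "person_ref": pseudonymise(record["aadhaar"]),
        "age_band": generalise_age(record["age"]),
        "district": record["district"],
    }


# Illustrative record with a made-up identifier value.
sample = {"aadhaar": "0000-1111-2222", "age": 37, "district": "Pune"}
result = anonymise(sample)
```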
Stakeholders, Governance, Audit & Feedback Mechanism:
All those who capture, process, store, share or use the data are stakeholders of data anonymisation under the Draft Guidelines. The list of these stakeholders is as below:
1. Professional Users (Users of the data captured/ processed by any e-governance organisation)-
Many applications need data for analysis and research purposes. Such data is captured by an owner application and then shared with these professional *users. An application that captures the data is referred to as an “Owner Application”. The owner application should wherever feasible share only anonymised data with user applications.
[*The users can be of different types:
● Application Users – Departments or applications using or integrated with the source application have to access the data generated by the source application. In this scenario, data sharing is happening between two systems.
● Citizens – The users of the owner application also access the data from the owner application online (onscreen information) or offline (such as in print) format.
● Call centres and Third-party Service providers – These users need automated verification of people’s information, without the third-party service providers having access to the sensitive data of the source application.
● Researchers or data analysers – The users who need the data for analysis and research purposes and expect that the data provided to them is de-identified.]
2. Processors:
The owner application captures Personally Identifiable Information (PII) or sensitive information. The application then processes the information to achieve different objectives. Teams involved in processing the data are referred to as “processors”. These teams are responsible for converting raw personal data into anonymised form using various technical procedures/ tools. The processing organisation includes partners/ contractors and the various teams associated with the Source Application, such as the development team, testing team, production support, system admin, infrastructure, etc., who have direct access to the captured data.
3. Auditors and Reviewers:
As anonymisation of data (AoD) is crucial to preserving an individual’s privacy and making data more secure, it needs to be audited and reviewed. Auditors are responsible for assessing processed data to identify whether an individual is identifiable by combining de-identified or anonymised data with other data, using both direct and indirect methods. The auditors and reviewers can be Compliance Officers, legal staff, or External Auditors/ Reviewers. They are important stakeholders as they also have access to the data during the audit process.
4. Data Principals:
Data Principals are users whose personal data is processed in various e-governance projects for providing services.
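The auditors’ task of checking whether anonymised data can be combined with other data to identify an individual can be sketched as a simple linkage test. This is an illustrative simplification and not a method prescribed by the Draft Guidelines; the function name, field names and sample data are hypothetical. A released row is flagged when its quasi-identifier values match exactly one record in an auxiliary dataset.

```python
from collections import defaultdict


def linkage_matches(released, auxiliary, quasi_identifiers):
    """Return (released_row, auxiliary_row) pairs where the quasi-identifier
    values of a released row match exactly one auxiliary record."""
    # Index the auxiliary data by its quasi-identifier values.
    index = defaultdict(list)
    for person in auxiliary:
        key = tuple(person[q] for q in quasi_identifiers)
        index[key].append(person)

    matches = []
    for row in released:
        key = tuple(row[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique match points to one individual
            matches.append((row, candidates[0]))
    return matches


# Made-up anonymised release and made-up auxiliary data (e.g. a public list).
released = [
    {"age_band": "30-39", "pin": "411001", "diagnosis": "X"},
    {"age_band": "40-49", "pin": "411002", "diagnosis": "Y"},
]
auxiliary = [
    {"name": "Person 1", "age_band": "30-39", "pin": "411001"},
    {"name": "Person 2", "age_band": "40-49", "pin": "411002"},
    {"name": "Person 3", "age_band": "40-49", "pin": "411002"},
]
flagged = linkage_matches(released, auxiliary, ["age_band", "pin"])
```

Here the first released row is flagged because only one auxiliary record shares its age band and PIN code, while the second row is shared by two people and is therefore not uniquely linkable in this test.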
SOP for anonymising data:
The 15-step SOP for organisations undertaking data anonymisation suggests:
• Step 1: Determine which datasets require anonymisation. Consider data collected from all possible sources.
• Step 2: Devise a release model or policy on how the anonymised data will be released and to whom. Decide whether this dataset will be publicly available, or shared with controlled groups.
• Step 3: Identify the teams required within the organisation to perform anonymisation. Identify their roles and responsibilities.
• Step 4: Determine which data directly identifies an individual (direct identifiers like phone numbers, and interestingly, Aadhaar) and which data indirectly does so (quasi-identifiers like sexual orientation or religious belief). This will help decide which data should be anonymised and the techniques to do so.
• Step 5: First mask (or anonymise) direct identifiers. This removes the most immediate re-identification risk from the dataset.
• Step 6: Conduct threat modelling for quasi-identifiers, i.e. identify what information could be revealed as a result of them.
• Step 7: Determine the re-identification risk threshold based on the anonymisation techniques deployed.
• Step 8: Determine the anonymisation techniques for quasi-identifiers, and document the process.
• Step 9: Import sample data from the original database and document the same.
• Step 10: Based on steps 6-9, perform a trial anonymisation and assess whether the results meet risk-limitation expectations. Review and correct errors, and ensure that risk is below the re-identification threshold.
• Step 11: Now, anonymise all quasi-identifiers across the dataset.
• Step 12: Stop to evaluate the actual identification risks for the anonymised data again.
• Step 13: Compare this risk with the threshold laid out by policymakers—if it falls short, evaluate and repeat the testing.
• Step 14: Determine access controls for sharing anonymised data. Data owners should ensure that the parties they share the information with use it for a limited purpose and that it is not misused. Organisations receiving data should confirm that they will not attempt to re-identify it.
• Step 15: Document the anonymisation procedure. This will help auditors identify potential flaws in anonymisation too.
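The risk evaluation in steps 7, 10, 12 and 13 can be sketched as follows. The metric used here, the worst-case risk of 1 divided by the size of the smallest group of records sharing the same quasi-identifier values (a k-anonymity-style measure), is an assumption chosen for illustration; this summary does not state which metric or threshold the Draft Guidelines require, and the threshold values below are hypothetical.

```python
from collections import Counter


def reidentification_risk(rows, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group of rows that share
    the same quasi-identifier values."""
    if not rows:
        raise ValueError("cannot assess risk on an empty dataset")
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return 1 / min(groups.values())


def meets_threshold(rows, quasi_identifiers, threshold):
    """Step 13: compare the measured risk with the agreed threshold."""
    return reidentification_risk(rows, quasi_identifiers) <= threshold


# Made-up trial sample (step 9) after generalising the quasi-identifiers.
trial = [
    {"age_band": "30-39", "district": "Pune"},
    {"age_band": "30-39", "district": "Pune"},
    {"age_band": "40-49", "district": "Pune"},
    {"age_band": "40-49", "district": "Pune"},
    {"age_band": "40-49", "district": "Pune"},
]
risk = reidentification_risk(trial, ["age_band", "district"])
```

In this sample the smallest group has two records, so the worst-case risk is 0.5. If that exceeds the chosen threshold, the quasi-identifiers would need further generalisation or suppression before the trial is repeated.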
Governance (Monitoring and Compliance):
1. Recommendations for the Owner Organisation:
• The respective departments will review/ internally audit the data anonymisation activities periodically, at least once a year, or earlier if required due to a change in requirements or implementation, complaint(s) or incident(s).
• The owner organisation should make the data anonymisation process a part of the IT Software Development and Maintenance process.
• The project should be audited annually by a third party for compliance with data privacy requirements, including data anonymisation, to ensure the effectiveness and correctness of the data anonymisation.
• Third-party vendors and their subcontractors should also be in the audit scope as they do have access to the data.
2. Perform a risk assessment both pre and post-release of anonymised data:
3. Data Privacy Incident Reporting:
• All data privacy incidents should be reported to the concerned stakeholders.
• Further, the stakeholders should ensure that a data privacy incident reporting system is in place for users to report any gaps in data anonymisation or any other personal information breach.
• Data incidents should be acknowledged within the specified time limit and resolved based on their severity.
• During internal audits, data privacy incidents should be reviewed.
• Capacity building and awareness creation are essential for the successful implementation of anonymisation of data in various projects. The Draft Guidelines also deliberate on human resource development and awareness programmes.
A copy of the Draft Guidelines is linked below for ease of reference.