On July 17, 2024, the Personal Information Protection Commission (“PIPC”) released its Guideline on Processing Publicly Available Data for AI Development and Services (the “Guideline”) (Link).
In sum, the Guideline defines “publicly available data” as personal information that is legally accessible to anyone and sets forth standards for applying the “legitimate interest” basis for collecting and using publicly available data to develop AI or provide AI services. The Guideline also elaborates on standard safeguards that AI companies can implement when training AI or providing AI services, how to protect data subjects’ rights, and the internal management system of AI companies.
1. |
Requirements and Standard for Applying “Legitimate Interest” |
-
Requirement 1: Legitimacy of the Purpose
– |
Various types of interests can qualify as “legitimate interest,” including the AI company’s commercial interest or the resulting societal interest. |
– |
AI developers are recommended to define their purposes in as much detail as possible. |
– |
The level of detail may depend on whether the AI could be considered “narrow AI” or “general AI.” |
Type |
Level of Detail in Defining Purposes |
Narrow AI |
|
General AI |
|
-
Requirement 2: Necessity of Processing
– |
Necessity of processing will be determined on a case by case basis, considering factors such as the detailed purpose, use and context of the AI. |
– |
The processing must be substantially related to the legitimate interest and within a reasonable scope. |
– |
The AI company should establish, in advance, standards for training data, which are appropriate for the AI’s purpose, and use and exclude from training data if the data are not substantially related to the development of the AI. |
-
Requirement 3: Balance of Interests
– |
The data controller’s legitimate interest must clearly outweigh data subjects’ rights. |
– |
The balancing test may consider the data controller’s implementation of safeguards to protect personal information and measures to ensure the data subjects’ rights (further discussed below). |
2. |
Implementing Safeguards and Ensuring Data Subjects’ Rights |
-
Technical Safeguards
Types of Measures |
Details and Examples |
Verification and Management of Sources of Training Data |
|
Prevention of Data Breach |
|
Secure Storage and Management of Personal Information |
|
Additional Safeguards Through Fine-Tuning |
|
Filtering of Prompts and Outputs |
|
Deleting Specific Data After Training |
|
-
Managerial Safeguards
Types of Measures |
Details and Examples |
Establishing Standards for Processing Training Data and Disclosing in the Privacy Policy |
|
Privacy Impact Assessment |
|
AI Privacy Red Teaming |
|
Safeguards Appropriate for the Method of AI Development and Distribution (e.g., Open Source, API) |
(i) Establish and distribute license policies that specify the terms and conditions of use. (ii) Promptly take responsive measures and redistribute models when vulnerabilities are discovered.
(i) Impose contractual obligations on developers using the API to protect personal information and provide detailed instructions for use and technical documents. |
-
Ensuring Data Subjects’ Rights
Types of Measures |
Details and Examples |
Enhanced Transparency on AI Training Data |
|
Support for Exercising Data Subjects’ Rights |
|
3. |
Internal Management Systems for AI Companies |
4. |
Implications |
Related Topics