The PIPC Releases its “Guideline on Processing Publicly Available Data for AI Development and Services”

2024.07.23

On July 17, 2024, the Personal Information Protection Commission (“PIPC”) released its Guideline on Processing Publicly Available Data for AI Development and Services (the “Guideline”) (Link).
 
In sum, the Guideline defines “publicly available data” as personal information that is legally accessible to anyone and sets forth standards for applying the “legitimate interest” basis for collecting and using publicly available data to develop AI or provide AI services. The Guideline also elaborates on standard safeguards that AI companies can implement when training AI or providing AI services, how to protect data subjects’ rights, and the internal management system of AI companies.
 

1. Requirements and Standards for Applying “Legitimate Interest”

The Guideline explains that the “legitimate interest” provision in Article 15, Paragraph (1), Item 6 of the Personal Information Protection Act (“PIPA”) can serve as a practical legal basis for the collection and use of publicly available data for AI training and services. In particular, the Guideline provides the PIPC’s interpretation of the requirements for applying legitimate interest, together with illustrative examples. According to the Guideline, the following requirements must be met to rely on the “legitimate interest” basis:
 

  • Requirement 1: Legitimacy of the Purpose

Various types of interests can qualify as “legitimate interest,” including the AI company’s commercial interest or the resulting societal interest.

The PIPC recommends that AI developers define their purposes in as much detail as possible.

The level of detail may depend on whether the AI could be considered “narrow AI” or “general AI.”

  • Narrow AI: It is recommended to define the intended purpose and use in as much detail as possible (e.g., summarizing documents, translation, image generation).

  • General AI: The legitimate interest may be described within a reasonably foreseeable scope, using the AI’s types, functionalities, and capabilities.

 

  • Requirement 2: Necessity of Processing

Necessity of processing will be determined on a case-by-case basis, considering factors such as the AI’s detailed purpose, use, and context.

The processing must be substantially related to the legitimate interest and within a reasonable scope.

The AI company should establish, in advance, standards for training data that are appropriate for the AI’s purpose and use, and should exclude data that are not substantially related to the development of the AI.
 

  • Requirement 3: Balance of Interests

The data controller’s legitimate interest must clearly outweigh data subjects’ rights.

The balancing test may consider the data controller’s implementation of safeguards to protect personal information and measures to ensure the data subjects’ rights (further discussed below).
 

2. Implementing Safeguards and Ensuring Data Subjects’ Rights

The Guideline provides that AI companies may implement safeguards appropriate to their specific business and should effectively support data subjects in exercising their rights, taking into account currently available technology and the limitations of AI. The Guideline presents the following examples of safeguards and measures to protect data subjects’ rights, and expressly states that AI companies need not implement every safeguard discussed in the Guideline.
 

  • Technical Safeguards

    Verification and Management of Sources of Training Data

      • Comply with website terms of use and robots.txt, and confirm the major data sources when using data collected by third parties (e.g., Common Crawl). A minimal robots.txt check is sketched after this list.

      • Exclude from the training data any data collected from URLs identified by the PIPC and the Korea Internet & Security Agency.

    Prevention of Data Breach

      • Delete or de-identify personal identifiers (e.g., unique identification information, sensitive information, bank account information, credit card numbers). An illustrative redaction filter is sketched below.

      • Use de-duplicated datasets or tools for de-duplication (also sketched below).

      • Use enhanced personal information protection technology (e.g., differential privacy; see the final sketch below).

    Secure Storage and Management of Personal Information

      • Implement safeguards, such as access restrictions, to prevent leakage of training data or combination of such data with user databases.

    Additional Safeguards Through Fine-Tuning

      • Apply fine-tuning techniques, such as supervised fine-tuning or reinforcement learning from human feedback.

    Filtering of Prompts and Outputs

      • Refuse to answer, or provide predetermined answers to, prompts that attempt to induce individual profiling or privacy infringement.

      • Use technology to detect and delete personal information that may appear in outputs (the redaction filter sketched below can be applied to outputs as well).

    Deleting Specific Data After Training

      • Implement technology such as machine unlearning (depending on technological progress).
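By way of illustration only (the sketches that follow are ours, not the PIPC’s): a minimal Python sketch of a robots.txt compliance check using only the standard library. The user-agent string and URL are hypothetical placeholders.

```python
from urllib import robotparser
from urllib.parse import urlsplit

USER_AGENT = "example-training-crawler"  # hypothetical crawler name

def may_fetch(url: str) -> bool:
    """Return True only if the site's robots.txt permits fetching this URL."""
    parts = urlsplit(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    try:
        parser.read()  # fetch and parse the site's robots.txt
    except OSError:
        return False  # be conservative if robots.txt cannot be retrieved
    return parser.can_fetch(USER_AGENT, url)

if __name__ == "__main__":
    candidate = "https://example.com/articles/public-post.html"
    print(may_fetch(candidate))
```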

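Likewise for illustration: a minimal sketch of exact de-duplication by hashing normalized text, one simple instance of the de-duplication tools the Guideline mentions. Real pipelines often add near-duplicate detection (e.g., MinHash), which is omitted here.

```python
import hashlib

def _fingerprint(text: str) -> str:
    # Normalize whitespace and case so trivially different copies collide.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(documents: list[str]) -> list[str]:
    """Keep only the first occurrence of each normalized document."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = _fingerprint(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

if __name__ == "__main__":
    corpus = ["Hello  world", "hello world", "A different document"]
    print(deduplicate(corpus))  # ['Hello  world', 'A different document']
```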
 
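Also for illustration: a minimal regex-based redaction filter for a few identifier formats of the kind the Guideline names (a resident registration number, a card number, an e-mail address). The patterns are simplified placeholders, not production-grade detectors; the same filter can run on training data before ingestion and on model outputs before they are returned to the user.

```python
import re

# Illustrative patterns only; real detectors need broader coverage and validation.
PATTERNS = {
    "RRN": re.compile(r"\b\d{6}-\d{7}\b"),            # Korean resident registration number
    "CARD": re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b"),  # 16-digit card number
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

if __name__ == "__main__":
    sample = "Contact jane@example.com, RRN 900101-1234567, card 1234-5678-9012-3456."
    print(redact(sample))
```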

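Finally, to illustrate the differential-privacy example: a minimal sketch of the Laplace mechanism, the textbook building block of differential privacy, which releases a count with noise calibrated to sensitivity and epsilon. Training-time variants such as DP-SGD build on the same principle; the function and figures below are hypothetical.

```python
import random

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_count + noise

if __name__ == "__main__":
    # e.g., releasing how many training documents mention a given term
    print(laplace_count(true_count=1204, epsilon=0.5))
```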
  • Managerial Safeguards

    Establishing Standards for Processing Training Data and Disclosing Them in the Privacy Policy

      • Disclose the standards for collection and use of training data (including major collection sources, method of collection, and safeguards) in the privacy policy, technical documents, and FAQs.

    Privacy Impact Assessment

      • Consider conducting a privacy impact assessment under the PIPA (for AI services that can substantially impact data subjects’ rights and obligations or that may include sensitive information in training data).

    AI Privacy Red Teaming

      • Establish and operate red teams (including external experts) to test potential privacy-infringement scenarios.

    Safeguards Appropriate for the Method of AI Development and Distribution (e.g., Open Source, API)

      • For open-source models: (i) establish and distribute license policies that specify the terms and conditions of use; and (ii) promptly take responsive measures and redistribute models when vulnerabilities are discovered.

      • For API-integrated services: impose contractual obligations on developers using the API to protect personal information, and provide detailed instructions for use and technical documents.

 

  • Ensuring Data Subjects’ Rights

    Enhanced Transparency on AI Training Data

      • Disclose the collection of publicly available data (including major sources and processing purposes) in the privacy policy, technical documents, and FAQs.

    Support for Exercising Data Subjects’ Rights

      • Make efforts to respond to data subjects’ privacy requests (e.g., access, correction, deletion), taking into account the reasonable time, costs, and technology involved in responding to such requests.

 

3. Internal Management Systems for AI Companies

The PIPC recommends that AI companies establish an AI privacy department led by the Chief Privacy Officer. In particular, the Guideline explains that this AI privacy department’s primary responsibilities may include confirming and evaluating the legal bases for personal information processing, documenting such legal bases, conducting regular monitoring of risk factors, and providing support for exercising data subjects’ rights.
 

4. Implications

The PIPC expressly stated that the Guideline provides legal interpretations that can serve as a reference, but that the Guideline is not legally binding and that noncompliance with the measures in the Guideline would not directly lead to PIPA violations. On the other hand, the PIPC clarified that AI companies bear the burden of proving that their privacy practices comply with the PIPA. Therefore, the Guideline will still serve as an important reference for companies in understanding the PIPC’s positions and interpretations, and companies should closely monitor subsequent PIPC activities, as well as how the Guideline will be applied in practice.

 
