July 22, 2025, marks a decisive turning point for the French artificial intelligence ecosystem. The French Data Protection Authority (Commission nationale de l’informatique et des libertés – CNIL) has just published its latest recommendations on AI system development, concluding a cycle of work begun in May 2023 with its “AI plan.” This finalization comes in a particularly dense European context, where the European Regulation on Artificial Intelligence (AI Act) now coexists with the General Data Protection Regulation (GDPR), creating an unprecedented legal framework for AI actors. The fundamental question remains: will these new requirements act as a brake on innovation or, on the contrary, as the necessary foundation for responsible and sustainable technological development?
I. The Reinforced Legal Framework: Articulation Between GDPR and AI Act
1.1 GDPR Applicability to AI Models: A Decisive Clarification
The opinion adopted by the European Data Protection Board (EDPB) in December 2024 recalls that the GDPR applies, in many cases, to AI models trained on personal data due to their memorization capabilities. This position, now taken up and clarified by the CNIL, constitutes a major evolution of French doctrine.
The CNIL now guides actors in conducting and documenting the analysis needed to determine whether the use of their model is subject to the GDPR. This methodological approach reflects the privacy-by-design principle enshrined in Article 25 of the GDPR of April 27, 2016.
Concrete measure: The CNIL proposes technical solutions to avoid personal data processing, notably the implementation of robust filters at the level of the system encapsulating the model. This recommendation proves particularly relevant for foundation models and generative AI systems.
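To make this concrete, here is a minimal, purely illustrative sketch of such an encapsulating filter in Python. The regular expressions, placeholder labels and the `generate` callable are assumptions for illustration, not CNIL-prescribed tooling; a production system would combine far more sophisticated detection (named-entity recognition, deny lists) with human review.

```python
import re

# Hypothetical patterns for two common categories of personal data.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "FR_PHONE": re.compile(r"\b0[1-9](?:[ .-]?\d{2}){4}\b"),
}

def filter_output(text: str) -> str:
    """Replace detected personal data with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

def safe_generate(generate, prompt: str) -> str:
    """Encapsulate the model: every completion passes through the filter
    before reaching the user."""
    return filter_output(generate(prompt))

# Usage with a stand-in model:
fake_model = lambda p: "Write to jean.dupont@example.com or call 06 12 34 56 78."
print(safe_generate(fake_model, "contact?"))
```

The design point is that the filter sits in the system encapsulating the model, so it applies regardless of which underlying model version is deployed.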
1.2 The Complex Articulation with the European Regulation on AI
When personal data is used for AI system development, both the GDPR and the AI regulation apply. This normative coexistence, far from being redundant, creates a two-level regulatory environment:
Horizontal level: The GDPR applies transversally as soon as personal data processing is identified
Vertical level: The AI Act establishes specific obligations according to AI system classification (minimal, limited, high, unacceptable risk)
This dual regulation requires a coordinated compliance approach, particularly complex for actors developing high-risk AI systems within the meaning of Annex III of the AI Act.
II. Specific Recommendations: Detailed Analysis of CNIL Measures
2.1 Purpose Definition: Cornerstone of Compliance
An AI system based on the exploitation of personal data must be developed for a specified, explicit and legitimate purpose. This requirement, which stems directly from Article 5.1.b) of the GDPR, takes on particular importance in the context of generative AI.
Legal issue: Defining sufficiently precise purposes for general-purpose models constitutes a major conceptual challenge. How can the intended versatility of these systems be reconciled with the requirement to specify purposes? The CNIL proposes a pragmatic approach distinguishing:
Model development purposes (training, validation, testing)
Operational use purposes (specific applications powered by the model)
2.2 Responsibility Determination in the AI Value Chain
The question of responsibilities constitutes one of the most complex aspects of the current legal framework. The CNIL has announced, for the second half of 2025, new recommendations clarifying the GDPR responsibilities of actors in the AI creation chain (model designers, hosts, reusers, integrators, etc.).
Prospective analysis: This approach reflects a logic of reinforced accountability, a fundamental principle of the GDPR. Precisely identifying roles (data controller, processor, recipient) in the AI ecosystem will clarify each actor’s obligations, a point that is particularly crucial in the open-source context.
2.3 Legal Bases: Legitimate Interest Under Surveillance
The mobilization of legitimate interest (Article 6.1.f) of the GDPR) as a legal basis for AI system development receives particular attention from the CNIL, which is also pursuing work within the European Data Protection Board (EDPB) on data harvesting in the context of generative AI.
Practical implications: Massive web scraping, a technique commonly used to build training datasets, requires a robust justification of legitimate interest. The CNIL requires a rigorous proportionality analysis, integrating:
Assessment of impact on data subjects’ rights
Balancing with interests pursued by the developer
Existence of appropriate protective measures
2.4 Data Minimization: Technical and Legal Challenges
The minimization principle (Article 5.1.c) of the GDPR) finds particularly delicate application in AI system development. The CNIL details the risks to consider during development and the measures that make it possible to develop AI systems in a secure environment.
Technical recommendations:
Ex ante anonymization of training datasets
Differential privacy techniques to preserve confidentiality (see the sketch after this list)
Data filtering and preprocessing to eliminate sensitive information
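As a hedged illustration of the second item above, here is a minimal sketch of the Laplace mechanism, the simplest building block of differential privacy: it releases an aggregate about a corpus without exposing any individual. The epsilon value and the flagging scenario are illustrative assumptions; techniques actually used during training, such as DP-SGD, are considerably more involved.

```python
import numpy as np

def dp_count(flags: list[bool], epsilon: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via the Laplace
    mechanism. A counting query has sensitivity 1 (adding or removing one
    person changes the result by at most 1), so the noise scale is 1/epsilon."""
    return sum(flags) + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical example: how many training records were flagged as
# containing health-related terms, released under a budget of epsilon = 0.5.
flags = [True, False, True, True, False]
print(dp_count(flags, epsilon=0.5))
```

Smaller epsilon means more noise and stronger confidentiality, which is exactly the proportionality trade-off the documentation must justify.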
III. Operational Compliance Guide: Decoding the CNIL Verification List 📋
The CNIL proposes a methodical list of eleven verification measures, constituting a true compliance framework for AI system developers. Each measure responds to specific legal imperatives while raising considerable practical challenges for corporate lawyers.
3.1 Measure 1: Legal Regime and Responsibility Determination
Objective sought: Establish legal qualification of processing and responsibility allocation among AI value chain actors.
Operational verifications:
Identification of the presence of personal data in the training dataset, including data obtained through web scraping
Analysis of GDPR applicability to the learned model, including conducting re-identification attacks
Assessment of personal data extraction likelihood by typology
Implementation of a process for regularly re-evaluating the model’s anonymous character
Legal impact for practitioners: This measure imposes a conceptual revolution in the approach to legal qualification. Lawyers must now master the concepts of adversarial attacks and model memorization (a membership inference test is sketched below). Documenting these technical analyses becomes a crucial evidentiary requirement in the event of an audit. Actor accountability requires a complete contractual overhaul, particularly in AI subcontracting relationships.
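By way of illustration of the re-identification attacks the checklist mentions, the sketch below shows a crude loss-threshold membership inference test, one standard way to probe whether a model memorized its training data. The `loss_fn` callable and the sample sets are assumptions; real audits use calibrated attacks and far larger samples.

```python
import numpy as np

def membership_signal(loss_fn, train_samples, outside_samples):
    """Crude loss-threshold membership inference test: a model tends to
    assign lower loss to examples it memorized during training. `loss_fn`
    is an assumed callable returning the model's loss on one example."""
    train_losses = np.array([loss_fn(x) for x in train_samples])
    outside_losses = np.array([loss_fn(x) for x in outside_samples])
    threshold = np.median(np.concatenate([train_losses, outside_losses]))
    # Fraction of known training examples flagged as "members": values far
    # above 0.5 suggest memorization worth documenting in the GDPR analysis.
    return (train_losses < threshold).mean()
```

A result near 0.5 is consistent with no detectable memorization; the test and its outcome are precisely the kind of technical analysis the CNIL expects to see documented.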
3.2 Measure 2: Purpose Definition and Legal Basis Choice
Objective sought: Operationalize lawfulness and purpose limitation principles in the specific context of generative AI.
Operational verifications:
Purpose clarification from the design phase, with reference to the type of system developed in the case of general-purpose AI
Precise identification of legal bases for each processing
Documentation of consent collection methods, with retention of proof
Contractual validation for processing based on the performance of a contract
Legal impact for practitioners: Defining purposes for generative AI constitutes a major conceptual challenge. How can the intended versatility be reconciled with the specification requirement? Lawyers must develop a multi-layer approach distinguishing development purposes from use purposes. Retaining proof of consent in distributed environments raises complex technical questions requiring close collaboration with IT teams.
3.3 Measure 3: Legitimate Interest Assessment
Objective sought: Structure proportionality analysis required by Article 6.1.f) of the GDPR in the high-tech context of AI.
Operational verifications:
Clear definition of pursued interest with regulatory compatibility verification
Assessment of processing technical necessity, including analysis of less intrusive alternatives
Justification that the chosen algorithmic approach is the one consuming the least personal data
Documented balancing between interests and reasonable expectations of data subjects
Implementation of specific guarantees for web scraping
Legal impact for practitioners: The legitimate interest assessment becomes a highly technical exercise requiring multidisciplinary expertise. Lawyers must be able to evaluate the relevance of architectures like federated learning or secure multiparty computation. Documentation of technical choices becomes an essential evidentiary element. Implementing an unconditional right to object raises complex questions of technical implementation.
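To give a sense of what evaluating such an architecture involves, here is a minimal sketch of one federated averaging (FedAvg) round: clients train locally, and only model parameters, never the personal data itself, reach the server. The client updates and dataset sizes are fabricated for illustration.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """One FedAvg round: the server combines locally trained parameters,
    weighted by each client's dataset size, and never sees the raw data."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hypothetical clients send only their locally trained parameters.
updates = [np.array([0.9, 1.1]), np.array([1.0, 0.8]), np.array([1.2, 1.0])]
sizes = [1000, 400, 600]
print(federated_average(updates, sizes))  # size-weighted mean of the models
```

The data-minimizing property a lawyer must verify is visible in the signature: the aggregation function receives parameters and counts, not records.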
3.4 Measure 4: Data Reuse – Reinforced Compatibility Test
Objective sought: Frame existing data reuse according to their origin and initial lawfulness.
Operational verifications:
Application of the compatibility test in case of a change of purpose for the controller’s own data
Verification that third-party databases were not manifestly unlawfully constituted
Documentation of collection conditions and source identification
Reinforced controls for sensitive data or data relating to criminal offences
Legal impact for practitioners: The compatibility test requires a thorough contextual analysis that is often neglected. Lawyers must develop a specific analysis grid integrating the specificities of AI. Due diligence on third-party databases imposes an unprecedented level of vigilance, particularly critical in the data broker ecosystem. The implications in terms of civil and criminal liability are considerable.
3.5 Measure 5: Data Minimization – Beyond the Principle
Objective sought: Operationalize the minimization principle in a context where data volumes traditionally constitute a competitive advantage.
Operational verifications:
Selection of strictly necessary data with volumetric and temporal justification
Assessment of synthetic, pseudonymized or anonymized data usage
Specific web scraping measures with proactive exclusion of sensitive categories
Organization of collection, with cleaning and identification of relevant data
Continuous re-evaluation of collected data relevance
Legal impact for practitioners: Minimization in the AI context requires a dynamic and prospective approach. Lawyers must support the implementation of continuous review processes, raising complex internal governance questions. The exclusion of sites via robots.txt raises questions of legal enforceability that remain debated (a minimal example of honoring it follows below). Justifying the “necessity” of massive volumes requires thorough technical expertise.
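Whatever the outcome of the enforceability debate, honoring robots.txt is easy to implement and to evidence. A minimal sketch using Python’s standard library (the user-agent string and URL are illustrative assumptions):

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def may_scrape(url: str, user_agent: str = "example-ai-crawler") -> bool:
    """Check a site's robots.txt before collecting a page as training data."""
    parts = urlsplit(url)
    parser = RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()  # fetches and parses the file over the network
    return parser.can_fetch(user_agent, url)

print(may_scrape("https://example.com/articles/some-page"))
```

Logging each such check is one inexpensive way to document the “appropriate safeguards” the CNIL expects around web scraping.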
3.6 Measure 6: Retention Periods – Complete Lifecycle Management
Objective sought: Adapt the retention limitation principle to multi-phase AI development specificities.
Operational verifications:
Definition of specific durations by AI lifecycle phase
Archiving or deletion process at development end
Documentation of retention necessity for maintenance and improvement
Automatic deletion plan post-improvement
Legal impact for practitioners: Retention period management in an AI environment requires an unprecedentedly granular approach. Lawyers must design retention policies adapted to iterative development cycles. Justifying retention for continuous improvement raises delicate proportionality questions. Automating deletion imposes major technical constraints.
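As a hedged sketch of what such automation might look like, the snippet below purges files that have outlived a phase-specific retention period. The directory layout, the durations and the use of file modification time as a proxy for collection date are all assumptions; a real system would work from documented collection dates and log every deletion.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Illustrative durations per lifecycle phase; the CNIL expects
# phase-specific durations that the controller can actually justify.
RETENTION = {
    "training": timedelta(days=365),
    "validation": timedelta(days=180),
    "logs": timedelta(days=90),
}

def purge_expired(root: Path, now: datetime | None = None) -> list[Path]:
    """Delete files that outlived their phase's retention period, using
    file modification time as a stand-in for the collection date."""
    now = now or datetime.now(timezone.utc)
    deleted = []
    for phase, period in RETENTION.items():
        phase_dir = root / phase
        if not phase_dir.is_dir():
            continue
        for file in phase_dir.rglob("*"):
            if file.is_file():
                mtime = datetime.fromtimestamp(file.stat().st_mtime, timezone.utc)
                if now - mtime > period:
                    file.unlink()
                    deleted.append(file)
    return deleted  # keep this list: deletions must themselves be documented
```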
3.7 Measure 7: Transparency – Information Adapted to AI Specificities
Objective sought: Adapt information obligations of Articles 13 and 14 GDPR to generative AI complexities.
Operational verifications:
Complete information according to Articles 13 and 14 GDPR
Documentation of the disproportionate effort justifying reliance on indirect information
Precise information on web scraping sources or risk categorization
Information on data memorized by the model
Legal impact for practitioners: Information in the generative AI context requires a delicate balance between exhaustiveness and comprehensibility. Lawyers must develop layered information strategies adapted to the target audience. The notion of disproportionate effort finds particular application in the context of massive web scraping. Communicating about model memorization poses significant challenges of plain-language technical explanation.
3.8 Measure 8: Individual Rights – Operational Revolution
Objective sought: Make GDPR rights effective in a technological environment initially not designed for their exercise.
Operational verifications:
Information on regurgitation risks and recourse mechanisms
Notification procedures to recipients of exercise requests
Establishment of procedures for querying the model to determine whether it holds a person’s data
Choice of technical solutions favoring retraining
Implementation of robust filters where retraining is disproportionate
Legal impact for practitioners: The exercise of rights in the AI context probably constitutes the most complex challenge for lawyers. Periodic retraining raises major questions of cost and contractual liability. Implementing identification procedures requires thorough technical expertise. Documenting the disproportionate character of retraining requires rigorous economic analysis.
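Where retraining after every erasure request is documented as disproportionate, output filtering is the fallback the CNIL contemplates. A minimal, illustrative sketch follows; the class, its API and the sample identifiers are assumptions, and such a filter mitigates regurgitation rather than deleting anything from the model’s parameters.

```python
import re

class ErasureFilter:
    """Suppress a data subject's identifiers in model outputs while awaiting
    the next periodic retraining. This mitigates regurgitation; it does not
    remove anything from the model itself."""

    def __init__(self) -> None:
        self._patterns: list[re.Pattern] = []

    def add_request(self, *identifiers: str) -> None:
        """Register the identifiers from one erasure/objection request."""
        for ident in identifiers:
            self._patterns.append(re.compile(re.escape(ident), re.IGNORECASE))

    def apply(self, text: str) -> str:
        for pattern in self._patterns:
            text = pattern.sub("[REMOVED AT DATA SUBJECT'S REQUEST]", text)
        return text

# Fictitious example request:
f = ErasureFilter()
f.add_request("Jean Dupont", "jean.dupont@example.com")
print(f.apply("Contact Jean Dupont at jean.dupont@example.com."))
```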
3.9 Measure 9: Annotation Compliance – A Critical and Often Misunderstood Process
Objective sought: Frame the annotation phase as a personal data processing operation in its own right.
Operational verifications:
Verification of annotation necessity and objectivity
Implementation of regular relevance reviews
Annotation protocol compliant with accuracy and minimization principles
Inclusion of annotation in rights management procedures
Training of annotators in data protection principles
Legal impact for practitioners: Annotation often constitutes a blind spot of AI compliance. Lawyers must raise technical teams’ awareness of the legal qualification of this phase. Annotation subcontracting raises specific contractual questions. Annotator training requires an awareness program adapted to GDPR issues.
3.10 Measure 10: Data Security – Holistic Approach
Objective sought: Adapt Article 32 GDPR security requirements to AI development environment specificities.
Operational verifications:
Training data security according to CNIL guide
Development security with tool and library verification
System operation framing with output control
Authorization management and access traceability
Security action plan with continuous monitoring
Legal impact for practitioners: Security in an AI environment requires a systemic approach that goes beyond traditional data protection. Lawyers must master the concepts of watermarking and output filtering. Verifying pre-trained models raises questions of liability along the chain. The security action plan requires governance adapted to agile development cycles.
3.11 Measure 11: Specialized DPIA – AI Risk Integration
Objective sought: Adapt impact analysis methodology to artificial intelligence specific risks.
Operational verifications:
DPIA realization according to EDPB criteria adapted to AI
Inclusion of AI specific risks (discrimination, fictitious content, adversarial attacks)
Taking adequate mitigation measures
Legal impact for practitioners: The DPIA in an AI context requires an enriched methodology integrating unprecedented technological risks. Lawyers must develop a specific analysis grid for algorithmic biases and adversarial attacks. Documenting mitigation measures requires close collaboration with technical teams. Societal impact assessment exceeds the traditional data protection framework.
IV. Data Annotation: A Critical Process Under Supervision
4.1 Annotation Process Compliance
The training data annotation phase is decisive to guarantee the quality of the trained model and the protection of individuals’ rights, while developing more reliable and efficient AI systems. This recognition of annotation as a critical step constitutes a notable advance.
Compliance issues:
Annotator training in data protection issues
Traceability of personal data interventions
Quality control integrating GDPR requirements
4.2 Annotation Subcontracting: Risks and Precautions
Outsourcing annotation tasks, frequently done via crowdsourcing platforms, raises specific GDPR compliance questions. Subcontracting contracts (Article 28 GDPR) must integrate specific clauses relating to annotation activities.
V. Development Security: Technical and Organizational Imperatives
5.1 AI-Specific Security Measures
The CNIL devotes an entire practical sheet to AI system development security, thus recognizing the specificity of risks inherent to these technologies. This approach articulates with Article 32 GDPR requirements relating to processing security.
Recommended measures:
Training database encryption (see the sketch after this list)
Granular access control to development environments
Training phase audit and monitoring
Protection against adversarial attacks
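As an illustration of the first item in this list, here is a minimal sketch of encrypting a training data file at rest with the `cryptography` package’s Fernet recipe (authenticated symmetric encryption). The file name is an assumption, and key management, the genuinely hard part, is deliberately out of scope.

```python
from pathlib import Path

from cryptography.fernet import Fernet  # pip install cryptography

def encrypt_dataset(path: Path, key: bytes) -> Path:
    """Encrypt a training data file at rest and remove the cleartext copy."""
    out = path.with_name(path.name + ".enc")
    out.write_bytes(Fernet(key).encrypt(path.read_bytes()))
    path.unlink()
    return out

key = Fernet.generate_key()  # in production, keep this in a secrets manager
encrypt_dataset(Path("train.csv"), key)  # "train.csv" is a hypothetical file
```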
5.2 Security Governance in the AI Ecosystem
The CNIL’s approach goes beyond the purely technical dimension to integrate an organizational dimension. Security governance must be adapted to AI development specificities, notably in terms of:
Model version management
Modification traceability
Specific incident response procedures
VI. Innovation Versus Compliance: Critical Analysis of CNIL Positioning
6.1 An Innovation-Facilitating Framework?
Contrary to initial criticisms, the CNIL’s approach seems designed to facilitate innovation. For the regulator, the received idea that the GDPR would prevent artificial intelligence innovation in Europe is false. This principled position translates into:
Support tools:
Summary sheet for rapid appropriation
Operational verification checklist
Systematic public consultation before publication
6.2 Potential Limits of the Proposed Framework
Despite this pragmatic approach, certain questions remain:
Regulatory complexity: The superposition of the GDPR and the AI Act could create a compliance burden that deters French AI SMEs and startups.
International competitiveness: Regulatory asymmetry with American and Chinese ecosystems raises long-term competitiveness questions.
Disruptive innovation: Are current frameworks flexible enough to accompany upcoming technological breakthroughs (quantum, neuromorphic AI)?
VII. Perspectives and Future Work: Towards Sectoral Regulation
7.1 Announced Sectoral Approach
Faced with the diversity of AI usage contexts, the CNIL is developing targeted sectoral recommendations to give actors legal certainty and to promote rights-respecting AI.
Identified priority sectors:
Education: The CNIL recently published two sets of frequently asked questions (FAQs) for teachers and data controllers. This pedagogical approach aims to democratize educational uses of AI while preserving students’ rights.
Health: Collaboration with the High Authority for Health reflects a commitment to cross-regulator coordination in this sector. Issues of medical confidentiality and clinical responsibility require a specific framework.
Work: HR implications of AI (recruitment, evaluation, surveillance) constitute a particularly sensitive GDPR application field.
7.2 Technical Developments: The PANAME Project
The CNIL has thus launched the PANAME partnership project (Privacy AuditiNg of Ai ModEls) with the National Agency for Information Systems Security (ANSSI), the iPoP priority research program (Interdisciplinary Project on Privacy) and the Digital Regulation Expertise Center (PEReN).
Strategic objective: Develop a software library for AI model compliance assessment. This initiative illustrates a RegTech approach to compliance, potentially transformative for the ecosystem.
7.3 Explainability Research (xAI)
The CNIL will soon publish, on the site of its LINC laboratory, the first results of its project on AI model explainability. This applied research approach aims to operationalize the “right to explanation” in the AI context.
VIII. European Issues and International Coordination
8.1 Ongoing European Harmonization
Furthermore, the CNIL continues work within the European Data Protection Board (EDPB) on the articulation between the GDPR and the AI Act as well as on data harvesting in the context of generative AI. This European coordination aims to avoid regulatory fragmentation within the digital single market.
8.2 European Competitive Positioning
The European approach to AI regulation, embodied by the French CNIL position, is part of a broader geopolitical strategy of “digital sovereignty.” The central question remains whether this anticipated regulation will constitute a competitive advantage (first mover advantage) or a handicap against less regulated ecosystems.
Conclusion: Towards European “by Design” AI? 🇪🇺
The finalization of the CNIL recommendations marks the completion of a reflection cycle begun nearly two years ago. The retained approach favors pedagogy and support over sanction, testifying to a certain regulatory maturity.
Achievements assessment:
Clarification of GDPR applicability to AI models
Operationalization of data protection principles in AI context
Practical tools available to professionals
Differentiated sectoral approach
Persistent challenges:
Complexity of the dual GDPR/AI Act regulation
Adaptation to rapid technological evolutions
Maintenance of European competitiveness
The work announced for the 2025-2028 period (actor responsibilities, technical tools, xAI research) suggests a continuous scaling-up of the regulatory framework. This evolution is nevertheless accompanied by an effort of co-construction with economic actors, a guarantee of the norms’ acceptability and effectiveness.
The fundamental issue remains transforming this regulatory constraint into a competitive advantage for the European AI ecosystem. The success of this alchemy will determine whether Europe manages to impose its “trustworthy AI” model against the global tech giants, or remains confined to the role of “global regulator” without industrial champions of its own.
Prospective opening: The upcoming emergence of new AI generations (multimodal, reasoning, autonomous agents) will test the current regulatory framework’s capacity to adapt. The CNIL will have to maintain a delicate balance between legal stability and regulatory agility, a permanent challenge for any technological regulation in the era of digital acceleration.