
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-08152025-125738


Thesis type
Master's degree thesis
Author
MALIKOVA, RAFIGA
URN
etd-08152025-125738
Title
The Right to Be Forgotten in the Age of Large Language Models: Legal and Technical Feasibility of Data Erasure from the EU and US perspectives
Department
GIURISPRUDENZA
Course of study
DIRITTO DELL'INNOVAZIONE PER L'IMPRESA E LE ISTITUZIONI
Supervisors
supervisor Passaglia, Paolo
Keywords
  • AI governance
  • algorithmic accountability
  • artificial intelligence
  • CCPA
  • challenges in data deletion
  • comparative data protection
  • cross-border data regulation
  • data anonymization
  • data deletion techniques
  • data erasure
  • data protection law
  • data subject rights
  • digital rights
  • ethical AI
  • EU vs US approach
  • European Union
  • feasibility of RTBF
  • GDPR
  • information lifecycle
  • international privacy frameworks
  • large language models
  • legal compliance
  • machine learning
  • model fine-tuning
  • model retraining
  • personal data management
  • privacy law
  • right to be forgotten
  • transatlantic data regulation
  • transparency in AI
  • US privacy law
Defence session start date
15/09/2025
Availability
Not available
Release date
15/09/2065
Abstract
In a world where digital footprints shape how we are perceived, remembered, or even misjudged, the question of whether one can truly be "forgotten" by artificial intelligence (AI) is becoming increasingly critical. The Right to Be Forgotten (RTBF), also referred to as the right to erasure, empowers individuals to request the deletion of their personal data when it is no longer necessary or has been unlawfully processed. This right, legally codified in Article 17 of the General Data Protection Regulation (GDPR), traces its origins to the landmark judgment of the Court of Justice of the European Union (CJEU) in the 2014 Google Spain SL v AEPD case, which required search engines to delist outdated or irrelevant links tied to an individual’s name.
Since the implementation of the GDPR in 2018, the RTBF has become a cornerstone of modern European data protection law, symbolizing the EU’s strong emphasis on human dignity and autonomy in the digital age. However, the emergence of AI-powered systems, particularly Large Language Models (LLMs) such as ChatGPT, Gemini, or Claude, has made this right increasingly difficult to enforce in practice. Unlike traditional search engines or structured databases, LLMs absorb data during training and transform it into abstract mathematical patterns, which are not easily reversible or traceable. This presents unique challenges, both legal and technical, when individuals attempt to invoke their right to be forgotten.
When the RTBF was first recognized, the digital landscape was primarily structured around platforms that indexed or stored data in a relatively transparent and accessible way. Erasure, while not always straightforward, was technically feasible through the removal or delisting of URLs and personal records. However, the shift from symbolic, rule-based systems to data-driven, generative AI models has disrupted this logic. LLMs do not store personal data as discrete entries. Instead, they generalize from massive datasets to "learn" linguistic patterns and generate coherent responses, often unintentionally memorizing or reproducing personal data that was included during training.
This poses a critical question: Can an AI system truly forget what it has learned if it was never designed to remember in the first place? Despite the abstraction, there is growing evidence that LLMs can regenerate personal information or hallucinate sensitive details based on their training data. These behaviors pose a significant compliance risk for developers, especially under strict EU privacy rules.
The legal foundations of the RTBF are well-established in the GDPR, where individuals are entitled to have their personal data erased under several conditions, including withdrawal of consent, unlawful processing, or the data no longer being necessary for its original purpose (Article 17 GDPR). Internationally, however, the interpretation and implementation of the RTBF vary significantly. For instance, while the California Consumer Privacy Act (CCPA) recognizes a right to deletion, its scope and enforcement mechanisms are notably less stringent than those of the GDPR. This discrepancy creates regulatory fragmentation, complicating compliance for globally operating tech companies that must navigate competing legal obligations across jurisdictions.
Beyond the legal obligations, the RTBF raises profound ethical concerns. Balancing privacy and data protection against freedom of expression, research interests, and technological innovation is an ongoing dilemma. In particular, privacy advocates argue that individuals should have control over their digital identities, while technologists warn that full erasure may not always be feasible or safe when it affects critical infrastructure or learning systems. Moreover, failing to respect RTBF rights could result in discrimination, misuse of outdated information, or even reputational harm.
From a technical perspective, one of the most pressing challenges is the irreversibility of data integration in LLMs. When data is used to train an LLM, it becomes part of the model’s underlying weight structure, making it almost impossible to remove surgically without retraining the model entirely. Furthermore, LLMs are prone to hallucinations, the generation of plausible but fabricated outputs, which can include reconstructed fragments of personal data, even if the original record was deleted. This phenomenon is often driven by the statistical nature of the model, which generates responses based on learned correlations rather than strict memory retrieval.
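To make this concrete, consider a toy sketch in Python: an ordinary least-squares model trained by stochastic gradient descent stands in for an LLM, and all data and parameters are invented for illustration. It shows why erasure is not a simple delete operation: every training example's gradient updates are folded into the same shared weight vector, so the trained weights contain no per-person record that could be removed, and the only exact remedy is retraining on the remaining data.

import numpy as np

# Toy illustration: each training example contributes gradient updates that are
# merged into one shared weight vector. Nothing in the final weights marks which
# update came from which person, so there is no discrete record to delete later.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # 100 synthetic examples, 5 features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

def train(X, y, epochs=50, lr=0.01):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):                    # per-example SGD step
            grad = 2 * (xi @ w - yi) * xi
            w -= lr * grad                          # the update dissolves into shared weights
    return w

w_full = train(X, y)

# "Forgetting" example 0 has no direct operation on w_full; the exact remedy is
# retraining on the remaining data and accepting a different set of weights.
w_retrained = train(np.delete(X, 0, axis=0), np.delete(y, 0))
print("weight shift caused by removing one example:", np.linalg.norm(w_full - w_retrained))

For a model with billions of parameters trained on terabytes of text, the equivalent of that final retraining step is a full training run, which is precisely what makes exact erasure so costly.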
The risk of regenerating personal data is not only theoretical. Several studies and audits have demonstrated that membership inference attacks and data reconstruction techniques can extract sensitive personal information from LLMs, highlighting a serious vulnerability for GDPR compliance. These technical limitations necessitate innovative mitigation approaches, ranging from machine unlearning, a process designed to remove the influence of specific data on a trained model, to differential privacy, which introduces noise during training to prevent the memorization of individual data points.
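As a minimal illustration of the second mitigation, the sketch below shows the noise-injection idea behind differentially private training in the style of DP-SGD: each example's gradient is clipped so that no single person can dominate an update, and Gaussian noise is added before the weights change, making individual contributions hard to infer from the trained model. The toy logistic model, clip norm, and noise multiplier are invented for illustration and are not calibrated privacy parameters.

import numpy as np

# DP-SGD-style sketch: clip each per-example gradient, then add Gaussian noise
# scaled to the clipping bound, so that the influence of any one individual on
# the final weights is bounded and masked.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                      # synthetic features
y = (X @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) > 0).astype(float)   # synthetic labels

def dp_sgd_step(w, X_batch, y_batch, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    clipped = []
    for xi, yi in zip(X_batch, y_batch):
        pred = 1.0 / (1.0 + np.exp(-xi @ w))                       # logistic prediction
        g = (pred - yi) * xi                                       # per-example gradient
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip its norm
        clipped.append(g)
    noise = rng.normal(scale=noise_mult * clip_norm, size=w.shape)
    return w - lr * (np.sum(clipped, axis=0) + noise) / len(X_batch)

w = np.zeros(5)
for _ in range(100):
    idx = rng.choice(len(X), size=32, replace=False)               # random mini-batch
    w = dp_sgd_step(w, X[idx], y[idx])
print("weights learned under noisy, clipped updates:", np.round(w, 3))

Machine unlearning, by contrast, intervenes after training by trying to approximate the weights the model would have had if the targeted data had never been seen, and, as discussed below, it still lacks agreed technical standards.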
The EU has taken steps to respond to these emerging threats with the adoption of the EU AI Act, a pioneering legislative framework intended to regulate AI systems based on risk levels. While the Act does not explicitly refer to the RTBF, it reinforces GDPR obligations, particularly regarding data governance, transparency, and accountability across the entire AI lifecycle. High-risk systems are required to perform ongoing risk management, maintain documentation, and ensure that users can understand and challenge the use of their data.
The AI Act also encourages developers to design systems with privacy-by-design and privacy-by-default principles, potentially creating space for RTBF-aware architectures. However, critics argue that the Act does not go far enough to specify technical standards for machine unlearning or data deletion from LLMs, leaving many open questions around enforcement and feasibility.
While academic, legal, and technical debates around RTBF and LLMs are gaining momentum, there remains a significant research gap. Much of the current literature either focuses on legal theory without addressing technical feasibility or dives into machine learning innovations without translating them into normative legal frameworks. This thesis aims to fill that gap by offering an interdisciplinary analysis that connects legal obligations under Article 17 GDPR with technological developments in LLMs and the evolving regulatory ecosystem.