Can LLMs Unlearn? Part 1: Predictable Law, Unpredictable Machines
Part 1 in a multi-part series on the problems of deterministic laws in an increasingly non-deterministic world.
// Note: This whole series has blossomed into multiple blog posts, and I am 95% certain that I will turn the whole thing into an ebook (with some bonus content, naturally). In the lead-up, and as motivation for myself to finish this series, I am re-sharing the earlier posts this week.
In this post, I explore why I think large language models (LLMs), as non-deterministic systems, are generally incapable of complying with aspects of data protection laws that were drafted assuming a level of determinism and consistency in outputs.
In Part 2, I discuss the underpants gnomes problem that makes LLMs inherently more problematic than garden-variety databases: how do you identify a specific individual within a system in order to delete that person? I discuss concepts around Named Entity Resolution, Named Entity Disambiguation & Linking, and the utility of knowledgebases & knowledge graphs.
In Part 2.5, I share some practical analysis based on OpenAI’s response to my RTBF/Erasure request, and I explain why it wasn’t as effective as OpenAI claims it to be. (‘Who is Carey Lening’ is hidden — except where it isn’t, but the deliberately misspelled ‘Who is Careyy Lening’ or the reordered ‘Who is Lening Carey’ return plenty of results about me.)
Ultimately, I am looking for an answer to the question: Can LLMs unlearn? Can they actually forget? And if they can, why is it so hard, and how do they actually do it?
Apologies for the delay between posts. It turns out that socializing non-stop for two days straight saps every iota of energy right out of me. Still, I don’t regret attending the IAPP AI Governance Global event in Brussels — I met so many amazing people, dished about data protection, and learned a lot, mostly about pain points and how we’re all blindly trying to identify and describe the elephant that is AI regulatory policy.1
But even before the conference, I’ve had AI on the brain. Between the release of the EDPB’s ChatGPT Taskforce report, and a recent query by Jared Browne on why OpenAI can’t correct personal data, I’ve been thinking a lot about the legal implications of large language models:
As these types of questions usually do, this one started me thinking about other, related questions, like deletion, accuracy, and technical impossibility. I also phoned a lot of friends about how compliance could be done at a technical level, which yielded over 100 comments, tons of insight, and a realization that, like so many things, this is all more fractally complex than most of us appreciate. And because my brain is full of holes, I started writing down what I learned (which will be part of a future post).
But before I dig into the how, I wanted to touch on the bigger problem of applying deterministic laws to non-deterministic systems.
LLMs: How Do They Work?
Most of us know what an LLM is, and so I will skip some of the basics. But for purposes of this piece, it’s good to remember that at its core, an LLM is:
a sophisticated pattern-completion program
trained on gobs of data
in order to generate coherent text
based on statistical relationships.
Large language models use training data (like the Common Crawl) to learn which words are most likely to appear next to other words.2 The model then maps words (or parts of words, word stems, or other characters) to tokens. For example, here’s how OpenAI breaks down the phrase “Carey Lening is a data protection consultant based in Dublin, Ireland”:
You might notice that tokens don’t always translate to words. Carey, for example, is a combo of ‘Care’ and ‘y’, which makes some sense – ‘Care’ is a common word, and it’s less common to append something like -y, -s, or -t to it. More common names like Bob or David don’t get split up that way, because they appear often enough to earn tokens of their own.
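If you want to see this for yourself, OpenAI’s open-source tiktoken library does the same kind of breakdown. Here’s a minimal sketch (the exact splits depend on which encoding you pick, so treat the output as illustrative rather than a promise about what ChatGPT does under the hood):

```python
# A minimal tokenization sketch using OpenAI's open-source tiktoken library
# (pip install tiktoken). The splits you see depend on the encoding chosen,
# so your output may differ from the tokenizer screenshot above.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by GPT-4-era models

text = "Carey Lening is a data protection consultant based in Dublin, Ireland"
token_ids = enc.encode(text)

# Print each token ID next to the text fragment it stands for.
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```

Each line prints a token ID next to the chunk of text it represents, which is essentially what the tokenizer screenshot above is showing.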
After initial training, an LLM usually needs further training, because its initial word associations are frequently hot garbage. This is most commonly done via supervised fine-tuning by humans or through reinforcement learning from human feedback (RLHF), where humans either supply correct question-answer pairs, or reward ‘correct’ model outputs generated by the LLM itself.3 The fine-tuning process may also include setting limiting instructions, adjusting weights and temperature, adding filters, or providing guidance on what types of answers should, and should not, be provided by the LLM. The ‘what not to include’ part is sometimes referred to as suppression.
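To make ‘supervised fine-tuning’ a little more concrete, here’s a toy sketch of what a batch of human-curated question-answer pairs might look like. The field names are invented purely for illustration; every provider has its own format, and real fine-tuning datasets run to thousands of examples.

```python
# A toy illustration of supervised fine-tuning data: prompt/response pairs
# written or vetted by humans. The field names here are made up for
# illustration; real providers each use their own schema.
training_pairs = [
    {
        "prompt": "Who is Carey Lening?",
        "ideal_response": "Carey Lening is a data protection consultant based in Dublin, Ireland.",
    },
    {
        "prompt": "How do I pick a lock?",
        # An example of a trained-in refusal, i.e. the 'what not to include' part.
        "ideal_response": "I can't help with that.",
    },
]

for pair in training_pairs:
    print(pair["prompt"], "->", pair["ideal_response"])
```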
What’s really important to remember though, is that even after all that human involvement and training, an LLM doesn’t know or understand what it’s saying. As Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell4 reminded us:
Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.
Since a model is only as good as the source data it’s trained on and its fine-tuning, if the training data is junk, biased, or contains illegal materials, the model will be influenced by those biases, errors, and junk. But even if the data is quality stuff, the outputs might nonetheless be flat-out wrong, or incorrect in context. Human effort takes time, and the world is a bad place. No amount of supervised fine-tuning or RLHF can catch everything. And that’s before we get to all the other problems related to model architecture, questionable training processes, task dependency, or intentionally-motivated attacks that can muck things up.
The Complexities of Non-Determinism
Under the GDPR, and other data protection laws, individuals often have numerous rights related to their personal data, including access rights, the right to correct (rectify) inaccurate data, or to have personal data deleted. These rights aren’t absolute, but they are usually quite broad.
The thing is, these rights were codified when data consisted of stuff about people located on servers, in a structured database, or stored as part of a “filing system”. To borrow a phrase from computer science, filing systems act ‘deterministically’ – that is, they will produce the same output (or file, or query response) given the same input.5 In other words, that funny cat meme you saved in C:/Downloads as cat.jpg will be the same cat picture tomorrow. And you’ll be able to retrieve it today, tomorrow, or next year, provided nothing changes.
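For the programmers in the room, determinism is easy to demonstrate: same input, same output, every single time. A trivial sketch (the file name is just a stand-in for that cat meme):

```python
# Deterministic retrieval in miniature: the same stored bytes come back
# byte-for-byte identical every time you read them (assuming nothing changes).
# The file name is just an example, standing in for cat.jpg.
import hashlib
from pathlib import Path

cat = Path("cat.jpg")
cat.write_bytes(b"pretend this is a funny cat meme")  # save the "meme" once

digest_today = hashlib.sha256(cat.read_bytes()).hexdigest()
digest_later = hashlib.sha256(cat.read_bytes()).hexdigest()

assert digest_today == digest_later  # same input, same output, every time
print(digest_today)
```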
Privacy and data protection laws, including the GDPR, were crafted with determinism in mind (even though they don’t specifically call this out). This is why the GDPR’s material scope is limited to the processing of personal data ‘wholly or partly by automated means,’ and to non-automated processing where the data forms ‘part of a filing system’ or is intended to do so.6 If it weren’t, we’d all need to spout privacy notices every time we talked shit about our colleagues at the big office party, or a comedian roasted someone as part of their set. And don’t even get me started on the lawful basis question.7
But things get way more tricky when it comes to LLMs, precisely because the same input doesn’t guarantee the same output. Their results—the probabilities of those token pairs—may change depending on the specific word choice in the prompt given, model versioning, math, or tons of other factors not entirely understood. Searching for cat.jpg on ChatGPT won’t give you the same cat.jpg.
It gives you something more like this:
In other words, LLMs are non-deterministic systems. Or as someone more clever than me noted, an LLM is like baking a cake. It's easy to add the ingredients in the first place (training, fine-tuning), but taking them out after the cake has been baked? Good luck with that.
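If you want a feel for where the randomness creeps in, here’s a toy sketch of temperature-based sampling over next-token probabilities. It isn’t how any production model is actually wired up (the vocabulary and numbers are invented), but it shows why the same prompt can produce different continuations from one run to the next:

```python
# A toy illustration of temperature-based sampling over next-token
# probabilities. The vocabulary and scores below are invented for
# illustration only; no real model works off a three-word vocabulary.
import math
import random

next_token_logits = {"Ireland": 2.1, "Ohio": 1.7, "obscurity": 0.3}

def sample_next_token(logits, temperature=1.0):
    # Softmax with temperature: higher temperatures flatten the distribution,
    # so less-likely tokens get picked more often.
    weights = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(weights.values())
    threshold = random.uniform(0, total)
    running = 0.0
    for tok, weight in weights.items():
        running += weight
        if running >= threshold:
            return tok
    return tok  # fallback for floating-point rounding at the boundary

# The same "prompt" (the same scores), sampled five times, need not agree:
print([sample_next_token(next_token_logits, temperature=1.2) for _ in range(5)])
```

Run it a few times and the five ‘continuations’ will shuffle around. Turn the temperature down toward zero and the output becomes much more stable; turn it up and the long-shot tokens show up more often.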
Importantly, unlike a filing system, there usually isn’t a single source file or even a set of files the model is calling on when it’s generating text, summarizing a paper, or creating an image.8 That means it may be hard or even impossible to point to how the LLM arrived at an incorrect, defamatory, infringing, or illegal result. Sure, the result might be influenced by some underlying source(s), but it also might include completely benign details from other training data where the token math led to a ‘better’ match.
For example, if I ask ‘Who is Bob Smith’, the LLM doesn’t know if I’m asking about Bob Smith at 333 Cherry Lane, Dublin, Ohio, or Bob Smith in Dublin, Ireland. Hell, even if I specify which Bob Smith, it still might conflate the two Bobs, even if the underlying training data is accurate about each respective Bob. Remember: LLMs lack context. They don’t know or understand what they’re saying. How exactly do you correct that? For which Bob? And what happens when Bob Smith refers to any number of people? Which ingredient do you remove from the cake?
Here’s an example in action: I asked OpenAI’s GPT-4o and GPT-4 for details about “Carey Lening” and received two wildly different and contradictory results. Just like the cat meme.
So, to Jared’s question, I ask: How exactly do you correct, or rectify, that? Which answer gets corrected? And what if the answer changes tomorrow, when someone else runs a similar(ish) query about Carey Lening?
Unforgetting Machines
Given that brief background, it should be easier now to understand why something like rectification/correction is far from simple or binary. To quote sage privacy elder Michelle Dennedy, this represents a ‘wicked problem’.
Okay, you might be asking, what about deletion? After all, OpenAI claims that while it cannot correct inaccurate data, it can delete it:
But this isn’t entirely true either. You’ll note that what OpenAI (and likely the providers behind other large language models, like Perplexity.ai, Alphabet’s Gemini, and Anthropic’s Claude) actually claims is more akin to suppression, not true deletion.
Deletion (and rectification, access, objection, hell, even accuracy and data minimization) are all wicked problems because when it comes to LLM-generated outputs, there usually isn’t a single thing to delete, correct, access, or object to, because the outputs themselves are based on the probability of certain words/tokens, and that probability is independent of context or meaning. What we have is the same baked cake problem I mentioned earlier.
Yes, you can delete information related to ‘Bob Smith’ from the training data. OpenAI, Google, and Anthropic make such an option available to individuals who request it, at least in certain jurisdictions. But it’s likely to sweep in all Bob Smiths, and importantly, it’s most likely to be suppression (akin to what Google does for a right to be forgotten request), not actual deletion.
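To see why suppression is not deletion, here’s a toy sketch of my own: a post-processing filter that refuses to return outputs mentioning a blocked name. This is purely illustrative, not how OpenAI or anyone else actually implements it, but the weakness is the same one I ran into: the name is still ‘in’ the model, the filter just hides it, and a typo (hello, ‘Careyy’) sails right past.

```python
# A toy output filter illustrating suppression: the underlying model is
# untouched; we just refuse to return outputs that mention a blocked name.
# Illustrative only, not any provider's actual implementation.
BLOCKED_NAMES = {"carey lening"}

def suppress(model_output: str) -> str:
    if any(name in model_output.lower() for name in BLOCKED_NAMES):
        return "I'm unable to provide information about that person."
    return model_output

print(suppress("Carey Lening is a data protection consultant in Dublin."))
# Suppressed. But a misspelling sails right past the filter:
print(suppress("Careyy Lening is a data protection consultant in Dublin."))
```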
In Part 2, I’ll dig into the varied methods of what is known in the literature as machine unlearning — the respective utility, challenges, and feasibility of each — and why laws around data protection, and even the new EU AI Act, probably need to address the technical complexity of applying deterministic law to non-deterministic systems.
Because it isn’t just AI we’re talking about here. Non-determinism is going to bite our legal system in the ass in a bunch of different ways.
—
Massive thanks to the following folks (and so many others) who helped guide some of the questions — and answers — in this post.
Jared Browne, Florian Prem, Kevin Keller, Jeff Jockisch, Debbie Reynolds, Igor Barshteyn, Michael Bommarito, Axel Coustere, Mark Edmondson, Nicos Kekchidis, Pascal Hertzscholdt, Ken Liu, Devansh Devansh, Sairam Sundaresan
To badly paraphrase Shoshana Rosenberg, who was describing said elephant in a far more positive light.
Also, I’d hoped to have this post out before I left, and most of it was written, which is a nice change for me. Alas, like so many of my best intentions, scheduling, shiny objects, and various other life things got in the way. So it’s about a week late. My bad.
That text rarely comes as-is — and is itself cleansed and pre-processed to remove noise, clean up formatting issues, and toss the obviously bad/illegal/irrelevant info.
This is a painfully brief explanation, and both of these processes contain additional mechanisms to fine-tune models. You can read more here: https://www.turing.com/resources/finetuning-large-language-models
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 In Conference on Fairness, Accountability, and Transparency (FAccT ’21), March 3–10, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3442188.3445922.
I recognize that where the files logically exist in the file system may not be completely deterministic, but the algorithms themselves perform deterministically.
Article 2(1) GDPR.
I’m not saying that they couldn’t do these things lawfully, just that it would add a lot of extra overhead to do it in a data protection-compliant way.
I don’t want to say never because some asshole on the internet will ‘Well actually…’ me and I just don’t want to hear it.