4 Comments

Really great thought-provoking piece here…again 😃. So this got me thinking about libel law and defamation. First off, I'm not a lawyer and do not even play one on TV, not to mention that I'm an idiot, so I lack the mental capacity to be one even if I wanted to be. With that out of the way…

“To prove prima facie defamation, a plaintiff must show four things: 1) a false statement purporting to be fact; 2) publication or communication of that statement to a third person; 3) fault amounting to at least negligence; and 4) damages, or some harm caused to the reputation of the person or entity who is the subject of the statement.”

If a statement was made by Gemini or ChatGPT about your friend Heidi, but unlike her examples, it went a bit further and said that she's renowned for being wholly untrustworthy and unreliable as an attorney because of several blown cases, and it then turned out that this was all a hallucination, totally fabricated, could Google or OpenAI be held liable, given their negligence in not having taken measures (or having taken measures that still failed) to prevent users of their services from encountering these outright false statements that their bots were presenting as true? When these LLMs sit under the covers of other applications, whatever disclaimers may exist about the quality of their results are usually not evident. To the extent Heidi could show that she lost business after people consulted the bots to decide whether she was a suitable attorney for their needs, could she go after these companies with a credible case?

Let's take it a bit further and say Heidi tested this herself, found the error, and reached out to Google or OpenAI to let them know. The companies claim that they don't know how this happens and that it's not practicable to fix. Then the initial case happens, where she actually loses one or two clients to these false statements. Can she claim defamation at that point, since the companies have been made aware of the issue but have done nothing to remedy the offense? 🤔

I think that's a slightly more convincing scenario. The issue comes down to whether an ordinary person would take it as a statement of fact, as opposed to an opinion, hyperbole, or other protected speech (this applies to private individuals; the standard for public figures is higher and requires 'actual malice', i.e. knowing falsity). I think ChatGPT spewing out a falsehood about Heidi's standing as an attorney (e.g., several blown cases, imputing she's a bad lawyer) could materially affect her reputation and could, assuming she meets her other burdens, including damages, be seen as a libelous statement. It's even easier if ChatGPT were to accuse Heidi of a crime or moral turpitude.

I agree with your argument here, that hallucinated data can still be personal data. I am wondering, though, if there is a practical problem with this. What measures can be put in place to rectify the inaccuracies in the hallucinated data? I think this might be a bit tricky. My understanding is that the hallucination (or the inaccuracy) is not necessarily a function of inaccurate training data. It is a function of the model. The model learns a probability distribution over possible next tokens from its training data and uses it to predict what the best response to the prompt should be. So even if all the training data were 100% factually accurate (though I'm not sure that is possible), the model could still produce inaccurate outputs, because that output is probabilistic rather than deterministic (and the model is so big and complex that its behaviour is sometimes hard to anticipate). That is not to say that the right to rectification does not exist; if the hallucinated data is personal data, then data subjects should be able to exercise their rights. It is more a question of the practical fulfilment of the right to rectification. Curious what you think of this.
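
For what it's worth, here's a toy sketch of that point in plain NumPy (nothing to do with any real model; the tokens and logits are made up): even when the learned distribution heavily favours the "correct" continuation, sampling at an ordinary temperature will occasionally return a wrong one, with no inaccurate training data involved.

```python
# Toy sketch (hypothetical tokens and logits, not any real model):
# sampling the next token from a probability distribution, which is why
# output can be wrong even if every training example was accurate.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical next-token candidates after a prompt like "Heidi is a ..."
tokens = ["lawyer", "doctor", "fraudster"]
logits = np.array([4.0, 1.0, 0.5])  # "lawyer" is strongly favoured

def sample(logits, temperature):
    # Temperature-scaled softmax: lower temperature -> more deterministic.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# At temperature 1.0 the unlikely (and false) continuations still get
# sampled occasionally; the model is probabilistic, not a lookup of facts.
draws = [tokens[sample(logits, 1.0)] for _ in range(1000)]
print({t: draws.count(t) for t in tokens})
```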

You're totally right, and I agree that this makes the answer to 'how do we solve this' quite hard! Deletion seems like at least one option, but that's contingent on the data subject becoming aware of the hallucinated / false data in the first place, which itself may be a hard problem.

For instance, I'm trying to get a full access request fulfilled by OpenAI and it's proving a challenge because they don't seem to get that hallucinated data about me is still personal data about me.

It's possible that, at a technical level, the fix looks more like what they are doing now to address bias/misinformation/disinformation: strict higher-level prompting and overrides by OpenAI along the lines of "When a question is about a person, temperature should be 0.2 and sources must be provided for any claims made." That would default to a very rigid, strict, and limited result, and control against hallucinations more effectively.
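
To make that concrete, here's a rough sketch of what such a control could look like if expressed through the public OpenAI Python SDK. The guardrail text, the model name, and the question are my own illustrative choices, not anything OpenAI actually does internally with its overrides.

```python
# Hedged sketch only: NOT OpenAI's internal override mechanism, just the
# suggested control expressed via the public Chat Completions API
# (a strict system prompt plus a low temperature).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GUARDRAIL = (
    "When a question is about a named private individual, answer cautiously: "
    "only state claims you can attribute to a cited source, and say you don't "
    "know rather than guessing."
)

response = client.chat.completions.create(
    model="gpt-4o",      # illustrative model name
    temperature=0.2,     # low temperature -> less varied, more conservative output
    messages=[
        {"role": "system", "content": GUARDRAIL},
        {"role": "user", "content": "Is Heidi a reliable attorney?"},
    ],
)
print(response.choices[0].message.content)
```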
