On Ethics, the EDPB, Errors, and Endorsements
I share my thoughts on why we in the data protection and AI space should do better when it comes to the content we publish
Late in June, the European Data Protection Board published a press release on a recent crop of deliverables offered by the Support Pool of Experts (SPE). Here's Gwendal Le Grande's (now edited) announcement:
I was actually rather happy about this: the EDPB is making greater use of the SPE pool and putting out some documents that might be relevant to the larger data protection community.* Rie Aleksandra Walle reminded us that of the 504 experts who made up the SPE pool, only 7 have received any work since the project was launched in February 2022. While this isn't great, this new batch of publications at least signaled progress.
Some of you who have been following me for a bit may recall that I have been critical of the SPE program, and in particular of its substantial transparency problem. After well over a year of not hearing or seeing anything despite being a member of the support pool, I made an FOI request in October 2023, and that appears to have led to an improvement in transparency and in use of the expert pool since. This included publication of written expert materials, an SPE webinar, and an update on the SPE program generally. All coincidence, no doubt.
When the EDPB made the announcement for this latest tranche of publications in June, I was particularly interested in the body of work around AI auditing. I've been working with the World Ethical Data Foundation for the past few months on reconciling the myriad advice on AI standards, obligations, and requirements, and on the kinds of questions AI developers and designers should consider when designing AI systems and platforms. I thought the checklist and, in particular, the 'proposal for AI leaflets' documents would be instructive. When I reviewed the AI leaflets proposal, for example, I was pleased to discover a rather comprehensive list of references at the end, many of which seemed directly relevant to this project. For example:
Binns, R., & Yampolskiy, R. V. (2018). Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits. ResearchGate. Retrieved from https://www.researchgate.net/publication/325551035 Problematic Machine Behavior A Systema tic Literature Review of Algorithm Audits [sic]
co-cddo. (n.d.). algorithmic-transparency-standard/template_table.md at main · co-cddo/algorithmictransparency-standard · GitHub. Retrieved from https://github.com/co-cddo/algorithmictransparency-standard/blob/main/template table.md
Lam, C. (n.d.). End-User Audits: A Design Space for Evaluating and Improving Algorithmic DecisionMaking. Stanford University. Retrieved from https://cs.stanford.edu/~clam/papers/LamEndUserAudits CSCW22.pdf
Pang, J., Liu, D., & Wang, D. (n.d.). Auditing Algorithms: Research Challenges and Opportunities. Stanford University. Retrieved from https://cs.stanford.edu/~jeffpang/papers/FnTAuditingAlgorithms.pdf
Excitement Leads to Sadness
As I clicked on the first four or so links, I discovered some oddities. For one, many of the articles 404'd. Others pointed to different articles altogether. Some, like this
Real Decreto 39/1997, de 17 de enero, por el que se aprueba el Texto refundido de la Ley sobre Infracciones y Sanciones en el Orden Social. (1997). Juridicas.com. Retrieved from https://www.juridicas.com/normas/RD/RD 39 1997.pdf
seemed hopelessly ancient and prima facie irrelevant. What would a Spanish law from 1997 have to do with AI audits?!
Surely, some sort of mistake had occurred. Maybe the links had expired, or someone had made a typo. In the interests of responsible disclosure, I pinged the author of the report directly and asked her what the deal was.** I had assumed it was some sort of harmless error with those links. Concurrently, I whacked together a spreadsheet and checked every one of the links provided, which led me to a very uncomfortable discovery: all but three of the references were bogus. They pointed to dead links, cited the wrong author or title, gave an incorrect publication year, or, like the Spanish law, were entirely irrelevant to the subject. Some existed only in the references section of the AI leaflets article itself. I also noticed this little nugget at the top of the references section:
The references have been organized in APA style using ChatGPT
Uh oh.
Days passed without a response from the author. I started to grow concerned, but really wanted to get to the bottom of this before I said anything publicly. So I reported the discrepancy to the EDPB a few days after I informed the author, and they responded a few days later that they would 'handle the situation as soon as possible.'
Last Saturday, I finally received a reply from the author, who said she would review the issue over the weekend. Yesterday, the indefatigable caller-outer-of-bullshit Peter Hense 🇺🇦🇮🇱 posted about this, and I concurrently received a further update from the author.
She admitted to using ChatGPT to normalize the bibliography in the month it was launched (presumably late November 2022), but didn't check that the sources were accurate, relevant, or even real. She thanked me for bringing it to her attention, apologized for the error, and took ownership of it, but noted that she was caught a bit off-guard herself. She wrote the report for the EDPB in the summer of 2022 and submitted it a few months later; it then sat dormant until June 2024. She hadn't reviewed or updated the work (or heard from the EDPB) since, so the June 2024 update information on the publication was presumably a fabrication.
We All Screw Up. But We Can Do Better
Just to set things straight: my goal in writing this post isn't to punish the author. We all fuck up. I once left an unkind internal comment about someone in a Word document and then delivered that document to a client. It caused quite an incident.
I also use ChatGPT for things, as do many others, even if they won't admit it. This isn't a comment about the (de)merits of LLM use. The early versions of ChatGPT (circa November 2022) were known for hallucinating references and citations. And while it took a bit longer than I would have liked, the author has apologized and is now working to correct these errors.
What frustrates me is not that it happened, but that there were no mechanisms to check for this failure mode at all. Nobody (not the author, not the EDPB, not even a lowly intern somewhere) bothered to read this piece, much less sense-check it. Other errors in this document also made it through to publication (misquoting the ICO, and referring to the "AI ACT" as a draft law), and this certainly isn't the first EDPB publication I've seen with glaring typographical and factual errors. It probably won't be the last.
This is doubly insulting when we're talking about content that is directly promoted by one of the central regulatory authorities in this space. I get that the EDPB leads with this disclaimer on p2 of every SPE publication:
The views expressed in the deliverables are those of their authors and they do not necessarily reflect the official position of the EDPB. The EDPB does not guarantee the accuracy of the information included in the deliverables. Neither the EDPB nor any person acting on the EDPB’s behalf may be held responsible for any use that may be made of the information contained in the deliverables.
... but that doesn't change the fact that people (members of the public, and even professionals) are likely to interpret this publication as guidance from the EDPB. After all, the EDPB chose the author through its own opaque selection process. It paid her for her time and work product. It published and promoted the document widely. It's hard not to assume that this document bears at least some sliver of tacit support from the regulatory body that commissioned it.
I'm also angry because if I hadn't discovered it, checked the links, and notified the author and the EDPB, and, importantly, if Peter hadn't blown the whistle publicly, I'm not so sure this would have seen the light of day, much less been corrected. This is the type of sloppiness and lack of oversight that has led to a publication crisis throughout the hard and social sciences; it's why the famed Retraction Watch has an ever-growing collection of papers and peer-reviewed articles written in whole or in part by ChatGPT.
Still, it feels worse because we're talking about publications that are explicitly focused on building transparency, setting standards, and providing guidance on the appropriate uses of AI. Using AI to churn out botshit and factually inaccurate citations is especially galling when the content itself is about how we should adopt ethics and transparency around the use of AI. Now, I don't expect infallibility from anyone, including the regulatory bodies, but is it so much to ask that someone at least make sure the citations are real?
Sadly, this article isn't the only one where I realized that 'ChatGPT-assisted' content was being turned out by members of our industry who really should know better. It's not even the first such article I discovered last week. I recently read an entire book on AI and data protection by an esteemed practitioner in this space that bore the telltale signs of being at least partially written by an LLM.
Do we as an industry really need our own version of Retraction Watch?
While I don't think this is part of some grand intentional conspiracy, it still sucks that it happened. Yes, we all fuck up. Errors happen, and this isn't Pokémon: we can't catch them all. But we should learn from this and endeavor to improve going forward. Otherwise, all this AI compliance, all this regulatory oversight, and everything else feels like a bunch of pointless theatre, as far as I'm concerned.
Update: According to the EDPB, the citations have been corrected. All but one now appear to correspond to the correct sources. That weird Spanish law from 1997 has been removed.
* Putting aside that the DPO training document is only available in Croatian.
** I am intentionally not including her name here because she doesn't deserve to get dogpiled. The point of this piece isn't to drive a witch-hunt, but to drive conversation and build improvements on the systems we engage with. Please don't make me regret posting this.