Further Thoughts on Google Bard Extensions, Privacy Harms & Unanswered Questions
Where I try to piece together what Google's inner corporate thoughts might be.
// I dumped a few thoughts on LinkedIn earlier this week, but also wanted to cross-post and supplement here.
So Google is trying to push its ChatGPT competitor Bard hard with the one big competitive advantage it has over OpenAI: access to all our sweet, sweet personal data, including our emails and Drive-stored documents. This Verge piece discusses some of the promise, but it left me with many questions and concerns from a consent, copyright, trade secret, privacy, and autonomy perspective.
To its credit, Google has done a decent job explaining how enabling Extensions in Bard will work, the benefits of using Bard with Extensions enabled, how users can enable and disable Extensions access, how they can manually delete data, and what details Google and human reviewers will have access to. But, as is the case for most of Big Tech, the company does a shit job at making these details easily findable. Most of the relevant information a user might need is scattered across multiple pages, which is annoying. But at a high level, Google promises that it won’t:
Access Workspace data (i.e., Drive, Gmail, Docs, Sheets), or make it available to human reviewers;
Use results to improve its Bard or PaLM 2 machine learning models;
Use the results to target ads;
Store results past the time period needed to provide and maintain Bard services.
That said, I was still left with a lot of questions.
Choice. Google is pretty clear about how users of the service can control Extension access to their Workspace data. What’s less clear (okay, nonexistent) is whether non-users of Bard have any choice in the matter. We don’t generally send emails to ourselves, after all. And in the context of work documents, there are legitimate concerns around not only personal data, but also company-sensitive and proprietary data.
There’s absolutely nothing I was able to find that discusses how Google deals with results that include the personal data of third parties, or how those individuals can object to processing, access or delete their data, or exercise other data subject rights.
Lawful Basis. There’s also no information on what lawful basis Google relies on to process non-user data (though a careful reading of the Privacy Policy suggests it might be legitimate interests). To me, this is the biggest blind spot in the whole rollout of Bard Extensions, and represents a stupid, but predictable own goal when it comes to compliance with Article 14 obligations under the GDPR. As a few of my Google friends regularly lament, Google tends to forget that its customers are human beings. Whether it’s hiding information intentionally (out of some perceived fear that people might *gasp* figure out what’s up), or just failing to get around to it, I don’t know. I suspect most of the time, it’s the latter, but it still makes me sad, because I know so many smart, talented Google privacy peeps.
Fractal complexity again rears its ugly little head.
Retention. In typical corporate fashion, there's also nothing substantive on Google’s retention policies, what information Bard keeps, or for how long. For example, if I ask Bard to summarize all emails between my husband and me, Google explains that unless I manually delete the query, Bard will store so-called ‘shared information’ (such as information from the conversation, preferences, and location information) 'until no longer necessary', which is lawyer speak for 'as long as we damn well feel like and can get our lawyers to justify.'
And for astute readers, if you click on this link, you'll see kind of a retention policy for other things, but nothing explicitly related to Bard Extensions and the retention of that data. That said, the answer might be 63 days, at least according to this Quora post which discusses Google’s Wipeout (deletion) process generally.
Bard Activity. That said, users can control whether or not they allow Google to retain data either by deleting specific results or disabling Bard Activity, which sets a 72-hour retention window (for most data). However, in order to use Extensions, you must enable Bard Activity to be recorded. Why this is the case is not explained in their documentation. An ‘incognito mode’ for Bard Workspace searches would be pretty cool, actually. Maybe someone on the #Bard team can explain?
Impacts on Others. Beyond retention and other uses, I worry that this is going to be a data subject rights nightmare, and in particular an automated decision-making nightmare. For example, let's take the case of John Smith applying for a job.
If John Smith applies for a job at ABC Corp, and the recruiter uses Bard Extensions to 'summarise' John's resume and cover letter to decide if he deserves a call, what happens if Bard confuses him with another John Smith? Or confabulates candidate data or mixes it with someone else’s information? Recruiters are already a bit in the shit given the widespread use of automated applicant tracking software (ATS), and the EEOC has started to crack down, so this isn't just privacy doomcasting.
If he gets rejected because Bard got it wrong, poor John Smith will never know. A recruiter is unlikely to check, and that could lead to legal or ‘similarly significant’ effects on him. This spells trouble, both for the recruiting company (the controller) and Google (which is arguably a joint controller). Now pretend it's patient data and we're talking about doctors' correspondence (or don't, it probably will make your head hurt).
The confabulation question isn’t even hypothetical. Bard makes shit up all the time, and badly. Take, for example, this recent query I made, where I asked Bard to provide more details about its retention policy, based on a review of Google’s own public-facing documentation. Ignore all my tabs at the top:
Joint Controllership. If I'm the data subject and do find out my data is being Bard-ified, who do I complain to? Google? The nominal controller? And how do I exercise my rights to deletion, objection to processing, rectification, or access? Sure, this question might be easy in the recruiting/work context, but there's a reasonable question about whether the use of Bard by an individual user might exceed the scope of the 'household exemption' and, if so, whether and how they would need to respond to a data subject request.
I'm worried that in Google's rush to compete with OpenAI and Meta, they may have overlooked or not adequately considered these questions and others. They’re better than most, and I have the benefit of knowing that some very awesome people work at Google who are committed to user privacy and protection. I’m sure that internally, the engineers and privacy teams involved thought critically about this stuff.
But externally? Google’s practice of failing to treat customers, users, and other people like actual, living, breathing human beings makes for a fertile breeding ground for the people who hate Big Tech and want to crusade against technological encroachment generally. As I said, it's an own goal that Google could easily avoid with just a bit more transparency.