What Does it Mean To Make Data Public?
Max Schrems has filed a fascinating case, and it leaves me puzzling over inference, again.
Those of you who have read this blog might recall that I see Max Schrems largely as a chaos agent doing his damnedest to break the internet.1 Others consider him an indefatigable privacy advocate, and I’m sure the answer falls somewhere between those two extremes. I’ve written about him here before.
Schrems is a frequent complainant before the Data Protection Commission in Ireland, and a frequent litigant before various courts, including the Court of Justice of the European Union (CJEU). Last week, Mr. Schrems was arguing before the CJEU again, in the case of Schrems (Communication de données au grand public), C-446/21. This latest case, like many before it, is a sprawl of a complaint that substantively touches on four legal issues, or in CJEU-speak, ‘questions referred’. The first two covered the lawfulness of Facebook/Meta’s processing and data minimisation. Those are, IMHO, not very interesting and worth skipping over. Instead, I want to focus on the last two questions he posed, which I’ve taken the liberty of summarizing into plain English from the legalese provided:
Whether controllers like Facebook are permitted to target advertising based on sensitive/special categories of data like a person’s political opinions or sexual orientation; and
Whether a person makes sensitive personal data (such as their sexual orientation) public for all, including for targeted advertising, merely by sharing it in certain contexts.
At this point, it’s important to note that Schrems is gay, and reasonably open about this fact. He has discussed his sexuality on public panels, usually in the context of the importance of data protection and fundamental rights. Thus, his sexuality is not a secret, but it’s also not something that he broadcasts a great deal. I wasn’t able to find any press where he shared this information. And it’s definitely not something he ever posted on or shared with Facebook. And yet, Facebook somehow figured this out, and began showing him ads and recommending connections based not only on his sexual orientation, but on his political opinions as well.
Schrems did what he usually does, and brought a case against Facebook in Austria. The Austrian court rejected his argument, and dismissed the action, on the grounds that no infringement of his rights under Article 9 GDPR had occurred, because he had manifestly made details of his sexuality public by discussing that fact on these public panels.
Article 9 covers how controllers and processors handle ‘special categories of personal data’. Here we’re talking about things like a person’s race or ethnic origin, religious beliefs, political opinions, sex life, and importantly, sexual orientation. Article 9 generally takes the view that a controller like Facebook cannot process sensitive information about a person without having a specific legal basis for doing so. There are a few different legal bases — ‘explicit consent’ of the data subject is one; another is when information is “manifestly made public by the data subject.” Facebook argued, and the lower court agreed, that Facebook hadn’t done anything wrong when it presented ads and suggested groups based on his sexual orientation and political affiliations, because Schrems had manifestly made this information public.
Easy in Theory, Wickedly Hard in Practice
This is a fascinating case, and touches on a few really interesting themes and potential privacy harms. And while I don’t usually predict decisions (it tends to anger the Law Gods when you do), I do think I know where the court is likely to go on this one: they’re probably going to agree with Schrems, and not Facebook, in this case. I’m pretty confident, in fact, because we’ve been here before, in the Meta v. Bundeskartellamt decision. In that case, Facebook/Meta made a similar argument related to so-called Off-Facebook data.
I wrote that:
the Court of Justice noted that Meta’s ability to link special categories data (based on Off-Facebook activity) to a user’s Facebook account, “may, in certain cases, reveal such information” about a user, without them directly providing it to Meta/Facebook (Para. 72). Therefore, this must be considered processing of special category data by Meta. Given that, there was no way to conclude that the user’s actions done outside of Facebook (e.g., by purchasing a product, or reading, liking, or sharing a page) meant that they intended ‘explicitly and by a clear positive act’ to make that data public. Nor are they explicitly consenting to share that information with Meta (Para. 77-79).
Still, while the case might be predictable in principle, it might not be intuitively obvious to folks what such a decision could mean in practice. Some questions to ponder:
What’s the threshold of disclosure that needs to be met before something is ‘manifestly made public’? Do search results count?
Does it require that the person in question be a public figure? Is there a different threshold for, say, Schrems (who, I’d argue, is in fact a public figure, at least in Europe) than for me?
Is there a distribution or reach threshold that needs to be met? If I share that I’m a catgirl on my Substack, have I manifestly made that detail public? Have I made it public to Facebook?2 What if I author a piece on the benefits of being a catgirl in the New York Times?
Do we have different standards when it comes to ‘processing’? For example, is an inference made by a machine learning model that I can be bucketed into some category the same as a targeted ad? What if I never learn about the inference? What about Google’s new Privacy Sandbox initiative? That’s arguably making loads of inferences at a group level by bucketing users into topics and categories (a rough sketch of that kind of bucketing follows this list). It’s better than cookies, but might still create challenges depending on the outcome of the CJEU decision.
Does the type of sensitive information matter? For example, is exposure of one’s race different from exposure of, say, a person’s sex life?
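Since I mentioned topic bucketing above, here’s a minimal, hedged sketch of what group-level bucketing looks like in principle. To be clear: this is not Google’s actual Topics implementation or taxonomy; the site-to-topic table, the function name, and the history format are all invented for illustration.

```python
# A toy, Topics-style bucketing sketch (NOT Google's actual implementation):
# map visited hostnames to a small set of coarse interest topics and keep only
# the most frequent ones. The taxonomy below is invented for illustration.
from collections import Counter

SITE_TOPICS = {
    "kayak.com": "Travel",
    "espn.com": "Sports",
    "allrecipes.com": "Food & Drink",
}

def top_topics(history: list[str], k: int = 3) -> list[str]:
    """Return the k most frequent coarse topics for a browsing history."""
    counts = Counter(SITE_TOPICS.get(site, "Unknown") for site in history)
    counts.pop("Unknown", None)  # ignore sites the (toy) taxonomy can't classify
    return [topic for topic, _ in counts.most_common(k)]

print(top_topics(["kayak.com", "espn.com", "espn.com", "example.org"]))
# -> ['Sports', 'Travel']
```

Even at this level of crudeness, the question from the list above reappears: if the taxonomy happens to contain a topic that maps onto a special category (say, an LGBTQ-interest topic), the ‘coarse’ bucket is itself a sensitive inference.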
Obviously, if I selectively tell friends and loved ones that I’m a catgirl, it would feel like a gross violation of my autonomy if Google started blasting ads for kitten kink sites and gear, or Facebook recommended me to catgirl appreciators. Exposure of that information could have some very serious consequences to me, including substantial chilling effects on my future choices and behaviors. As Professors Dan Solove and Danielle Citron note in their seminal law review article Privacy Harms:
Chilling effects have an impact on individual speakers and society at large as they reduce the range of viewpoints expressed and the nature of expression that is shared. … Consider the impact of news that the gay dating app Grindr had shared subscribers’ HIV status with analytics firms. Subscribers expressed profound dismay. Individuals told the press that they would no longer share that information on that app or any dating app—it was simply not worth the possibility that employers or others could find out their HIV status and hold it against them.
There’s no question that harms to our autonomy and choices cause us to hold back. For example, if I share nude photos with a significant other, and they leak those photos to a WhatsApp group chat, my autonomy has been violated. I may choose in the future never to share that information with anyone else out of fear of something similarly bad happening. Not only would that impact my rights to autonomy and choice, but it could also negatively impact my relationships with others. I might trust people less. It would be appalling then, if a court determined that such a violation nonetheless meant that I made my nudes or other confidential information public.
But we also have to consider the very real question of how we enforce this sort of thing in practice. It’s all well and good to declare that Max Schrems has not manifestly made his sexuality public. It’s an easy academic exercise to state that such an outcome would be a net good, and that tech companies just shouldn’t leak that sort of thing in any way, but it’s another thing entirely to implement that in the bowels of Facebook’s targeted ads or OpenAI’s LLMs. We are increasingly relying on AI, algorithms, and automation, even though we don’t have a goddamn clue how to keep said machines from intuiting or inferring details about us.
In the Bundeskartellamt article, I offered a number of examples of inadvertent inferences that are already being made about us daily based on fairly benign stuff: Google Maps and location information can intuit your sexuality and recommend gay bars, your Substack subscriptions might lead to an inference about your religious or political affiliations, race or ethnic origin can be inferred from what you watch on Netflix. The Bundeskartellamt case points all of this out, but it throws a wrench in the machinery by also preventing Facebook/Meta from relying on other common legal bases for processing special or sensitive data (such as explicit consent or contract). The end result seems to be leading towards ‘just don’t infer sensitive information’, without any clear guidance about how.
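To make that concrete, here’s a deliberately crude sketch of how an inference like that can fall out of ‘benign’ signals. It is a toy co-occurrence model, not anything resembling a production ad system; the subscriptions, labels, and counts are all invented.

```python
# Toy illustration: a crude co-occurrence model over "benign" signals
# (newsletter subscriptions) ends up scoring a special-category attribute
# (political leaning). All data here is invented.
from collections import defaultdict

# hypothetical training data: (subscriptions, self-reported political leaning)
training = [
    ({"gardening-weekly", "party-a-digest"}, "party_a"),
    ({"party-a-digest", "local-news"}, "party_a"),
    ({"party-b-bulletin", "local-news"}, "party_b"),
    ({"party-b-bulletin", "gardening-weekly"}, "party_b"),
]

# count how often each subscription co-occurs with each label
counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
for subs, label in training:
    for sub in subs:
        counts[sub][label] += 1

def score(subs: set[str]) -> dict[str, float]:
    """Sum per-subscription label frequencies: a crude 'inference'."""
    totals: dict[str, float] = defaultdict(float)
    for sub in subs:
        for label, n in counts[sub].items():
            totals[label] += n
    return dict(totals)

# this user never stated a political opinion, yet a score comes out anyway
print(score({"party-a-digest", "gardening-weekly"}))
```

Note that nothing in that code ‘collects’ political opinions; the sensitive output is an emergent property of correlations. That’s exactly why ‘just don’t infer sensitive information’ is so hard to operationalize.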
The AI Act Adds More Complexity
The AI Act doesn’t offer much help here either. The Act, which is set to be finalized any day now, explicitly calls out inference dozens of times. For example, in Recital 6:
A key characteristic of AI systems is their capability to infer. This inference refers to the process of obtaining the outputs, such as predictions, content, recommendations, or decisions, which can influence physical and virtual environments and to a capability of AI systems to derive models and/or algorithms from inputs/data.
The AI Act bans inference about the emotions and ‘intentions’ of an individual based on biometric and other special categories of data, including what might be seen as benign details like hair color, tattoos, and interests, which go well beyond what’s already recognized under the GDPR. The law is also riddled with some rather squishy exceptions.3
To rein in the worst abuses, the law sets a high bar for so-called “High-Risk AI Systems” (as defined under Title III of the Act and Annex III). In addition to outright bans in some cases, permissible High-Risk systems now have new obligations to meet. Most of these are pretty reasonable (providing details about the AI itself, including some insight into how it arrives at decisions, what it’s designed for, details about the input data used, and the identity of the provider), but other obligations might prove especially tricky to implement. For example, the obligation to document things like the types of individuals or groups ‘likely to be affected by use of the AI system’, as well as “known or foreseeable circumstance[s], related to the use of the high-risk AI system in accordance with its intended purpose or under conditions of reasonably foreseeable misuse …”.4
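For a sense of why that worries me, here’s a hedged sketch of what an Article 13-style documentation record might look like if you tried to write it as structured data. The field names are my own shorthand, not defined terms in the Act, and the example system is hypothetical.

```python
# Hypothetical sketch of a structured "instructions for use"-style record for a
# high-risk system. Field names are my shorthand, not terms defined in the Act.
from dataclasses import dataclass, field

@dataclass
class HighRiskSystemDocs:
    provider: str
    intended_purpose: str
    input_data: list[str]
    affected_groups: list[str]        # who is "likely to be affected"
    foreseeable_misuse: list[str]     # "reasonably foreseeable misuse"
    known_limitations: list[str] = field(default_factory=list)

docs = HighRiskSystemDocs(
    provider="ExampleCo (hypothetical)",
    intended_purpose="rank job applications for human review",
    input_data=["CV text", "stated qualifications"],
    affected_groups=["job applicants", "hiring managers"],
    foreseeable_misuse=["fully automated rejection without human review"],
    known_limitations=["performance not validated for non-EU CV formats"],
)
```

The first few fields are easy enough. It’s the last two lists, enumerating everyone ‘likely to be affected’ and every ‘reasonably foreseeable misuse’, that have no obvious stopping point.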
This language, and the related obligations, strike me as vague and, in some places, completely infeasible. As I think I’ve hammered home a few times now, we humans are shit at identifying our own inferential biases, or the inferences made by relatively dumb algorithms and ‘smart’ devices. Applying that to generative AI and other forms of machine learning that are designed to infer things seems like it’ll be even more of a challenge. I really don’t know how all of this will work. And unless the CJEU has a firmer idea of how to unravel this mess, I suspect, in the end, the Schrems decision is unlikely to offer much clarity either.
Note: I am probably going to move off of Substack in the near future. It shouldn’t have much of an impact on subscribers (I think I’ll be moving to Medium.com), and I will also continue cross-posting on LinkedIn if you’re not a Medium.com fan.
I should note that I generally don’t agree with Max Schrems, or the organization he remains heavily involved with, Noyb, on much of anything. In fact, I personally think Max is a bit of an egotistical asshole. So, when I came across his latest suit against Facebook Ireland, I read it assuming I’d have something new to rail about. Except, this time, he’s actually got a point.
FYI to Meta and anyone else: I am not, in fact, a catgirl. Not that I’m yucking someone else’s yum here, but that’s just not my kink.
See Recital (8a) of the AI Act: The notion of emotion recognition system for the purposes of this Regulation should be defined as an AI system for the purpose of identifying or inferring emotions or intentions of natural persons on the basis of their biometric data. … It does not include physical states, such as pain or fatigue. It refers for example to systems used in detecting the state of fatigue of professional pilots or drivers for the purpose of preventing accidents. It also does not include the mere detection of readily apparent expressions, gestures or movements, unless they are used for identifying or inferring emotions.
Article 13 (3)(b)(iii), AI Act.
Yes, I think we can agree broadly. From the recommendations perspective, however, I probably view content and people recommendations as different from ad targeting, though they both use similar tech below the surface. The problem with the ad targeting version is that the advertiser gets a say in the matter. With content or user recommendations, there’s a net benefit to all the parties without the need to get the content owner’s or the users’ input in the matter. I guess, IMO, the system gets corrupted when an economic actor can buy their way to getting their interests met at the expense of the users’ interest in seeing the ads or in having the inferred information made available to others.
The “Do not sell my personal information” opt-out selection being made available to users is a good start, but it shouldn’t be opt-out to begin with, but rather opt-in. I know, I know, a girl can dream, can’t she? 🤣
All this to say that I believe we’re in broad agreement and it was nice to read the finer details and legal references you included in this piece.
Thanks for this analysis. It really helps tease out the various issues. As someone who has been on the Internet since the dawn of its commercialization, I’ve found it interesting to watch it evolve.
The framing that you appear to bring to the piece is one that starts from the premise that targeted advertising is a legitimate business model and that we just need to find whether there are practical ways of limiting what can be targeted. This will become harder over time as new technologies evolve. I view this a bit differently. Advertising on the Web started out untargeted. While some had a problem with it on the basis of it polluting pages, it was understood, given that the model existed in newspapers, radio and TV. It wasn’t long before targeting on the basis of content became more the norm, as a result of the fairly static nature of the web. This was generally the form targeting took on content sites, the assumption being that a user interested in a piece of content might appreciate related ads.
Behavioral, collaborative-filtering or user-centric targeted ads are basically what we live with today, but they’re not what people signed up for when joining online communities. If, before they signed up, there was a big banner that said, “By signing up you allow us to target ads and sell data about you on the basis of inferences we make about you or based on your social conversations or on anything you share. You are giving us permission to share this information with advertisers, data brokers, law enforcement and other gov’t institutions or anyone else we deem appropriately interested in this information,” I suspect that it would force people to think about it more. The notion that once users are in these closed communities, targeting changes are made and additional rights are extracted from them as the tech evolves is somewhat despicable. It’s hard for most to understand the occasional “Read the changes to the Privacy Policy” notices that sites make available, given the thick legalese used. Some sites do make an effort to write a version of these in a more accessible way, but in the context of use, it’s too hard for the user to appreciate the impact of what they are implicitly agreeing to.
I think Schrems has called out the status quo. He has gone where no one else dared to go, which was to challenge regulators, legislators and the largest tech companies on behalf of the average person. It’s hard and dirty work, but as the inconsistencies between what users expect and what companies are doing become more pronounced, he seems to be the only true watchdog, since regulatory bodies had mostly dropped the ball (until he arrived on the scene).
Being online should not result in the abdication of our rights to privacy, and it didn’t use to. While we can appreciate that tech’s evolution has enabled new capabilities for targeting users and using data about them that no one has been prepared for, on the other side of that, regulators are doing everything possible to attack the tools users rely on to protect themselves (e.g., encrypted communications, VPNs, anonymous handles). So what are we to make of all of this? Should we sit back and let these companies, with legislators in their pockets or under lazy regulators, continue to enjoy the privileges that come from the arbitrage between the value of untargeted ads and inference-laden targeted ads, as well as secondary and tertiary data uses, at our expense? Ultimately, there’s no good reason why ad targeting exists other than to increase the rents web sites can charge advertisers. There’s no user value in it (as much as the sites try to espouse such).
As I view Substack’s model, I see one where the quality of participants willing to pay for content is far more interesting than a high quantity of users who don’t really appreciate the content. Where a content provider wants to offer a free access option, that too is great, as they do so with purpose: to build a reputation and a base who may turn into paid subscribers at some point. None of this requires taking advantage of readers by creepily surveilling them and selling info on their comments or on the series of articles they have read. Or being forced to write misleading headlines.
OK, enough soapbox talk. Your piece just stirred a lot of thoughts for me, and I wanted to share some additional perspectives on this. Thanks again.