What Happens When Everything Becomes Inferable?
We all make inferences. But must we get explicit consent every time we do?
This week, I came across the following article in Forbes, "A.I. can use VR headset data to predict users’ personal data even if they don’t directly reveal it, researchers warn", discussing two studies where researchers used AI to infer characteristics or traits about individuals wearing VR headsets. One of the studies (which may still be in pre-print and can be found here), out of the University of California, Berkeley, found that researchers could effectively match participants’ separately provided survey responses, covering their personal background, demographics, and behavioral and health information, to head and hand movement patterns observed during VR gameplay.
The researchers, relying on VR motion samples and machine learning models, were able to match head and hand movement data to individual participants’ survey responses and, through inference, accurately determine a person’s height, weight, foot size, and country with more than 80% accuracy, using data from 1,006 people playing the VR rhythm game Beat Saber. More troubling was the fact that they could also infer other data attributes, including the player’s age (78.3%), marital status (73.3%), employment status (71.7%), ethnicity (70.0%), income (68.8%), and mental and physical disability status (63.0% / 62.0%), as well as nearly three dozen other characteristics. I’ve shared their data table below:
I’m not getting too up in arms about the study yet, as the details on the actual process of matching back are a bit vague (e.g., how many related elements did the researchers need to ‘match’ hand/head movements to survey answers? How much variance was there across survey participants? How many answered all survey questions?). The researchers also coded characteristics into binary traits (“old” v. “young”, “married” v. “not married/divorced”, “AnyMentalIllness” v. “NoMentalIllness”), so it’s hard to tell what they were able to meaningfully infer about an individual. If all you get is that Player 1 is male, young, wealthy, and married, and Player 2 had COVID, is female, likes music, and has a higher education, that isn’t terribly telling. And of course, it relied on the fact that participants revealed the underlying information in their survey answers. The researchers didn’t derive this information solely from hand/head movements; instead, they inferred it from lots of extra data provided directly by the participants. Unsurprisingly, Forbes’ headline and lede were more sensational than realistic.
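For readers who want a concrete picture of what this kind of attribute inference looks like mechanically, here’s a minimal, purely illustrative sketch in Python: summarize each player’s head and hand motion into a handful of numeric features, then train one binary classifier per survey trait and measure its accuracy. The synthetic data, feature count, and model choice below are my own placeholders, not the researchers’ actual pipeline.

```python
# Hypothetical sketch: inferring one binary-coded trait (e.g., "married" vs.
# "not married") from summary statistics of head/hand motion. The features,
# data, and model are stand-ins for illustration, not the Berkeley study's method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for per-player motion features (e.g., mean/variance of head height,
# hand speed, reach distance) and one binary-coded survey label per player.
n_players, n_features = 1000, 12
X = rng.normal(size=(n_players, n_features))   # motion summary features
y = rng.integers(0, 2, size=n_players)         # binary survey trait

# One classifier per trait; per-trait accuracies like those in the study
# roughly correspond to evaluating each label independently like this.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Cross-validated accuracy: {scores.mean():.1%}")
```

With random labels like these, accuracy hovers around 50%; the point of the sketch is only to show that per-trait percentages like the ones reported above are the natural output of evaluating one binary classifier per attribute.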
So Why Is This Interesting?
Despite my criticisms, the study is interesting because so much about us can be inferred from other data points (like metadata, telemetry data, body movements, opinions, etc.), and I predict that inference and derived data will only improve and become more prevalent, given the rise of AI and machine learning, advances in medical detection, virtual and augmented reality, and wearable tech. These technologies all help to identify and derive insights about us. But inferences can also come from the “opinions, reasoning, and assessments that underlie” final decisions.1 My concern is partly about the power of these inferences, but more critically about the fact that the law has seemingly swung from inferential indifference to a posture that treats every inference made as intrinsically harmful.
For example, in July, I wrote about the Court of Justice’s recent decision against Meta, where it concluded that certain data, obtained outside of Facebook and matched back to a user’s Facebook profile, may indirectly reveal sensitive, deeply personal information such as ethnicity, sexual orientation, religious affiliation, and gender identity, even where the user didn’t provide that information directly to Facebook/Meta.
The Court’s response was to tell Meta that unless it could separate out the regular profile data from the inferred or derived data obtained elsewhere, the whole lot of it needed to be treated as sensitive (or, in EU terms, “special category data”),2 and, specifically, that Meta needed to obtain explicit consent from users before it could process any of it. I pointed out that:
In general, the GDPR prohibits the processing of special categories and criminal data unless there’s an exception (Articles 9 & 10). Those exceptions are fairly narrow, and most are inapplicable to the types of businesses that engage in surveillance capitalism (or honestly, most capitalism). There’s no provision for contract or legitimate interests, for example. For data extractive companies, especially after this decision, there’s really only one option: explicit consent of the data subject. Explicit consent requires even more work than regular consent, notably an “express statement of consent,” and companies are loath to rely on it because a person can revoke their consent at any time, for almost any reason (with a few country-specific exemptions).
And this rationale isn’t restricted to the EU. California’s Office of the Attorney General issued an opinion on inferences in March 2022, noting that businesses must disclose all internally-generated inferences based on information that the business collected about the consumer (directly or from other sources) in response to a data subject request.3
The rise of AI, ML, and other tech means that it is becoming increasingly easy to infer sensitive, deeply personal, economically and legally consequential things about people, intentionally (such as when it’s done to profile and score individuals), but also unintentionally. And while I agree that there should be regulation in this area (especially against the likes of Meta/Big Tech, data brokers, and invasive governments), I wonder if all the other organizations and entities also governed by these laws will be able to keep up. Or worse still, whether, given the strict limits Article 9 places on the use of special category data, all of this will devolve into a scenario where we as users are effectively asked to consent to everything we do, online or off. There’s a very real feeling that five years from now, we’ll all be facing “consent fatigue”.
A regulatory landscape that lacks nuance, and fails to distinguish reasonable, socially beneficial, non-invasive, or even accidental inferences (like the Beat Saber example above) from genuinely harmful ones, chills speech and innovation, or perversely encourages people to resign themselves to giving up their rights simply to do basic day-to-day tasks. Just look at how we deal with consent fatigue and cookies. The goal instead should be to distinguish the mere possibility of inferring something from cases where an actual inference is being made that meaningfully drives decisions affecting us.
By all means, line up Meta, Amazon, Palantir, Google, and basically every data broker imaginable against a wall and (metaphorically) shoot them. Put limits on their power and the toys they use to profile us. The tactics many of these firms use fall squarely into the types of high-risk profiling and decision-making abuses that lawmakers and society are keen to keep in check. But we all need more guidance on where the boundaries should be, and there very much should be a distinction between the unintentional and reasonable on the one hand, and the intentional and malignant on the other.
Sandra Wachter & Brent Mittelstadt, A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data, 2019 COLUM. BUS. L. REV. 494, 509 (2019). The authors explored the idea of distinguishing “reasonable inferences” from “high risk inferences” that “damage privacy or reputation, or have low verifiability” (pp. 495, 538).
Meta Platforms Inc., et al. v. Bundeskartellamt, Case C-252/21, para. 89: “However, it should be specified that, in the event that a set of data comprising both sensitive and non-sensitive data is the subject of such operations and is in particular collected in bulk without the data being able to be dissociated from each other at the time of this collection, the processing of this set of data must be considered as being prohibited, within the meaning of Article 9, paragraph 1, of the GDPR since it includes at least one sensitive data …” (emphasis added).
Opinion of Attorney General Rob Bonta, No. 20-303 (Mar. 10, 2022) (OAG Opinion).