Privacy Disasters: Microsoft, Just Because You Can
... Doesn't mean you should. Here's why.
Update: Kevin Beaumont, on his Double Pulsar blog, added some very useful additional context on how Recall works at a technical level, and the information security implications of Microsoft’s approach. I discussed many of the problems he identified (non-optionality, exploitability by adversaries like hackers/governments, the fact that there’s no filtering of … anything). What I did not know was that in addition to recording everything you do on your machine as an OCR’d screenshot, it is also writing that text into an easily searchable (and grabbable) SQLite database in the user’s folder.
Guys. I cannot begin to express how absolutely insane and bad that is. I have added Kevin’s observations below as DPIA risks in bold. He also has some very helpful suggestions for how to disable this abject nightmare, so his post is worth reading.
This week, Microsoft graced the world with yet another tech idea that comes straight out of a Black Mirror episode: an always-on, always-recording life-logging tool that takes screenshots of everything you do on your computer. But now with AI to find things!1
Here’s what Microsoft had to say:
Search across time to find the content you need. … With Recall, you have an explorable timeline of your PC’s past. Just describe how you remember it and Recall will retrieve the moment you saw it. Any photo, link, or message can be a fresh point to continue from. As you use your PC, Recall takes snapshots of your screen. Snapshots are taken every five seconds while content on the screen is different from the previous snapshot. Your snapshots are then locally stored and locally analyzed on your PC. Recall’s analysis allows you to search for content, including both images and text, using natural language. Trying to remember the name of the Korean restaurant your friend Alice mentioned? Just ask Recall and it retrieves both text and visual matches for your search, automatically sorted by how closely the results match your search. Recall can even take you back to the exact location of the item you saw.
This is one in a very long list of ideas that indicates two things:
Some things are better left as half-baked ‘philosophical’ ideas cooked up in dorm rooms when you’re high with your friends;
Tech project teams clearly like reading/watching dystopian sci-fi, but continue to ignore the ‘cautionary tale’ aspects of said genre entirely.
Recall (they really should have just called it Torment Nexus 1) is on by default. The only indication that it exists comes when you activate it with a Windows key, or if you were involved in the initial setup. It records everything that you do on your PC, unless you explicitly either shut the thing off, or manually exclude certain apps or websites.
Thankfully, it does filter private browsing activity and websites, but only if you’re using Edge (or, to a lesser extent, a Chromium-based browser; get fucked, Mozilla users!). Users must manually add websites in Recall settings, because nothing exemplifies ‘user choice’ and control like adding to cognitive load in the UX.
But fear not: Even if you managed to disable things and block website monitoring, Microsoft still records things if you accidentally button-smash the ‘launch Recall’ feature. As a Windows user, I have accidentally button-smashed so many things that this strikes me as less a possibility, and more an eventuality.
Snapshots are stored forever, or at least until you run out of disk space, at which point Recall follows a first in, first out process. The text is recorded into a plaintext SQLite database in the user’s AppData/CoreAIPlatform folder.2 Data is also encrypted at rest using BitLocker or Device Encryption, which is better than nothing. You do not need to run as admin to see this file. Again, I cannot even begin to explain how bad this is.
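To make the ‘plaintext SQLite’ point concrete, here is a minimal sketch of why it matters. It does not touch Recall’s real file: the table name `captured_text` and its columns are made up for illustration, since the actual schema isn’t described here. All it shows is that any code running as the logged-in user can open an unencrypted SQLite file and search it using nothing but Python’s standard library.

```python
import os
import sqlite3
import tempfile


def build_demo_db(path):
    """Create a stand-in database. The schema is hypothetical, not Recall's."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE captured_text (ts TEXT, app TEXT, content TEXT)")
    conn.executemany(
        "INSERT INTO captured_text VALUES (?, ?, ?)",
        [
            ("2024-05-30T09:00:05", "browser", "Online banking: account 12345678"),
            ("2024-05-30T09:00:10", "chat", "Alice: try the Korean place on 5th"),
            ("2024-05-30T09:00:15", "mail", "Draft: Dear boss, about yesterday"),
        ],
    )
    conn.commit()
    conn.close()


def search(path, term):
    """Plain LIKE search. No admin rights or decryption key is involved."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT ts, app, content FROM captured_text "
            "WHERE content LIKE ? ORDER BY ts",
            (f"%{term}%",),
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path)
        for row in search(db_path, "bank"):
            print(row)
```

The design point is the absence of any barrier: there is no key to steal and no privilege to escalate, just a file and a query.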
There’s no indicator (as far as I can tell) showing users whether Recall has been activated. There’s also no logic built in to prevent storage of things like financial or credential information in a snapshot, sensitive materials like your nudes or private conversations. Though, of course, it being a Microsoft product, there are controls against storing DRMed materials.
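For a sense of what even a crude safeguard could look like, here is a minimal sketch of a check that could run on OCR’d text before it gets stored. This is my illustration, not anything Microsoft ships: the function names and the keyword list are assumptions, and the card check is the standard Luhn checksum. A real filter would need to do far more than this.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally split by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Hypothetical keyword list hinting that a credential is on screen.
CREDENTIAL_HINT = re.compile(r"\b(password|passcode|pin|cvv)\b", re.IGNORECASE)


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def looks_sensitive(text):
    """Flag text that appears to contain a card number or a credential prompt."""
    if CREDENTIAL_HINT.search(text):
        return True
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False


if __name__ == "__main__":
    print(looks_sensitive("Card: 4111 1111 1111 1111"))
    print(looks_sensitive("Lunch at the Korean place"))
```

A snapshot pipeline could skip or redact anything this flags. It would miss plenty (nudes and private conversations don’t have checksums), which is rather the point: filtering is hard, and shipping with none at all is a choice.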
So, needless to say, if you sit down to use someone else’s computer to say, check your bank account or whatever, congrats, they now have access to everything. They won’t be able to see that Kindle book you’re reading though. As Kevin notes on his blog, the actual database “compresses well, several days working is around ~90kb. You can exfiltrate several months of documents and key presses in the space of a few seconds with an average broadband connection.”
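Kevin’s numbers are easy to sanity-check with some back-of-the-envelope arithmetic. The sketch below scales his ~90kb figure linearly; everything else is my assumption rather than something he stated: that ‘several days’ means 3 working days, that ‘several months’ means 6 months of 21 working days each, that ‘kb’ means kilobytes, and that the upload speed is 10 Mbit/s.

```python
def estimate_exfiltration(kb_per_sample, days_per_sample, total_days, upload_mbit_per_s):
    """Scale the quoted sample linearly and return (size in MB, transfer seconds)."""
    size_kb = kb_per_sample * total_days / days_per_sample
    size_mb = size_kb / 1000
    seconds = (size_kb * 1000 * 8) / (upload_mbit_per_s * 1_000_000)
    return size_mb, seconds


if __name__ == "__main__":
    # Assumptions, not figures from the quote: "several days" = 3 working days,
    # "several months" = 6 months of 21 working days, 10 Mbit/s upload.
    size_mb, seconds = estimate_exfiltration(90, 3, 6 * 21, 10)
    print(f"{size_mb:.2f} MB, about {seconds:.1f} s to upload")
```

Under those assumptions, half a year of someone’s working life comes to a few megabytes and a few seconds of upload time, which squares with his ‘space of a few seconds’ claim.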
About the only good thing about this version of the Torment Nexus, compared to a previous version of this that was released with far less fanfare in Windows 10 (h/t to themudman.bsky.social for pointing that out!), is that everything is stored & processed locally instead of being sent to the cloud. At least for now.
Unsurprisingly, regulators immediately noticed, and the ICO, which is the UK’s data protection regulator, announced days after launch that they had initiated an investigation. I’m sure Ireland, as Microsoft’s lead regulator, as well as Germany, France, and other countries’ data protection regulators will soon follow. Update: According to Cianan Brennan over at the Irish Examiner, the DPC is allegedly looking into it.
Alright, so that’s the background. Let’s get to the fun part (for me at least) of the Privacy Disasters series: pointing out the risks that Microsoft should have identified if they’d done any sort of data protection impact assessment (DPIA) before launching this eldritch horror.
The DPIA that Microsoft Should Have Done
First, a little legal reminder of why this needs to happen, as this apparently wasn’t clear to anyone at Microsoft.3 Under Article 35 of the General Data Protection Regulation (GDPR), a controller4 must perform a data protection impact assessment when certain “high risk” processing activities occur. The Irish DPC helpfully provides a list of high-risk processing activities. For purposes of the Recall product, Microsoft probably should have done a DPIA given that the processing it’s doing is high-risk for one or more of the following reasons:
Using innovative new technological or organisational solutions (like AI and creepy always-on life-logging tools);
Processing used for the purpose of systematically monitoring, tracking or observing individuals’ location or behaviour (like what websites they visit, what apps they use, and who they communicate with);
Processing that concerns vulnerable data subjects (for example, children, folks with limited or impaired capacity, people in abusive situations);
Undertaking large-scale processing of personal data (for example, law enforcement would love this information to go after baddies, employers to track employees, and domestic abusers to stalk their victims).
The Recall tool also arguably processes personal data on a large scale for a purpose (or purposes) other than that for which it was initially collected, especially if the tool is on by default and the privacy notice fails to identify every processing activity that Recall can be used for. ‘Large scale’ in this case means looking at Recall in aggregate (it’s going to be on by default on all Copilot+ laptops, apparently, unless users disable it).5
Now that I’ve got the ‘why DPIA’ out of the way, let’s look at some of the questions and risks that I’d be probing the product team & engineers about. That is, after I got all the swearing out of the way:
Transparency: How are users being informed of the processing that’s happening here? Is it buried in a 20-page privacy notice that nobody is going to read? What about users on shared machines like family computers, or shared terminals at a public library? Right now it looks very much ‘one and done’, which means Recall is likely to catch a whole lot of users off-guard.
Consent of users/third parties: Currently, users are only notified when a fresh install of Windows 11 occurs. If someone else installs Windows for you and selects the default options, have you, as the actual user, provided informed, meaningful consent? If I’m in an abusive relationship and my abuser cheerfully acts as my tech support and turns this on by default to monitor my search and website activity, what then?
What about third parties? If I use Recall, anyone I interact with online (on chat, or in a video) has no clue they’re being recorded. While screenshots and video logging exist today, the frequency and likelihood of any specific interaction being recorded and accessed again is tiny. Most people aren’t taking screenshots of every single change at 5-second intervals. That frequency gets much higher when we’re talking about every Windows 11 system. As of 2019, Windows is installed on over 800 million devices, and currently (as of April 2024) represents 30% of the entire OS market share in the world, with Windows 11 making up 26% of that. That’s an awful lot of devices potentially recording everything.
Proportionality: This leads in nicely to my next concern. Recall is monitoring almost everything a user does on their PC. Every chat. Every website click. Every draft email.
Our collective expectations of privacy are that most people we chat with aren’t invasively recording our every comment, utterance, or intimate moment. We assume that our early, roughly drafted emails disappear into the ether after we send the final, polished piece. We assume a modicum of control over what we share with others. If we share something or post something, we usually can delete it later. Or we can set a timer to delete messages after a certain period of time (like on Signal or SnapChat). But these assumptions no longer hold with an always-recording and instantly searchable life-logging tool like Recall.
Also, deleting the physical contents being ‘recalled’ does not delete the stored details in the database. That sticks around indefinitely.
Children: What about kids? Most data protection laws, including the GDPR, have heightened standards for consent in relation to children. And while many kids have learned the value of deleting web browser activity, they may not know that mom & dad turned this thing on and are monitoring everything they do on the family computer, like whether they looked for information on sexual health, bullying, gender identity, spoke to a friend about family abuse, recorded a suggestive selfie, or did other things they probably don’t want the parental units to know about.
Windows is also popular with schools, and I don’t even want to begin to imagine how Recall would be used in that situation.
Expanding Purposes: Microsoft is effectively installing a keystroke-logging / spyware tool on everyone’s Copilot+ enabled machine. While Recall currently processes and stores things on-device (including AI processing, yay), what assurances do users have that it will stay that way? What if Microsoft strikes a really neat deal with OpenAI or an advertiser, that shifts processing or storage to the cloud in order to ‘improve user experience’ or ‘provide more targeted ads’?
What if Microsoft is compelled by the US or an adversarial government to monitor criminal behavior, report ‘grooming’ behavior, or target members of a disfavored group, like journalists?
Surveillance & Retaliation: Recall isn’t just for end-users: Microsoft envisions this tool being widely used by businesses as well. For the enterprise, policies are set at the group level, which means that individual employees won’t be able to opt out and disable always-on workplace monitoring, though they may be able to delete snapshots, and filter websites & apps. It’s honestly a bit unclear. Once again, this creates a ‘consent’ issue. The regulators are pretty clear that the employment relationship is lopsided and imbalanced, making reliance on consent a non-starter in most employment contexts.
Remember that draft email I mentioned in the proportionality section? Now imagine that it’s an early draft of an email you wrote to your boss because he ticked you off that day. How many of us have written an initial draft email in anger or frustration, only to sit on it for an evening, pet a cat, cool down and draft something more measured and reasonable in the morning? I doubt it’s just me, although I do have rage issues.
But now, if you’re at work and Recall is on, your boss will also be able to find out that your seemingly-measured response started out as an angry, profanity-laden rant where you insulted your boss’s mom, and told him he sucked as a human being. Your initial frustration (which was never sent, mind you!) might cost you your job.
Exfiltration & Data Breaches: Kevin went into great detail about the threats from exfiltration of data, and how easy it is. “I have automated exfiltration, and made a website where you can upload a database and instantly search it.” While he hasn’t made this live, what this means is that if he figured it out, a whole load of people will be able to easily do the same once this hits go-live.
There is also an API for searching user activity, and third party apps can plug in to ‘enrich’ or view stored data.
As Kevin observed, we’re going to see an explosion in data breaches, both in frequency (because it’s on-by-default and most people won’t know) and in severity.
“… if people have used a Windows device with Recall to access the service/app/whatever, hackers can see everything and assemble data dumps without the company who runs the service even being aware. The data is already consistently structured in the Recall database for attackers.
So prepare for AI powered super breaches. Currently credential marketplaces exist where you can buy stolen passwords — soon, you will be able to buy stolen customer data from insurance companies etc as the entire code to do this has been preinstalled and enabled on Windows by Microsoft.”
Compliance Hell: That leads me to a related concern: all of this newly collected, stored, and searchable data means a whole lot more compliance hell. If businesses think access, rectification and deletion requests suck ass now, multiply that by every single snapshot, on every employee’s device, for every single change ever made. And let’s not even get started on litigation holds!
Controllership: This, IMHO, is huge, and is likely to be overlooked until it creeps up in a lawsuit somewhere or gets called out by a regulator. Right now, it appears that Microsoft assumes that there are no real GDPR or data protection issues at play because Recall is under the ‘control’ of end users/individuals, and processing activities done by individuals are generally out of scope for GDPR purposes since they are considered a ‘purely personal or household activity’.6
Ignoring the business use case I mentioned above, I think there’s an interesting question of whether the use of Recall by an individual user is in fact always a ‘purely personal or household activity.’ The GDPR explains (in Recital 18) that the ‘household exception’ applies when an individual is processing (aka, doing stuff with) personal data for personal reasons, with no ‘connection to a professional or commercial activity.’
The intent of the exception (which was drafted in the early days of the internet) was pretty simple: the EU didn’t want to make every type of private communication or use of personal data a giant assache. The Commission reasoned that nobody wanted to require Joe Random User to post a transparency notice or set up a data processing agreement when he emailed grandma, or posted his group selfie on Twitter. And no regulators wanted to enforce against Joe Random, because that would suck.
But the household exemption isn’t absolute, and over the years, there have been a few cases clarifying when it applies (correspondence, and communications with friends) and when it doesn’t (CCTV recording of public spaces outside of a home, publication of personal details of others on a blog). It turns out that the ‘purely’ bit in ‘purely personal or household activity’ is really important, actually. It’s not enough, in other words, that processing is done for a personal reason; it has to be only for a personal or household use. When you start adding others to the mix and sharing that information, things get messy.
One aspect that I think damns Microsoft here (and potentially many individuals who use Recall) is that Microsoft envisions that app developers may create and interface with Recall to provide enhanced user experiences — for example, the ability to jump back into a task that has been ‘recalled’. I’m not sure that an interface that allows say, some random Microsoft app to pull up your Recall logs qualifies as a purely household activity anymore, especially if that application is doing different things with the data (like harvesting it for content/AI training).
Nor is the deeply troubling use-case that Evacide reminded me of: what happens when a domestic abuser uses Recall to stalk their partner? Did the European Commission really intend to legitimize or at least exempt stalking as a household activity?
And if the household exception doesn’t hold, does that make us all controllers? Imagine trying to help your neighbor with an access request. Or going through mom’s Recall snapshots to delete every time she said something nasty about Timmy down the street. Or filing a data breach notification when someone manages to break into your machine.
There’s more I could include here, but I’m at the email limit. Suffice to say, Recall, as it exists now, represents a privacy disaster. Maybe it could be improved. Certainly Microsoft could do better. As it stands though, I suspect Microsoft will face lots of regulatory ire, and it’s appropriate. They should have talked to me first.
1. And for more historical nerds, Vannevar Bush’s 1945 essay, As We May Think.
2. Kevin linked to a video where one of the Microsoft engineers demonstrated how to access that file. Which is stored in unencrypted plaintext on your machine if you’re logged in. https://cyberplace.social/system/media_attachments/files/112/535/509/719/447/038/original/7352074f678f6dec.mp4
3. Or it was, and the advice was simply ignored by the product team and senior leadership, to which I say, listen to your fucking DP team next time.
4. Controllership will be tricky here because of the local storage aspects which I discuss below, but for purposes of the larger DPIA exercise, I’m treating Microsoft as a controller. They’re at least deciding the ‘means and purposes’ of the tool, including the fact that it’s on by default.
5. I’m less confident in this compared to the others, as everything will be stored and processed locally. Provided Microsoft isn’t lying and shipping everything to the cloud (or shipping the learning outputs to the cloud) it may not meet the large-scale threshold at an individual use level. Large scale monitoring would apply if organizations turn this on to say, monitor their employees though.
6. Under Article 2 GDPR, the data protection laws do not apply to a person processing data for personal or ‘purely household’ activities that have no connection to a ‘professional or commercial activity.’ Examples of personal/household activities include correspondence, posting on a social media site, sending an email to your friend, or say, recording videos of your cat. However, the GDPR still applies to controllers or processors who provide the service to end users. For example, I am not a controller when I use Twitter, but Twitter remains a controller/processor governed by the GDPR.