Seeing Through Apple's Vision Pro: Privacy in the Age of Mixed Reality
Digging into mixed reality's privacy implications and Apple's vision of design and privacy.
In early June, Apple announced the Apple Vision Pro at its annual Worldwide Developers Conference. Given the fawning praise from the press, you would have thought they had announced a cure for cancer. Still, where others saw a ton of promise, I immediately wondered — what does this mean for privacy?
A few features set the Vision Pro apart from existing headsets like Meta's Quest or Magic Leap. For one, it's an augmented or mixed reality headset (or, as the marketing team is calling it, a "spatial computing" system) designed to augment and enhance your world, rather than replace it with the Metaverse. In other words, Apple designed this not as something you wear only in the confines of your own home, but as something you wear out in the world, doing things and interacting with people.
Navigation is controller-free: you "navigate simply by using your eyes, hands, and voice," according to Apple's website. In fact, the eye-tracking is one of Apple's strongest selling points; it is apparently so good that Marques Brownlee likened it to magic and declared it “almost telepathic”. Given that Apple reportedly has over 5,000 patents related to the Vision Pro, it’s hardly surprising that the tech is really good. I’m not an Apple adherent, but I can appreciate good tech just the same.
The Vision Pro May Change Everything … But Not Necessarily in a Good Way
First, I’m skipping the Optic ID iris-scanning technology in this issue, mostly because I recognize that I’m not well-versed enough to speak on how Optic ID will work, whether Apple has adequately balanced its use of sensitive biometric data, or how it will secure iris and eye-tracking data generally. Rather than speculate, I’m going to hold off, but I expect that Optic ID, and its use as a password replacement and a method for making purchases, will come up as more details emerge.
Instead, I want to touch on a few of the other choices and features Apple has unveiled as part of visionOS and the Vision Pro generally, and some of their potential privacy impacts.
Personas — A Digital Representation of Your Face
One key feature that sets the Vision Pro apart from, say, Meta’s Quest 3 is that it scans your face when you set up the device, in order to create a 'Digital Persona' of you that is displayed to others when they pass by your field of vision or when you're chatting on FaceTime. Obviously, doing a video call with only a pair of eyes staring at you constantly would be disconcerting and creepy, so a digital avatar that mirrors your eye, mouth, and facial movements seems like a clever workaround. Some have described it as a bit ‘uncanny valley’, though.
One does wonder — where is all the Persona data kept? Is all processing and storage done on-device when you’re FaceTiming, or is it stored somewhere else where Apple can access and potentially use it? What about app developers? Will they have access to your Persona?
A Record of Our Lives, All the Time?
The Vision Pro is capable of recording information with its twelve cameras, five sensors (including a LiDAR sensor), and six microphones. There is a visual indicator of sorts to alert others when photos and videos are being taken, but it’s unclear whether the Vision Pro is constantly recording, storing, or processing information as users interact with the world. Apple’s documentation is pretty thin on details here, but Mingwan Bae, an industrial designer based in Korea, observed that during “mixed-reality” experiences, “Vision Pro almost always has pass-through enabled, which means it’s always recording and rendering with two high-definition cameras, which may be one of the reasons for Vision Pro’s short battery life” (emphasis added). And even during fully immersive VR experiences, pass-through capture likely doesn’t turn off.
Now, you might point out that phones also record photos, video, and audio (often without notice), and that many smart devices are constantly listening or watching as well. To some extent, we as a society have become accustomed to people recording interactions and events, but in most cases it’s both obvious and intentional. I’m not sure that will be the case for the Vision Pro. We might be sleepwalking into a very different world — one where headsets are recording our every interaction, online and off. And with the integration of AI models (Siri will be included by default, and I’m sure other companies will be working on incorporating even more advanced AI), we may, within a few short years, start to see true ‘virtual assistants’ — familiar with our likes, dislikes, our aversions, perversions & affinities, and capable of manipulating our environment to show us more of what they think we want to see, and less of what we don’t.
I’m reminded of the “White Christmas” episode of Black Mirror (which is so very good, and you should watch it), where everyone has an advanced neural implant known as a Z-Eye. The Z-Eye, like the Vision Pro, has many handy features, including the ability to replace your phone for calls, messaging, taking photos & recording events, but it also includes a few dystopian add-ons. One is the ability to stream everything the wearer sees to a computer, where it can be viewed and influenced by others (including an enslaved virtual assistant of sorts); another is the ability to physically “block” a person from the wearer’s visual field entirely.
We already have the ability to manipulate our environment — blurring images, cutting and editing video, creating deepfakes, and of course, spreading written disinformation. It seems entirely conceivable that such features might be extended to real-time events. This sort of digital manipulation will have profound impacts on our lives, our choices, and our interactions. A world of live-blurring seemed far-fetched in 2014 (when that episode aired), but it seems a lot more probable now.
I Don’t Consent — But Do I Even Have a Choice?
Given all those cameras, sensors, and processing power, the Vision Pro has the potential to create detailed maps of environments and record people without their consent, yet there’s little discussion about measures in place to protect these privacy interests. Yes, the Vision Pro will alert bystanders when recording is occurring, but what then? Short of ripping the rig off the wearer’s face, what recourse do non-consenting people have? Will we go back to the early days of Google Glass, where signs are posted and we hope for the best, or have we all since become desensitized now that CCTV, facial recognition, and cellphone cameras are ubiquitous?
And how will this processing be regulated anyway? Currently, EU data protection authorities are pretty responsive to complaints about overzealous CCTV use by controllers, but does the use of a Vision Pro even create a controllership situation, or is it instead under the aegis of a “purely personal or household activity,” generally exempt from the rules of the GDPR?
Where Will the Data Live and Who Can See It?
I wasn’t able to find a single piece of information about where data will be stored or how some of the more complicated mixed reality features will be processed. It’s realistic to assume that some Vision Pro activities — taking pictures, making calls, browsing, and other things you’d already do on a mobile device — along with sensor and eye-tracking data, will remain entirely on-device, with the most sensitive biometric data protected by the Secure Enclave. Apple has said as much in a document allegedly released to developers. But what about the complicated stuff? How much processing will happen in the cloud? Where will that FaceTime digital Persona rendering occur? What information will be shared with Apple (or third-party app developers)? Where will all the recorded data live?
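For readers unfamiliar with the term, the Secure Enclave is a hardware component whose defining property is that secrets created inside it can be used but never exported. Here’s a minimal CryptoKit sketch of that pattern (to be clear, this illustrates the general mechanism, not anything Apple has confirmed about how Optic ID or Persona data will be handled):

```swift
import CryptoKit
import Foundation

// Error for hardware without a Secure Enclave (e.g. the simulator).
struct NoSecureEnclaveError: Error {}

// A minimal sketch of the Secure Enclave pattern. Illustrative only;
// not how Apple has said Vision Pro data will actually be handled.
func signOnDevice(_ payload: Data) throws -> P256.Signing.ECDSASignature {
    guard SecureEnclave.isAvailable else { throw NoSecureEnclaveError() }

    // The private key is generated *inside* the enclave. The app only
    // holds an opaque reference; the raw key bits can never be read out.
    let key = try SecureEnclave.P256.Signing.PrivateKey()

    // Signing happens in hardware: the payload goes in, a signature
    // comes out, and the key never leaves the chip.
    return try key.signature(for: payload)
}
```

That guarantee is the kind of thing you’d want for iris templates. The open question is how much of the Vision Pro’s data actually gets that treatment, versus ending up in ordinary storage or the cloud.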
There are also reasonable concerns regarding third-party use generally. The wealth of information a device like the Vision Pro might glean about its wearer is profound — not just eye-tracking, voice, and hand gestures, but all the characteristics that can be inferred from those inputs. Some of this I touched on above — details on what our eyes gravitate to or avert from — but other inferred data might include how quickly we swipe to close a window (a proxy for disinterest or even disgust), how long we linger on a particular item (like a dress in a shop, a member of the same or opposite sex, or an advertisement), and even our emotional state from the tone of our voice. Hell, the latter already happens: Google Home already picks up that I’m angry and apologizes when I lambast it for assuming, for the 80th time, that I want to play Savage Garden when I ask for the latest episode of the Savage Lovecast.
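To make the dwell-time point concrete, here’s a hypothetical sketch of how trivially “how long we linger” becomes an interest profile. None of this is a real visionOS API (to Apple’s credit, apps reportedly don’t receive raw gaze data at all); the type names and the threshold are mine:

```swift
import Foundation

// Hypothetical sketch only: not a real visionOS API. It shows how
// little code it takes to turn gaze dwell time into an interest score.
struct GazeSample {
    let itemID: String        // e.g. a product, an ad, a face
    let duration: TimeInterval
}

// Sum how long the wearer lingered on each item, and keep anything
// they stared at longer than the threshold as an inferred "interest."
func inferInterests(from samples: [GazeSample],
                    threshold: TimeInterval = 1.5) -> [String: TimeInterval] {
    let totals = samples.reduce(into: [String: TimeInterval]()) { acc, sample in
        acc[sample.itemID, default: 0] += sample.duration
    }
    return totals.filter { $0.value >= threshold }
}
```

A real profiler would fold in swipe velocity, voice tone, and pupil response too, but even this toy version shows why gating access to the raw signals matters so much.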
Even if Apple keeps the raw input feeds or the camera images to itself, there’s still lots of opportunity for exploitation. And there will, as always, be an inevitable cat-and-mouse game. Just as Apple puts up barriers, enterprising developers will find new and more devious ways to bypass them. I also have concerns about the potential misuse of facial data, especially in the context of deepfakes, fraud, governmental requests, and other badness I haven’t even thought of yet.
Right now, there’s been very little discussion (or media scrutiny) of these concerns, and even less detail on the safeguards that Apple plans to put in place.
Ownership of Facial Data
Finally, there’s a lingering question about the ownership of all this data. Will Apple own your Persona and/or iris scan? What about the ML model for creating your Persona’s representation in apps like FaceTime? Will Apple have the ability (under a license you can’t modify) to use that likeness for other purposes?
Some Final Thoughts on Design
One thing that struck me is that the majority of articles I found discussing any of these issues weren’t from privacy pros or technologists, but from UX and designer types. This makes sense: fundamentally, many of the privacy concerns I’ve raised here are also design questions. The choices that Apple has made, and will continue to make, with regard to the Vision Pro (features, access, even something as benign as battery portability) are not neutral technical choices. Which is to say, Apple is making many value-sensitive design choices along the way, and those choices may have profound implications for our rights, our privacy, our culture, and how we interact with one another.
It’s clear that Apple has value-sensitive design in mind, which is why, compared to others in this space (Meta, ByteDance), there’s at least some forethought about privacy harms. For example, both Apple’s developer guide for building apps for the Vision Pro and their best practices guide offer concrete suggestions on how to 'prioritize privacy'. But their privacy suggestions still suffer from the same myopic views shared by most tech companies — they overemphasize notice, control & security, and underemphasize limits on collection, use, storage, retention, and accountability.
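The gaze model is a good example of that forethought. As I read the developer materials, apps never receive raw eye-tracking data; the system renders the gaze “hover” highlight itself and only reports an interaction once the user actually taps. In SwiftUI, that looks something like this (a minimal sketch using the public hoverEffect modifier):

```swift
import SwiftUI

// Sketch of Apple's gaze privacy model: the view opts in to a
// system-rendered hover effect, but the app never learns where the
// user is looking. It only hears about an explicit pinch/tap.
struct BuyButton: View {
    var body: some View {
        Button("Buy") {
            // Fires on a deliberate tap gesture, not on gaze alone.
            print("purchase confirmed")
        }
        .hoverEffect(.highlight) // drawn by the system, outside the app's process
    }
}
```

That’s genuine privacy-by-design. My gripe is that the same rigor doesn’t seem to extend to limits on what gets collected and kept in the first place.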
In his always interesting blog, Don’t Worry About the Vase, Zvi Mowshowitz observed that, if this takes off, Apple’s design choices, rules & vision will dominate. “Screw you, our way or the highway, our way is the right true and cool way,” he said. Part of the success Apple has had over the decades comes from the fact that they spend a lot of time and money thinking about design, and they tend to make choices that the market generally agrees with. But at its core, Apple’s vision is one that fundamentally limits our ability to make choices, beyond the choice of whether to use Apple products at all. Unlike Android, there is only one designer of iPhones, MacBooks, and the Vision Pro — Apple. There’s a singular vision that you’re either on board with, or you’re not. But this also means that it will be Apple’s vision of privacy that comes to define our interactions — at least with Vision Pro users. Something worth considering.