(Re-identification) Attack of the Hedge Funds
How a financial regulation paper made me start questioning everything I thought I knew about data protection law
Imagine a free app on your phone. How does it make money? A well-known way is to sell your data. One potential buyer might be a trader (or her employer) looking for an edge. Once a quarter, Tesco tells shareholders about its earnings, and that has an effect on its share price. But a trader with data about how many devices have been located in Tesco stores this quarter, how much time they spent there, whether they were new iPhones, etc. might be able to take an educated guess about the next earnings report. Based on that guess, she might decide that Tesco shares are under- or over-priced, and make a nice profit by buying or selling some.
The most obvious concern here is privacy.1 Maybe you don’t like the idea of some lass at a hedge fund knowing you spent a full 24 hours of your life in a supermarket last quarter. Maybe you’ve been spending time in seedier places than supermarkets, and that’s also in the data. Maybe you’re worried that your bank will notice those late night trips to the casino you’ve been taking, and think harder about that loan.
If you live in a place with functional data protection or privacy laws, they probably require buyers and sellers of the data to address those concerns. One way to do that is by making it less obvious that the data is about you. The methods for doing this are many and varied, but mostly they involve either getting rid of certain data fields, or replacing them with something else. Possibly a combination of the two. However, this is surprisingly hard to do well, particularly if you want to preserve any useful data at the end of the process. Clever and motivated attackers can, if they have enough data, usually find ways to conduct re-identification attacks. At that point all those privacy concerns are back on the table.
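To make the two basic moves concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the field names, the salt, the records); it just shows "drop some fields, replace others" applied to a single toy store-visit record.

```python
import hashlib

def de_identify(record, salt="example-salt"):
    """Toy de-identification: drop direct identifiers, pseudonymise the rest."""
    out = dict(record)
    # Move 1: get rid of certain data fields entirely.
    for field in ("name", "phone"):
        out.pop(field, None)
    # Move 2: replace the device ID with a salted hash (a pseudonym)...
    out["device_id"] = hashlib.sha256(
        (salt + record["device_id"]).encode()
    ).hexdigest()[:12]
    # ...and coarsen the location: keep the city, drop the exact coordinates.
    out["location"] = record["location"].split(",")[0]
    return out

visit = {
    "name": "A. Shopper",
    "phone": "07700 900000",
    "device_id": "abc-123",
    "location": "Leeds,53.7997,-1.5492",
    "minutes_in_store": 42,
}
print(de_identify(visit))
```

The catch, as above, is that this kind of output is rarely safe against a motivated attacker: the pseudonymised device ID still links every visit by the same device together, and anyone who can join it to another dataset (or who knows roughly when and where you shop) can start undoing the work.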
Financial institutions employ a depressing quantity of clever people. But the usual feeling in privacy and data protection circles is that they aren’t really motivated to look at individual-level data. Our hypothetical trader from earlier isn’t interested in you, she’s interested in Tesco. The number of people shopping at Tesco is interesting. Their socio-economic status is also interesting. But who you actually are? Not really the relevant question.
Or maybe it is? Matt Levine, in his excellent Money Stuff newsletter (paywalled, but you can give Bloomberg your email and get access!), just highlighted a recent finance paper that raises that question. Briefly, some finance academics wanted to investigate the effect of SEC investigations on their targets, and devised a pretty… err… novel?… methodology for doing that.2 From the abstract:
The Securities and Exchange Commission's investigative process remains opaque and challenging to study due to limited observability. Leveraging de-identified smartphone geolocation data, we provide new insights into the SEC's monitoring practices by tracking SEC-associated devices that visit firm headquarters. Our findings reveal that the majority of SEC visits occur outside of formal investigations, with larger firms and those with a history of SEC enforcement actions being more frequently visited… On average, these visits are material, evidenced by significant stock price reactions, even in the absence of subsequent formal investigations or enforcement actions.
They found that re-identification was worth it! SEC visits did have a statistically significant impact. This raised some follow-up questions from Matt:
1. Is this signal — -1.4%-ish abnormal returns over three months — just an academic curiosity that practitioners can’t use?
2. Or is it a signal that hedge funds are already using, and the academics have blown up their spot?
3. Or is it a signal that hedge funds didn’t think of, but that the academics did, and that the hedge funds will now incorporate into their models?
The question that strikes me is: who else is worth re-identifying?3 Because once you start thinking this way, the list of people whose data might let you infer something interesting about where the market is going starts to get pretty long: superstar employees, senior managers, supply chain professionals, research scientists, high net worth individuals, key influencers… I’m sure you can think of your own.
Clearly, this raises some privacy issues. The incentives in financial markets are pretty brutal - if you know about something that could get you a trading advantage, you should probably start doing it before your competitors do.4 Even so, it's a comparatively small aspect of a much larger problem. I said earlier that with enough data, a clever and motivated attacker can usually re-identify individuals. Even without the sort of sharpened incentives that financial uses bring, both of those conditions are being met on a regular basis. Between the frequency of data breaches and the flourishing data broker industry, there is an awful lot of data about people sloshing around out there. The list of motivated people includes intelligence services, pro-life lunatics and church politicians, and the consequences of those attacks are more immediately troubling than someone being on the losing end of a trade.
The US has spent the last 20 years failing to pass the sort of generalised, country-wide privacy law that would start to tackle this stuff. Maybe the fact that there's now a financial stability angle to this will finally get them over the hump.
But individual privacy doesn’t feel to me like the only issue. To go back to the example I started this with, imagine that our trader has some sort of exclusive access to the app data. Maybe they sign a contract with the app developer. Maybe they are the app developer. At this stage, what they’re doing looks an awful lot like insider trading, but… probably isn’t?5 Whether they can identify the people whose data they’re trading on is, for these purposes, kind of irrelevant.
Concerns about privacy, unfair trading, and a whole bunch of other things are just specific examples of a more general issue: there’s now an enormous amount of data about us, and certain organisations get to base their decisions on it. Those organisations often don’t have to take any account of our interests, but the decisions still affect us. This is a big problem and, over the last fifteen years or so, a lot of people have been thinking about different aspects of it.6 Over the coming posts, I’m going to try and pull some of that work together, show some more examples, and examine the ways in which people are addressing, and failing to address, the underlying issue. If that sounds like your jam: subscribe!
Footnotes
If you’re in Europe, the concern is data protection, but this post is aimed at people who don’t get too exercised about the distinction.
This is the US, so no GDPR considerations or anything, but I would have loved to see how the ethics review for that research went.
I am playing slightly fast and loose with the idea of 're-identification' here. The study actually identified devices as associated with the SEC, based on the fact that they spent more than 20 hours a week in SEC buildings. Sometimes this may have been single individuals, but more often it was probably groups. There’s potentially a significant legal difference between the two cases, but I don't think it affects the broader point. So long as you can get that granular with your data, there are going to be cases where it’s possible and potentially profitable to focus on the level of the individual.
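The classification heuristic the footnote describes is simple enough to sketch. A rough Python version, with an invented data format (tuples of device ID, week, and hours spent in SEC buildings), assuming the paper's stated threshold of 20 hours per week:

```python
from collections import defaultdict

def sec_associated(pings, threshold_hours=20):
    """Flag devices that spend more than threshold_hours per week in SEC buildings.

    pings: iterable of (device_id, week, hours_in_sec_buildings) tuples.
    Returns the set of device IDs flagged in at least one week.
    """
    weekly = defaultdict(float)
    for device_id, week, hours in pings:
        weekly[(device_id, week)] += hours
    return {device for (device, _), total in weekly.items() if total > threshold_hours}

pings = [
    ("dev-1", "2023-W14", 12.0),
    ("dev-1", "2023-W14", 11.5),  # 23.5 hours total: flagged
    ("dev-2", "2023-W14", 6.0),   # under the threshold: not flagged
]
print(sec_associated(pings))
```

Note how little the heuristic cares whether "dev-1" is one person or a shared office device - which is exactly the individual-vs-group ambiguity the footnote flags.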
At this point, I should probably be clear that I have no evidence that any financial institution is using re-identified personal data for trading purposes. I've done some financial services work, but nothing for the sort of organisation that would be in a position to trade on this sort of information. If they were doing this in the UK, it would likely be a criminal offence, and a bit of googling on the subject suggests it wouldn't be risk-free in the US, and that firms would want to have procedures in place to prevent this. OTOH, Matt Levine is intimately familiar with this stuff, and he just assumes it's going to happen.
I am not a criminal lawyer! Or a securities lawyer! But just looking at the language of the offence in the UK and the way it's constructed in the US, I feel like you'd have to torture the language to make it stick.
This is not in itself a completely original insight. There are a fair few books on the issue (maybe I'll review some?), the classic being Shoshana Zuboff's The Age of Surveillance Capitalism. I found that pretty disappointing, but she did a great job of putting her finger on (and naming) the basic anxiety. There's also a ton of truly great stuff on the subject that I'll try and link to in subsequent posts.