The Power of Links and Second Brains

How I Make Sense of the Law With Obsidian and AI (Part I of 2)

Jan 15, 2024

// Author's Note: The original draft of this post was over 5300 words. Nobody wants to read that in one go. To spare everyone's eyeballs and time, I decided to break the original up into two parts. The first part will discuss my crappy memory and the tricks I learned to get around this, my love affair with link and social network analysis, and how I started to use a 'second brain' tool (Obsidian) to start tracking cases.

Part II will discuss how I have supplemented my second brain by harnessing AI (while not going completely off the rails), as well as some thoughts on how I can improve my process, and some lessons I learned along the way.

When I was younger, I realized early on that I was absolutely awful at remembering details. Whether it's programming syntax, the names of people I've met, historical facts, trivia, dates, or even details of my own life, I struggle. I can retain concepts and broad themes reasonably well, but I'm absolute rubbish at the details. And no matter how many 'memory palaces' I devised, or spaced repetition exercises I undertook, my memory for details has never improved. If anything, it's gotten worse.

To get around this, I learned to type quickly, take detailed (almost verbatim) notes, and importantly, focus on where to find things. Rather than fruitlessly trying to memorize stuff, I learned how to think contextually and deductively, expertly navigate databases, and write down my steps to ensure that I'd get similar results in the future. The benefit to this approach is that it's scalable to loads of things in life. Rather than knowing the finer details of "lex specialis" or a specific recipe, I figured out where to find the answer quickly. Let the computer keep all that stuff in memory. I had better things to do with my days. It still doesn't help me with remembering people's names though. sigh

Being good at knowing where and how to find stuff has landed me some pretty cool jobs over the years. I was a Google Answers researcher during undergrad, and in law school, I taught fellow students how to use the legal research database Westlaw (which was, in 2002, a bit of a maze to operate and didn't have any of that fancy 'natural language' search functionality built in). After I graduated law school and tried being a lawyer, I went back to my love affair with finding things and landed a job at Palantir Technologies (yes, that Palantir), where I spent my entire time playing with their flagship product, Palantir Gotham. Gotham is a remarkable platform -- it was certainly mind-blowing in 2012 when I was using it.

The Palantír: Less All-Seeing Stone, More Big Analyzing Link Engine

Part of what makes the Gotham product so powerful is the fact that you can generally stick anything in it, easily add additional information, and then use that to discover connections you ordinarily couldn't see. For example, when I was at Palantir, I used the product heavily to conduct competitive intelligence. I might take some structured piece of data (like an Excel spreadsheet with information on competitors, such as the company name, leadership information, annual revenue, website, address, etc.) and cross-check that against unstructured text or images or other information (like emails, news reports, or photographs). But Gotham isn't just a glorified search engine. The real power behind it is the ability to create an object (e.g., a person, place, thing, etc.) and supplement, associate, or link other data to that object.

For instance, if I wanted to find details about whether a particular competitor's CEO bragged in the press about their company, I could create a profile (or object) for the CEO, and then search and create links to details that mention the CEO across all my data. I could add a photo (an unstructured image) to that object. I could also add references or attributes (essentially, data about the data itself, known as metadata) about the CEO, and link to internal or external details like news reports that mention them. Essentially, Palantir was a very fancy, expensive dossier creator. But what really made tools like Gotham stand out was the ability to visually represent those connections by way of a linked social graph.

Social Network Graphs Changed How I See the World

To understand why visualizing links is powerful, I need to briefly explain the concept of social network analysis. Social network analysis (SNA) is the process of identifying and understanding social structures and relationships between people and groups. It's built on two different, but related concepts -- network theory and graph theory. The easiest way to think about this is in the context of human social groups, but SNA is not limited to friend circles and Twitter follows. A group can consist of two or more entities (also referred to as nodes, actors, or in Palantir-parlance, as objects). A person or place can be an entity, but so can a concept, a physical object, or a law. The actions or events that bind those entities together are called edges or links. Edges and links define the nature of the relationship or the interactions between two or more nodes.

As an example, let's say there are five members of a group of individuals -- Alice, Bob, Carey, David, and Edith. These five represent our nodes.

Alice and Bob are friends, Alice is the sister of Edith, and Alice is Carey's enemy. Carey is friends with Bob and Edith (but Bob and Edith aren't connected to one another). Carey is also married to David, who isn't affiliated with anyone else in this group. These relationships represent the edges or links between our nodes.

We can use social network analysis, and in particular, a sub-discipline called link analysis, to visually represent the relationship of those individuals to one another:

A graph showing the relationships between five individuals: Alice, Bob, Carey, David & Edith. Lines point from one entity / person to others.

I'll explain how I whipped up this graph in a later section, but the point is to give an idea of what a link analysis graph might look like.
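To make the nodes-and-edges idea concrete, here is a minimal Python sketch of the same five-person group as a labeled, undirected graph. It is not how the graphic above was produced (that was done in Juggl); the names `EDGES`, `build_graph`, and `degrees` are my own illustrative choices, and I've recorded the "sister of" relationship as a symmetric "sibling" link to keep every edge undirected.

```python
from collections import defaultdict

# Each edge is (person_a, person_b, relationship), taken from the example above.
# "sibling" stands in for "Alice is the sister of Edith" so the edge is symmetric.
EDGES = [
    ("Alice", "Bob", "friend"),
    ("Alice", "Edith", "sibling"),
    ("Alice", "Carey", "enemy"),
    ("Carey", "Bob", "friend"),
    ("Carey", "Edith", "friend"),
    ("Carey", "David", "spouse"),
]


def build_graph(edges):
    """Return an undirected adjacency map: node -> {neighbor: relationship}."""
    graph = defaultdict(dict)
    for a, b, relationship in edges:
        graph[a][b] = relationship
        graph[b][a] = relationship
    return dict(graph)


graph = build_graph(EDGES)

# Degree = number of direct links, a rough measure of how connected a node is.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}

# Print the most connected people first.
for node in sorted(degrees, key=degrees.get, reverse=True):
    links = ", ".join(f"{n} ({rel})" for n, rel in sorted(graph[node].items()))
    print(f"{node}: {degrees[node]} link(s) -> {links}")
```

Even in a group this small, the degree count surfaces something the prose hides: Carey, not Alice, is the most connected person, and David only reaches the rest of the group through Carey.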

But link and social network analysis isn't exclusively for uncovering social relationships. It can also be an invaluable way to unlock insights in other domains -- connections within and between systems, relationships between decisions and decisionmakers (and influencers), and the paths that define established and emerging laws. In the right hands, some version of link analysis might even help us to fix some of the legal code debt I wrote about last week.1

At some point around the age of 23 or so, I (mostly) realized that I can't do much to meaningfully change the system, but I can use what I know to make my own life easier, and maybe help a few others along the way. Likewise, I can't stop the flood of decisions, documents, directives, and dissertations on data protection, but it dawned on me that I could use the tools that I had at my disposal to build my own mini-Palantir, or at least a second brain.

Obsidian: My Second Brain

The main workhorse I use for my 'second brain', as it were, is a platform called Obsidian. Obsidian is a personal knowledge management (PKM) platform that offers a ton of powerful features, among them the ability to surface and easily link concepts together and represent them in a variety of different ways, including via database queries (by way of an add-on tool called DataView, created by Michael Brenan) and through a linked graph. Obsidian comes with its own graph analysis tool out of the box, but I also rely on a separate and more powerful add-on called Juggl, which was created by Emile van Krieken. Juggl allows for loads of customization, but it takes some time to figure it out. I used Juggl to create that graphic above.2

I've used Obsidian for over a year now, and honestly, it's the closest thing to Palantir that I've been able to find, though it's not 1:1. Firstly, Obsidian is quite picky. It only really ingests text (and prefers text written in Markdown). It does okay-ish with HTML files, mostly ignores images, and completely and utterly chokes on PDFs. But Obsidian has a thriving community of dedicated and passionate users behind it, and between the core product improvements and dedicated add-ons created by developers, Obsidian gets the job done. I put (almost) everything into Obsidian these days.

Three things drew me to Obsidian over other PKMs like Notion or LogSeq. The first was how efficient Obsidian is at surfacing relevant links and related concepts. If you've got a markdown file called "AI.md" and you've added 'alias' metadata, like "artificial intelligence", "artificial general intelligence", and "machine learning", Obsidian will highlight all those terms as 'unlinked mentions' in other files, and allow you to easily link them with a click without changing the term in the original document.
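For the curious, the idea behind 'unlinked mentions' can be sketched in a few lines of Python. This is only my rough approximation of the concept, not Obsidian's actual implementation: the function name `find_unlinked_mentions` and the sample sentence are made up for illustration, and I'm assuming links use the `[[wikilink]]` syntax.

```python
import re


def find_unlinked_mentions(text, names):
    """Return (matched_text, offset) pairs where a name appears outside a [[wikilink]]."""
    # Blank out existing [[...]] links (keeping offsets intact) so that
    # mentions which are already linked are not reported again.
    masked = re.sub(r"\[\[.*?\]\]", lambda m: " " * len(m.group()), text)
    # Try longer names first so a long alias is not reported as a shorter one.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(1), m.start()) for m in pattern.finditer(masked)]


# The note name plus its aliases, as in the AI.md example above.
names = [
    "AI",
    "artificial intelligence",
    "artificial general intelligence",
    "machine learning",
]
# Hypothetical text from some other note in the vault.
note = "The [[AI]] Act regulates artificial intelligence, including machine learning systems."

mentions = find_unlinked_mentions(note, names)
for matched, offset in mentions:
    print(f"unlinked mention of {matched!r} at character {offset}")
```

The already-linked `[[AI]]` is skipped, while the two alias matches are reported as candidates for linking.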

The second reason I love Obsidian is that it allows me to self-host (unlike Notion) and store my data locally or via a cloud solution of my choosing. The third is that it scales really well (unlike LogSeq, which chokes on large and/or voluminous files). Finally, Obsidian really does have some of the best add-ons out there.

There's Far too Much Law to Keep Up With

All throughout 2023, I felt like a bad person. I was never really able to keep up with (much less read or retain!) all the data protection and related cases, legislation, guidance docs, and everything that seemed to spill out of every orifice on LinkedIn. There was just too much. At some point, I went through all five stages of grief, and realized that I just needed to accept that I would never keep up on my own with all this information any more than I could convince people not to be wrong on the internet. So in November, I decided to get smarter. I put my second brain to work in earnest.

To benefit from Obsidian's power, you need to think strategically about how you categorize, classify, and label things. Librarians and database nerds refer to this classification process as developing a taxonomy or ontology. I try to keep my taxonomy fairly simple with only a few types of entities, categorized in a small nest of file folders, broken up into broad themes -- e.g., Authorities, Decisions, Legislation, and Relevant Concepts. I also have a directory for content I tag and highlight using a tool called Readwise.3 Folders may have sub-folders of course (e.g., sub-folders under Legislation for the GDPR, with additional sub-folders for Articles and Recitals), but I try not to go too deep.

To make my life easier, and my taxonomy more consistent, I rely heavily on templates (using another add-on, Templater), which allows me to semi-automate creation of certain common fields. For example, most Decisions follow a template model that looks something like this:

type: case
aliases:
  - ADPA
  - C-390/21
  - 390/21
author: Court (Eighth Chamber)
url: https://curia.europa.eu/juris/document/document.jsf?text=&docid=267607&pageIndex=0&doclang=en&mode=lst&dir=&occ=first&part=1&cid=8459648
tags:
  - VIN
  - Right_to_Information
  - Right_to_Access
decision_date: 2022-10-27

These fields appear for every decision and are present at the beginning of the document in a markdown format called 'frontmatter'. Frontmatter acts like metadata about the file itself.
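Because frontmatter is just structured text at the top of a file, it's easy to read programmatically. Below is a minimal Python sketch that parses a small subset of it (plain `key: value` pairs and flat lists). I'm assuming the usual layout where the block sits between two `---` lines and list items start with `- `; the function name `parse_frontmatter` and the trimmed-down sample note are my own, and a real project would more likely reach for a proper YAML library.

```python
def parse_frontmatter(note_text):
    """Parse a small subset of YAML frontmatter: scalar values and flat lists."""
    lines = note_text.splitlines()
    # Frontmatter must start on the very first line of the note.
    if not lines or lines[0].strip() != "---":
        return {}
    fields, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        # A "- item" line belongs to the most recent key that had no value.
        if stripped.startswith("- ") and current_key is not None:
            fields[current_key].append(stripped[2:].strip())
            continue
        # Split on the first colon only, so URLs in values survive intact.
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            fields[key] = value
            current_key = None
        else:
            fields[key] = []
            current_key = key
    return fields


# A shortened version of the decision template shown above.
NOTE = """---
type: case
aliases:
  - ADPA
  - C-390/21
tags:
  - VIN
  - Right_to_Access
decision_date: 2022-10-27
---
Body of the note goes here.
"""

meta = parse_frontmatter(NOTE)
print(meta["type"], meta["decision_date"], meta["tags"])
```

Everything after the closing `---` is ignored, which is the point: the metadata can be queried without touching the body of the decision.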

A note about tags: Tags are powerful in Obsidian. They act like keywords or topics to help you easily search and find things of relevance. They also feature prominently in the social graph and are very easy to query when using tools like DataView. You can also see at a glance how many files share common tags. Here's an example:

A count of specific tags (Personal_Data, AI_Act, Data_Protection_Commission, etc.) in Obsidian
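As a rough illustration of what a tag count and a simple tag query boil down to, here is a Python sketch. The three notes and their dates are hypothetical placeholders rather than real entries from my vault, and `decisions_with_tag` is a name I made up; this is not DataView's query language, just the same idea expressed as plain code.

```python
from collections import Counter

# Hypothetical notes, shaped like parsed frontmatter. Dates are ISO strings,
# which sort correctly as plain text.
NOTES = [
    {"file": "Decision A", "tags": ["Personal_Data", "Right_to_Access"], "decision_date": "2023-01-12"},
    {"file": "Decision B", "tags": ["Personal_Data", "AI_Act"], "decision_date": "2022-10-27"},
    {"file": "Decision C", "tags": ["Right_to_Access"], "decision_date": "2023-05-04"},
]

# How many notes carry each tag.
tag_counts = Counter(tag for note in NOTES for tag in note["tags"])


def decisions_with_tag(notes, tag):
    """Return notes carrying the tag, newest decision first."""
    matches = [n for n in notes if tag in n["tags"]]
    return sorted(matches, key=lambda n: n["decision_date"], reverse=True)


for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}")

for note in decisions_with_tag(NOTES, "Right_to_Access"):
    print(note["decision_date"], note["file"])
```

Consistent tag spelling matters here: `Right_to_Access` and `Right_To_Access` would be counted as two different tags, which is one reason templates help.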

And so, over the winter holiday, I began the process of uploading each and every data protection or privacy-related decision from the Court of Justice, the European Court of Human Rights, and the Data Protection Commissioner into Obsidian, as well as relevant EU and US legislation, EDPB Guidance documents, and other content that was relevant and permissible to copy under the respective copyright laws.4 I also uploaded adjacent cases and laws (or in the case of the AI Act, draft versions of the same). Eventually I may get around to more US, Irish and other high court / trial court decisions and other Supervisory Authority guidance, but that's still TBD.

In parallel, as I started to observe patterns in legislation and decisions, I created a directory of Relevant Concepts. This is, by far, the most manual, and simultaneously one of the most important categories I have in my taxonomy, and I went through a few iterations to get to where I am. I loosely based my model around the Legal Information Institute's community-built dictionary, Wex.5

Wex's database of terms includes a definition and links to relevant cases and laws. More importantly, Wex concepts are well-integrated into the primary source material. It's very Wikipedia-like in that respect.

Here's an example from Wex for 5 USC § 105 - Executive agency:

5 U.S. Code § 105 - Executive agency

For the purpose of this title, “Executive agency” means an Executive department, a Government corporation, and an independent establishment. (Pub. L. 89–554, Sept. 6, 1966, 80 Stat. 379.)

To date, I have around 500 terms in my Relevant Concepts directory. Most are not (yet) defined, but they do include links to relevant legislation, guidance, and cases. Putting this together is what inspired the Legal Code Debt article, after I realized that all of those laws define the same key terms wildly differently.

I also have 277 (and counting!) decisions or judgments, nearly 50 different pieces of legislation, and around 30 regulatory guidance documents, scanned, searchable, and to varying degrees, tagged and linked. My Obsidian graph now looks like a beautiful, colorful globe:

A view of my main Obsidian graph using the base graph creation tool (with Cases tagged in green, Legislation tagged in yellow, and Relevant Concepts tagged in red). Stuff in the center is more heavily linked -- stuff on the edges is more likely to be unlinked orphans -- things I haven't found or made connections to ... yet
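Those orphans on the edge of the globe are simply notes with no links in or out, and they can be found without a graph view at all. Here is a minimal Python sketch, assuming notes link to each other with `[[wikilinks]]`; the note names are generic placeholders and `find_orphans` is a hypothetical helper, not something Obsidian or Juggl exposes.

```python
import re

# Capture the target of a [[wikilink]], stopping before any |alias or #heading.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(notes):
    """notes maps note name -> body text; return names with no links in or out."""
    linked = set()
    for name, body in notes.items():
        targets = {t.strip() for t in WIKILINK.findall(body)}
        # Ignore links to notes that don't exist, and links to the note itself.
        targets = {t for t in targets if t in notes and t != name}
        if targets:
            linked.add(name)   # this note links out
            linked |= targets  # its targets are linked to
    return sorted(set(notes) - linked)


# Placeholder vault: three connected notes and one that nothing touches yet.
vault = {
    "Decision A": "Applies [[Legislation X]] and discusses [[Concept Y|this concept]].",
    "Legislation X": "Defines [[Concept Y]].",
    "Concept Y": "",
    "Concept Z": "No links yet.",
}

orphans = find_orphans(vault)
print(orphans)
```

A list like this makes a handy to-do list: every orphan is either a concept waiting to be connected or a note that doesn't belong in the vault.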

Read on for Part II (coming tomorrow) of this series, where I discuss how I've integrated ChatGPT and some clever Obsidian functionality to build out a semi-automated case analysis tool, how I worked around the limits of my automated 1L intern, and some lessons learned.

1. Of course, this line of thinking sometimes causes one to end up in situations where you're frantically pointing at whiteboards and screaming about Pepe Silvia and the mail.

2. Juggl is great, and very powerful, but my god does it require a lot of under-the-hood fiddling to really make it go. The downside to a collaborative tool like Obsidian, where add-ons are mostly created by die-hards in the community, is that while there are lots of add-ons to choose from, developers tend to design these things for themselves. That means they're maximally flexible and customizable if you know how to customize them, but difficult for newbies to learn, sometimes weakly documented, and tough to figure out.

3. Readwise is an excellent subscription-based tool, run by a very responsive team of developers. They also have a proper RSS reader that does something most cannot -- it converts dreaded PDFs into readable text. Plus, the app has built-in integration with Obsidian! I highly recommend checking them out if you're a news/information junkie like me.

4. As in the US, copying and re-use of CJEU, ECHR, and DPC decisions, as well as EDPB documents, is permitted, provided source attribution is maintained and the original meaning and message of the content is not altered. I have committed to honor that, which is why the URL field appears in frontmatter.

5. I have not (yet) incorporated the Wex dictionary, and I probably won't. While Wex is covered by a Creative Commons BY-NC-SA 2.5 license (http://creativecommons.org/licenses/by-nc-sa/2.5/), I don't know exactly what I plan to do with my database in the future, and I want to be careful about building out a large volume of work contingent on agreeing to a non-commercial license.

Privacat Insights
