The Power of Links and Second Brains: Part II
Or how I used Obsidian + AI to make sense of the law
// Author's Note: Due to the absurd length of this piece, I broke this article up into a two-parter. Part I discussed my crappy memory and the coping tricks I learned to adapt, link & social network analysis, and how I began using a 'second brain' tool (Obsidian) to start tracking cases.
The Power of (Controlled) AI
I know I'll get some hate for this piece from the AI Doomer crowd, but I don't really care. If you find AI worthless, its creators morally bankrupt, or generally think we're all getting turned into paperclips by Skynet, that's fine, you do you, and maybe skip this part. Personally, I have found tremendous benefit from the use of AI tools in the context of legal summarization when combined with human review and oversight. Or as I think of it, treating AI like an always-available average-intelligence 1L intern.
Technically, I started using Obsidian well over a year ago, but never really got into it until more recently. Part of what inspired me was the surge of development in add-ons and features, including all the shiny OpenAI/Claude/Hugging Face integrations. After trying a lot of duds and half-cocked ideas, I finally found an OpenAI integration that works really well for me -- the Smart Connections add-on created by Brian Petro. Smart Connections allows users to query ChatGPT's API directly about the files they store in Obsidian. It's "smart" in that it doesn't upload everything to OpenAI (unless you allow it), and instead lets you selectively target a specific note or notes. It also boasts some interesting features, including a smart linkages function that helps find connections between notes. For example, if I'm working in a current note and it references the concept of 'surveillance', Smart Connections will surface other notes that also discuss surveillance.[1] However, the main feature I rely on is the Smart Connections Chat function. It's a chatbot built right into Obsidian, and it allows me to use a modified and vastly improved version of the Case Summarizer prompt I developed, directly within Obsidian.
I've spent nearly two months improving this prompt -- a reminder that LLMs require iteration and refinement, and rarely work perfectly all the time. Y'know, like most types of software. I've discovered that LLMs also follow a tried-and-true rule of data management: Garbage In, Garbage Out. If you put garbage in (generic, ill-defined, or imprecise prompts, for example, or open-ended, crappy, or inaccurate data), you get garbage out (hallucinations, nonsensical bullshit, or generic text that isn't very meaningful). To get to a reasonable approximation of 'good', there's a bit of manual massaging and prompt tweaking you need to do. It also helps to focus what the LLM is using as its source material. For example, early on, my prompt included a broad request for 'cases discussed' in court decisions. That worked well ... for about a month. And then I noticed it just stopped working. I had to write a more precise prompt in its place, and include examples of the type of detail I was looking for.
A tip to the wise: When generating prompts, the more examples and structure you provide to ChatGPT, the better your result is likely to be.
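To make that concrete, here's a rough sketch of the kind of structure I mean. To be clear, this is illustrative and not my actual Case Summarizer prompt -- the section headings and citation examples are stand-ins:

```python
# A hypothetical skeleton for a structured case-summarization prompt.
# The sections and example citations are illustrative; a real prompt
# needs to be longer and tuned to your practice area.
CASE_SUMMARY_PROMPT = """
You are summarizing a court or regulatory decision for a privacy lawyer.

Produce the following sections:
1. Parties and procedural posture (one short paragraph)
2. Key issues (bullet list)
3. Holdings and conclusions, each with 1-2 sentences on the reasoning
4. Laws, cases, and concepts relied on, cited like these examples:
   - Article 9 GDPR
   - Commission v. Poland, C-204/21

Decision text:
{decision_text}
"""
```

That last bit -- showing the model the exact format of detail you want -- is the kind of example-giving that fixed my 'cases discussed' extraction when it stopped working.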
ChatGPT isn't perfect, but that's ok; I don't really need perfection. I need something that can provide a quick summary or overview of a thing, spot issues, pull out relevant themes, identify conclusions, and cite the laws, cases, and concepts relied on in a decision or guidance document. In other words, I need an unpaid, reasonably competent first-year law student (1L) who can work whenever I need them. With the prompt I've created, ChatGPT + Obsidian now roughly approximates your average 1L. It does a solid first pass, but occasionally misses nuance, subtleties of law, and expressions of opinion (aka, dicta).
To compensate for that, I review each case, but that review process is much faster than reading the case and doing the summarization work from scratch. A shortish case that might have taken me 1.5-2 hours to read, digest, and summarize now takes about 20 minutes. And that includes tagging, supplementing the summary with bits the LLM missed, and making corrections when ChatGPT occasionally misinterprets something.
Back-of-the-envelope math tells me that for the 277 cases I have added so far, saving an hour to an hour and a half per case works out to roughly 280-415 hours, or somewhere between 11 and 17 full days of work. And that's just cases.
Doing the review (as opposed to the manual analysis) also means I get the benefit of reading, minus the slog. If you read cases and regulatory decisions, particularly in the EU, you'll observe that there's a LOT of repetition. The DPC notoriously reiterates the same points throughout their opinions. The same goes for ECHR decisions.
There's merit to this (when you're trying to preserve a record for a higher court, or you want to ensure your rulings are understood in the future), but it creates toil for everyone who isn't the target demographic (aka, most everyone else reading a decision). Since I am not the target demo, that means there are loads of irrelevant bits in most decisions -- areas of law unrelated to data protection, judicial procedure, and bare recitations of the laws themselves. That's a lot of time wasted that could be better spent petting a cat.
I don't need to read the definition of Article 9 GDPR again. I need to understand the court's or regulator's interpretation of that law and the insights they draw from it. I need to understand their conclusions and a bit about how they got there. My (refined) Case Summarizer prompt identifies these points, while Obsidian's unlinked mentions and ability to link incoming/outgoing citations mean that I catch (most of) the relevant ideas, concepts, and cases that the Case Summarizer might have missed.
Interestingly, this approach also allows for future discovery in a way that relying on a disconnected written record (or a Westlaw search) might not. If I explore a new case, or want to dig into a concept I hadn't originally identified, Obsidian can easily surface an old decision related to that concept. For example, I hadn't created a concept term for the 'primacy of EU law' until after I read through a summary of the Commission v. Poland decision referenced in the December 2023 ZQ v. Medizinischer Dienst case. Within that concept, Obsidian identified a number of additional cases and citations, which I then tagged.[2] I share the output below:
Linked mentions (9):
- 06_SIU_2018 In the matter of Galway County Council
- B v. Latvijas Republikas Saeima, C-439_19
- Commission v. Poland, C-204_21
- Commissioner of the Garda Síochána and Others, C-140_20
- La Quadrature du Net and Others, C-511_18, C-512_18 and C-520_18
- Minister for Justice and Equality, Commissioner of An Garda Siochana v Workplace ...

Unlinked mentions (180):
- 02_SIU_2018 Own-Volition Inquiry of The Surveillance of Citizens by the State for ...
- 04_SIU_2018 In the matter of Waterford City and County Council
- 05_SIU In the matter of Kildare County Council
- B v. Latvijas Republikas Saeima, C-439_19
- Commission v. Poland, C-204_21
- Deutsche Wohnen SE v. Staatsanwaltschaft Berlin, CJEU-C-807_21
- Hauptpersonalrat der Lehrerinnen und Lehrer beim Hessischen Kultusministerium, ...
- Interesting side project
- Ligue des droits humains v. Conseil des Ministres, C-817_19
- Minister for Justice and Equality, Commissioner of An Garda Siochana v Workplac ...
- VD v. SR, C-339_20 and C-397_20
Linking Makes Me Happy
I recognize that I am a weird dork in that this sort of connection-building brings me intrinsic joy. Finding these relationships is fun in itself -- it's like solving a mystery or completing a puzzle, especially when you start to see tangible results.
I've now built up enough of my second brain that I can reliably query for cases and look for legislation and guidance that share tags or relevant concepts. I can actually make discoveries about my data, and I've been able to use that when talking with clients and even when commenting on LinkedIn or Twitter.
I also now get pretty graphs like this, which show connections I might never have noticed on my own. For example, starting with the concept 'joint controllers', I get an interesting set of cases, legislation, concepts, and guidance materials. When I expand out (in this case, by focusing on the Fashion ID and Nacionalinis cases), I start to see patterns and trends. Some are obvious things I already knew (the phrase 'purposes and means' is commonly tied to joint controllership), but others, like the fact that many cases involve online social networking, weren't as obvious (at least to me).
There's still loads of wonkiness and things I need to sort out and clean up (for example, that graph doesn't always render cases accurately), but as a minimum viable product, Obsidian + Smart Connections suits my needs. As I flesh out more cases, legislation, and guidance documents and broaden my concepts (and their associated aliases!), I foresee this leading to better insights and greater clarity. More connections and, importantly, perhaps a slightly better view of the fractally-complex legal world we live in.
Bonus Technical Dorkery Section (AKA, My Toil)
I got a lot of great comments on Part I, and gathered that a few of you were particularly eager for the AI stuff. Some of you might also be interested in the technical discoveries, challenges, and work-arounds I encountered and regularly ranted to Husbot about. Feel free to skip all of this if it isn't your bag.
I despise PDFs, especially non-OCR'd PDFs. Adobe (the progenitor of many a cursed file format) can literally die in a fire. The fact that courts and regulators continue to publish their decisions as PDFs makes my head hurt. I get that there's a time and place for PDFs (redacted decisions, for example), but most cases aren't that special, and it's just as easy to publish them as machine-readable HTML. I am particularly annoyed by how inconsistent PDF quality, OCRing, and conversion can be across jurisdictions.
On the upside, the CJEU's HTML-based decisions were a delight to handle, converting cleanly into Obsidian with very little formatting loss. Kudos to whoever made that call.
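For anyone curious what that looks like in practice, here's a minimal sketch of pulling an HTML decision into Markdown. This isn't my actual pipeline -- it assumes the requests and html2text Python packages, and the URL is a placeholder:

```python
import requests
import html2text

# Placeholder URL -- substitute a real decision page
url = "https://curia.europa.eu/..."

# Fetch the HTML decision and convert it to Markdown
html = requests.get(url, timeout=30).text
markdown = html2text.html2text(html)

# Save it somewhere the Obsidian vault can pick it up
with open("case_note.md", "w", encoding="utf-8") as f:
    f.write(markdown)
```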
To do the PDF-to-Markdown conversion, I leaned heavily on the pdf2md website (https://pdf2md.morethan.io/). Johannes Zillmann, the person who wrote this tool, is an absolute legend, and I intend to give him some money as soon as I can figure out a proper channel. pdf2md is by far the best PDF-to-MD converter I've found. If only someone could build it directly into Obsidian as an add-on. I will also be throwing Brian Petro some cash, because Smart Connections is great and it just works.
I have learned far more than I ever wanted to know about regular expressions (regex) in the course of correcting an absurd number of formatting/conversion fuckups. For those who enjoy regex or want to learn, regex101.com is an excellent resource for studying regex and testing complicated patterns in practice. Regex was particularly necessary because a certain Irish regulator has been wildly inconsistent with their formats, logos, and willingness to OCR their decisions, while also writing some of the longest decisions out there. Please, for all that's holy, switch to HTML like the CJEU. <3
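To give you a flavor of the cleanup involved, here's a hypothetical Python pass over converted Markdown. These patterns are examples of the kinds of fixes I found myself making, not my actual cleanup script:

```python
import re

def clean_converted_markdown(text: str) -> str:
    """A hypothetical cleanup pass for PDF-to-Markdown output."""
    # Re-join words hyphenated across line breaks: "control-\nler" -> "controller"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Strip lines containing only a page number, e.g. "12" alone on a line
    text = re.sub(r"^\s*\d{1,3}\s*$", "", text, flags=re.MULTILINE)
    # Collapse the runs of blank lines that page breaks leave behind
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text
```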
I also have discovered areas where Obsidian could be improved. Principally, while it does a great job of surfacing and grouping incoming unlinked mentions (where a separate note references the note you're working in), outgoing unlinked mentions (things like concepts, cases, and people) are not grouped and simply appear in the order they occur in the document. If you look at the example below, you'll see what I mean -- cases that mention the 01/SIU/2018 decision are grouped on the left, and I can click once to create a linkable reference. On the right, there are over 980 unlinked mentions (!) to other concepts, cases, and the like. Many of these are duplicates of the same phrase (accountability, surveillance/surveillance system, case, personal data), and since I generally only link once, it adds cognitive load and time to the review process. For someone with a crappy memory, this is not fun. Ideally, some clever developer could create an add-on that groups similar terms/references, but since that doesn't exist, my review speed will stay slow for now. Depending on how frustrated/motivated I get, I may dig into the developer documentation and try my hand at writing an add-on, but I'd really rather not, because I suck at programming. Even with ChatGPT's help.
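In the meantime, something like this would work as a stopgap -- hypothetical Python run outside Obsidian, not an add-on -- counting how often each known concept term appears unlinked in a note so that duplicates collapse into a single line:

```python
import re
from collections import Counter
from pathlib import Path

# Illustrative concept list; in practice you'd read these from the
# vault's concept notes rather than hard-coding them.
CONCEPTS = ["Accountability", "Surveillance", "Personal Data"]

def unlinked_mention_counts(note_path: str) -> Counter:
    """Count unlinked (not yet [[wiki-linked]]) occurrences of each concept."""
    text = Path(note_path).read_text(encoding="utf-8")
    # Remove existing wiki-links so they don't count as unlinked mentions
    text = re.sub(r"\[\[.*?\]\]", "", text)
    counts = Counter()
    for concept in CONCEPTS:
        counts[concept] = len(re.findall(re.escape(concept), text, re.IGNORECASE))
    return counts
```

Crude, but it would turn 980 scattered mentions into a short list of what's actually worth linking.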
Obsidian also has a very strange bug where, if you link too quickly in the right-hand pane, it sometimes eats the wrong text. I need to file a bug report on the developer page about that.
Unlike me, ChatGPT does not suck at programming -- or at least it does a reasonably competent job whacking out simple Python scripts. I wrote many Python scripts to handle bulk, repetitive tasks. For example, ChatGPT helped me write a script that takes an EU regulation or directive (like the GDPR) and the number of articles it contains (e.g., 99), and then auto-generates a labeled file for each article, complete with the article number and relevant frontmatter (tags, type, url, aliases). Essentially, it created an article stub for each law, which made later linking to references in cases 1000x easier than doing it manually. I would like to do something similar with relevant concepts, and Jeff Jockisch has given me some useful starting ideas for that.
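For the curious, the shape of that script looks roughly like this. The naming scheme and frontmatter values here are illustrative, not necessarily what my version produced:

```python
from pathlib import Path

def make_article_stubs(law: str, num_articles: int, out_dir: str) -> None:
    """Generate one stub note per article, each with YAML frontmatter."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for n in range(1, num_articles + 1):
        stub = (
            "---\n"
            f"tags: [{law}, Legislation]\n"
            "type: article\n"
            "url: \n"
            f"aliases: [\"Article {n} {law}\", \"Art. {n} {law}\"]\n"
            "---\n\n"
            f"# Article {n} {law}\n"
        )
        (out / f"{law}_Article_{n}.md").write_text(stub, encoding="utf-8")

# e.g., the GDPR's 99 articles:
make_article_stubs("GDPR", 99, "Legislation/GDPR")
```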
My main suggestion for anyone wishing to use Obsidian this way, or for a similar project, is tag consistency. For me, that means underscores and consistent title casing. These make life easier because there are so many ways to write the same thing, and sometimes you forget. Committing to a format (Personal_Data, Right_of_Access) means you're more likely to get it right, or to easily spot where you didn't. It also avoids the extra bonus spaces nobody wants, along with some odd Obsidian quirks. Finally, I think ChatGPT handles title case and underscores better when you ask it to generate them than when you leave the underscores out. That said, sometimes I still mess up -- I still need to correct all the places where I have Automated_Decision-Making instead of Automated_Decision_Making or Automated_Decisionmaking, for example.
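If you want to enforce that format programmatically, a small normalizer along these lines does the trick. The variant list is illustrative, and acronyms like GDPR would need special-casing:

```python
import re

# Known stray variants mapped to a canonical tag (illustrative, not exhaustive)
CANONICAL = {
    "Automated_Decision-Making": "Automated_Decision_Making",
    "Automated_Decisionmaking": "Automated_Decision_Making",
}

# Small words kept lowercase mid-tag, per the Right_of_Access convention
MINOR_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

def normalize_tag(tag: str) -> str:
    """Join words with underscores, title-casing all but minor words."""
    tag = CANONICAL.get(tag, tag)
    words = [w for w in re.split(r"[\s_-]+", tag.strip()) if w]
    out = []
    for i, w in enumerate(words):
        if i > 0 and w.lower() in MINOR_WORDS:
            out.append(w.lower())
        else:
            out.append(w[:1].upper() + w[1:].lower())
    return "_".join(out)

assert normalize_tag("personal data") == "Personal_Data"
assert normalize_tag("right of access") == "Right_of_Access"
assert normalize_tag("Automated_Decisionmaking") == "Automated_Decision_Making"
```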
As a separate aside, while tags are powerful, keywords in Obsidian are either not really a thing, or certainly not a developed thing. They are hard to use effectively, largely duplicative of tags, and treated very weirdly by add-ons and the graph functionality. They're pretty useless, and I'm still kicking myself for thinking they were important for so long.
I mentioned this in a footnote on my previous article, but as much as I love the graphs, they still have a long way to go in terms of development and usability by normies. The internal graph view Obsidian ships with is good, but limited. Juggl is more powerful, but it also requires a fair amount of hackery and programmatic incantations to get it to do what you want. And even then, sometimes it just ... doesn't. You can see that in the joint controller example above. Why are those cases in black? The hell if I know.
Anyway, if you’ve read this far and you have thoughts, I’d love to hear them.
[1] I'll admit, I rarely use this feature because it's rather hit-or-miss on the quality of its outputs.
[2] Astute observers may note that some cases are not yet linked. I have made a choice to only tag cases where it looks like the decision substantively touches on that subject or area.