On Knowledge Graphs

I take a year of posts and throw them into a knowledge graph for fun.

LII

2024.06.23

Wow… we made it… the one year anniversary of my blog. 52 Sundays of delivery without fail.

Perhaps as a way of celebrating, we take a lot of the posts from the last year and put them into an interface for you to better explore them. But first, I explain why.

Point of My Blog

Conveniently timed with the one year anniversary of my blog, I’ve constructed a knowledge graph that helps visualize the relationships between topics and my posts.

When I started writing, part of the point was simply to keep people updated on what I’m up to; a bigger point, though, was to explain and codify my world view in a communicable way. And, part of that involves doing a very good job of connecting seemingly disparate things that I group together in my head.

As an example, in posts, I’ve connected Brazilian Jiu Jitsu to building computers, my trip to Lisbon to Information Theory, and discounted cash flow models to suffering. That’s simply how I think about the world, and I’m trying to communicate it without sounding like a complete and total loon.

So, the natural extension of this is to put the posts in more obvious relation to one another, as well. If I talk about optionality in terms of jiu jitsu in one post and optionality in terms of financial markets in another post, then having those posts serve as a bridge between jiu jitsu and financial markets is the next natural step to articulating my world view.

Sure, I could make a better effort to set up hyperlinks between related posts, but, really, this is a job that is better done by a machine than a human. The machines won’t forget which post mentions what—I will.

So, enter the Knowledge Graph (KG), my answer to further codifying and articulating the connections I see in the world, with my posts as the glue holding them all together.

KG Crash Course

Quite simply, a Knowledge Graph is a way of storing information that contains “nodes” and “edges.” The nodes represent some sort of “entity” or “idea,” while the edges represent relationships between the nodes. Typically, you also have some sort of semantic meaning tied to nodes and/or edges to make the graph interpretable.

From a very simple thing, you can get quite dazzling and complex results. Cool examples:

  • Wikidata: A companion to Wikipedia, Wikidata has information stored as a knowledge graph with a subset of Wikipedia pages as nodes and different connections between them as edges.

  • GDELT: This one maps real-world entities and geopolitical events; I set it up on Google BigQuery at one point and it robustly let me see which countries were having insurrections, but, unfortunately, wasn’t great at telling me about M&A activity between US hospitals.

The two things that fascinate me most about KGs are that they visually communicate information in a way that helps a human understand the concepts better, and that they allow for what feels like a more intuitive search of the data.

My KG

My KG is quite simple.

You have three types of nodes: crafts, abstractions, and posts. Each post node is one of my blog posts. Crafts are literal, real-world practices, while abstractions are a mix of theories, heuristics, strategies, and tactics. The jury is still out on how I will further break up abstractions into subcategories or maybe even group similar ones together (information-theory and signal-vs-noise have a relationship that could be expressed in a number of different ways).
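To make that structure concrete, here is a minimal Python sketch of a graph with the three node types. It is an illustration only, not the code behind the actual graph: the class name, method names, and the placeholder posts ("post-a", "post-b") are all made up for the example.

```python
from dataclasses import dataclass, field

# The three node kinds described in the post.
KINDS = {"craft", "abstraction", "post"}


@dataclass
class KnowledgeGraph:
    # Maps each node name to its kind.
    nodes: dict = field(default_factory=dict)
    # Undirected edges, each stored as a frozenset of two node names.
    edges: set = field(default_factory=set)

    def add_node(self, name, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = kind

    def connect(self, a, b):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both nodes must exist before connecting them")
        self.edges.add(frozenset((a, b)))

    def neighbors(self, name):
        # Every node that shares an edge with `name`.
        return {n for edge in self.edges if name in edge for n in edge if n != name}


# Placeholder data: two hypothetical posts that both touch on optionality.
kg = KnowledgeGraph()
kg.add_node("jiu-jitsu", "craft")
kg.add_node("financial-markets", "craft")
kg.add_node("optionality", "abstraction")
kg.add_node("post-a", "post")
kg.add_node("post-b", "post")
kg.connect("post-a", "jiu-jitsu")
kg.connect("post-a", "optionality")
kg.connect("post-b", "financial-markets")
kg.connect("post-b", "optionality")
```

Asking for `kg.neighbors("optionality")` returns both placeholder posts, which is exactly the bridge between jiu jitsu and financial markets that the graph is meant to surface.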

While all of the crafts, abstractions, and posts originated with me, I let the machines handle making the connections between them. After I came up with the abstractions and crafts, I wrote a script that iterated over the content of each post and, for each craft or abstraction, asked an LLM if the post was related to that topic.
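The loop described above can be sketched roughly like this in Python. The prompt wording and the `ask` callable are assumptions for illustration; `ask` stands in for whatever function sends a prompt to an LLM and returns its reply, and `fake_llm` is a deliberately dumb substitute so the sketch runs without a model.

```python
def link_posts(posts, topics, ask):
    """Return (post title, topic) pairs the model says are related.

    posts:  dict mapping post title -> post text
    topics: list of craft and abstraction names
    ask:    hypothetical callable that takes a prompt and returns the reply
    """
    edges = set()
    for title, text in posts.items():
        for topic in topics:
            prompt = (
                f"Is the following post related to '{topic}'? "
                f"Answer yes or no.\n\n{text}"
            )
            if ask(prompt).strip().lower().startswith("yes"):
                edges.add((title, topic))
    return edges


def fake_llm(prompt):
    # Stand-in for a real model: answers "yes" only when the topic name
    # literally appears in the post text.
    question, _, body = prompt.partition("\n\n")
    topic = question.split("'")[1]
    return "yes" if topic in body.lower() else "no"


# Made-up post text, purely for the example.
posts = {
    "On Running": "I went running this morning.",
    "Post B": "Notes on building computers.",
}
edges = link_posts(posts, ["running", "weightlifting"], fake_llm)
```

Note that the prompt never says what "related" means, so a real model is left to guess.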

Quite frankly, that’s not a great solution, because I didn’t really explain what it would mean for a post to be related to, say, weightlifting. I ended up manually creating keywords for each craft and abstraction and adding connections between posts and concepts whenever the script found the keywords in a post. This filled in some obvious holes; On Running wasn’t connected to running… now it is!
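A keyword pass like that might look something like the following Python sketch. The keyword lists and the post text are invented for the example, and whole-word, case-insensitive matching is an assumption about how the matching could work, not a description of the real script.

```python
import re


def keyword_edges(posts, keywords):
    """posts: title -> text; keywords: topic -> list of keywords."""
    edges = set()
    for title, text in posts.items():
        for topic, words in keywords.items():
            if not words:
                continue  # an empty alternation would match everywhere
            # Whole-word, case-insensitive match on any of the keywords.
            pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                edges.add((title, topic))
    return edges


# Made-up post text and keyword lists.
posts = {"On Running": "My weekly mileage crept up after a slow jog."}
keywords = {
    "running": ["run", "jog", "mileage"],
    "weightlifting": ["squat", "deadlift"],
}
llm_edges = set()  # pretend the LLM pass missed this connection
all_edges = llm_edges | keyword_edges(posts, keywords)
```

Whole-word matching keeps "run" from firing on words like "prune", at the cost of missing "running" unless it is listed as its own keyword.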

When you click on a node, you get to see what that node is connected to. That being said, a shortcoming, addressed below, is that you don’t get to see why the connection is what it is. Still, it can give you a pretty good idea of how “central” each topic is in my graph.

“optionality” has a high number of connections.
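The simplest way to put a number on "central" is to count connections per node, which is degree centrality. A small Python sketch with placeholder edges (the posts and their connections are made up):

```python
from collections import Counter

# Hypothetical edges between placeholder posts and topics.
edges = [
    ("post-a", "optionality"),
    ("post-a", "jiu-jitsu"),
    ("post-b", "optionality"),
    ("post-b", "financial-markets"),
    ("post-c", "optionality"),
]

# Count how many edges touch each node.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

most_connected = degree.most_common(1)[0]
```

With these placeholder edges, `most_connected` comes out as `("optionality", 3)`.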

Overall, the graph is pretty lightweight and missing a lot of features, but it’s much more useful than my first draft (below), which doesn’t even have any sort of interface when you click on a node and has quite sloppy connections between things.

Draft one. Pretty, but not useful.

Further Development

I have a lot of lofty ideas for how I can keep playing around with visually presenting my blog posts, but there are also some very obvious gaps in the current solution.

I’ve been thinking about maybe reframing the whole thing as a category, with each concept as an object in a set and the posts as morphisms between them. I’ve also thought about making the concepts very hierarchical and adding an assembly index to each node based on its components.

That being said, there is plenty of more obvious, low-hanging fruit that would make the current KG more useful.

I may rerun an LLM with a more aggressive and discrete prompt for relationship creation and then maybe cross-check the answer against something else. I might also throw in labels for whether the connections were based on an LLM or keywords.
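Labeling where each connection came from could be as simple as keeping a set of sources per edge. A hedged Python sketch, with a hypothetical function name, label strings, and example edges:

```python
def label_edges(llm_edges, keyword_edges):
    """Map each edge to the set of methods that produced it."""
    labels = {}
    for edge in llm_edges:
        labels.setdefault(edge, set()).add("llm")
    for edge in keyword_edges:
        labels.setdefault(edge, set()).add("keyword")
    return labels


# Placeholder edges: one found by both passes, one by keywords only.
labels = label_edges(
    llm_edges={("post-a", "optionality")},
    keyword_edges={("post-a", "optionality"), ("On Running", "running")},
)
```

Edges carrying both labels are a cheap form of cross-check, since two independent methods agreed on them.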

More important, though, is making it easier for a human to interface with the actual data. Right now, if you click on a node, you can see everything it’s connected to and get hyperlinks to them if they’re posts. I think a more natural interface might let the user “crawl” across the graph and explore the topics.

This would be in line with serving more context for each node, such as a summary of posts displayed in the search interface or descriptions of abstractions/crafts. While the descriptions for the abstractions and crafts would be presented by me, I’m sure I would give the machines a shot at summarizing my posts. All of that context would also probably give the LLM a better chance at making useful connections between the nodes, too.

Additionally, there are graph specific stats you can get, such as the “centrality” of each node and the shortest path between nodes. Maybe I’ll add something in to let the user find the distance between two concepts and how to navigate between them as quickly as possible.
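For the distance between two concepts, a breadth-first search is enough, because every hop in the graph counts the same. A minimal Python sketch over a made-up adjacency map (the node names and connections are placeholders):

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return the list of nodes on a shortest path, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> the node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt in previous:
                continue
            previous[nxt] = node
            if nxt == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical graph: two posts bridging two crafts through one abstraction.
adjacency = {
    "jiu-jitsu": ["post-a"],
    "post-a": ["jiu-jitsu", "optionality"],
    "optionality": ["post-a", "post-b"],
    "post-b": ["optionality", "financial-markets"],
    "financial-markets": ["post-b"],
}
path = shortest_path(adjacency, "jiu-jitsu", "financial-markets")
```

Here the path runs from jiu-jitsu through post-a, optionality, and post-b to financial-markets, so the two crafts are four hops apart.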

Another angle that would allow me to take advantage of the fact that the core of the app itself is in a lisp dialect would be some sort of recursive search across the knowledge graph. I could have fun making a macro that is called to craft queries in some sort of recursive search function. It would be more complicated than simply switching to an actual graph db, but I think I would learn more, too.
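One possible reading of "recursive search" is collecting everything within a few hops of a starting node. The sketch below shows that recursion in Python rather than in the Lisp dialect the app is written in, and it has no macro; the function name and the placeholder graph are assumptions.

```python
def within(adjacency, start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` hops."""
    best = {}  # node -> most hops we had left when we reached it

    def visit(node, hops_left):
        # Skip if we already got here with at least as many hops to spare.
        if best.get(node, -1) >= hops_left:
            return
        best[node] = hops_left
        if hops_left > 0:
            for nxt in adjacency.get(node, ()):
                visit(nxt, hops_left - 1)

    visit(start, max_hops)
    return set(best)


# Hypothetical graph with placeholder posts.
adjacency = {
    "jiu-jitsu": ["post-a"],
    "post-a": ["jiu-jitsu", "optionality"],
    "optionality": ["post-a", "post-b"],
    "post-b": ["optionality", "financial-markets"],
    "financial-markets": ["post-b"],
}
nearby = within(adjacency, "jiu-jitsu", 2)
```

Tracking the hops left per node, rather than a plain visited set, matters here: a node first reached by a long route still gets expanded if a shorter route turns up later.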

The possibilities truly are endless. We’ll just have to see where the random walk of life takes it.

I want to thank all of you who have taken the time to read any of my posts over the past year. I have such a fun time writing and publishing these, and whenever I hear from somebody that the content was useful, or even just entertaining, it makes my day.

The Lindy Expectancy of my blog is now 2 years. So, you can bet on having at least 52 more posts inbound.

That being said, I still don’t have what I’d consider a proper “title” for my blog, or one that I’m very satisfied with. I’ve always figured one would be emergent, and still believe that. So, if you have any ideas or suggestions after the first 52 posts, please let me know.

Live Deeply,