On Epistemology

What Italian American cannoli consumption teaches us about the theory of knowledge

[Efficiency is Still the Name of the Game, Epistemology—The Theory of Knowledge, LLMs & Cannoli Consumptions, Security Through Obscurity]

2025.03.09

XC

We explore why it’s not enough to know things. You also need to know why you know things.

[Efficiency is Still the Name of the Game]

Back in December, I wrote about lowering the cost of the BirdDog pipeline such that we could check for changes & send alerts on the user-defined data set once a week.

Already, we can check for changes on one third of a user’s account list every night, >2x more often than I wrote about in December. Better yet, the path to updating the entire account list more than once a day while barely increasing costs is now obvious to us, even if it requires quite a bit more engineering. 

We can do this because we are focused on efficiency & epistemology. I’ve written about the former extensively (here, here, here). We’ll be exploring the latter today by looking at GPT’s “belief” about Italian American cannoli consumption.

[Epistemology—The Theory of Knowledge]

When someone asks me what I do, my response is typically “B2B SaaS”, “software”, “epistemology”, or some mix of all three. 

Much to my chagrin, the epistemology bit very rarely lands, so I find myself with no choice but to explain it more in depth here.

From Wikipedia:

Epistemology is the branch of philosophy that examines the nature, origin, and limits of knowledge.

-The Internet

My understanding of epistemology is more topical than I’d like to admit (remember, I’m as much not an “educated” philosopher as I am not an “educated” software engineer). Still, when I say epistemology, my intention is to invoke a strong sense of explainability about the way that BirdDog represents and tracks the world.

Rather than just providing data for users, it is critical that we know where the data came from and what makes us believe it is true.

On the surface level, this is as simple as citing the sources. More deeply, it is reflected in a very logic-based pipeline, in contrast to our competitors’ focus on chaining together API calls to LLMs like GPT.

If some data is flagged as bad, it is quite simple for us to pinpoint what went wrong where. That makes it easier to fix & to “lock in” improvements. 

[LLMs & Cannoli Consumptions]

Looking at GPT’s responses to a number of questions on Italian American cannoli consumption, we can see that the model has not encoded the world in a way that directly reflects what we might expect.

When you ask a large language model like ChatGPT to answer a question, there is a probability associated with each response.

So, if you force it to give you a yes or no response, it’s not unreasonable to treat the probability of the answer being yes as a sort of “probability” the model associates with the answer.

Based on my test below, GPT-4o feels pretty strongly that Italian Americans like cannolis: after 10 trials, the average probability associated with the model responding “yes” to the question “Do Italian Americans like cannolis?” was near 100%.
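As a concrete illustration, here is a minimal sketch of how that yes-probability can be read off, assuming the OpenAI Python SDK’s `logprobs` option. The helper `prob_of_yes` and the exact prompt wording are my own illustrative choices, not BirdDog’s actual code.

```python
import math

def prob_of_yes(top_logprobs):
    """Given (token, logprob) pairs for the first generated token,
    return the total probability mass assigned to a 'yes' answer."""
    return sum(
        math.exp(lp)
        for tok, lp in top_logprobs
        if tok.strip().lower().startswith("yes")
    )

# Sketch of the API call (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4o-2024-08-06",
#     messages=[{"role": "user",
#                "content": "Do Italian Americans like cannolis? Answer yes or no."}],
#     max_tokens=1, logprobs=True, top_logprobs=5,
# )
# pairs = [(t.token, t.logprob)
#          for t in resp.choices[0].logprobs.content[0].top_logprobs]
# print(prob_of_yes(pairs))
```

Summing over every token variant that starts with “yes” (e.g. “Yes”, “ yes”) matters, because the model splits its probability mass across capitalization and whitespace variants of the same answer.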

If I were to replace “Italian American” with any group I am not a part of, or “cannolis” with a number of other words, I’d likely be putting myself under social scrutiny with such a question. In many such cases, GPT may refuse to answer.

This is really quite a poorly formed question, though. What does it mean to “like” a cannoli? More broadly, what does it mean for a group of people to like a thing? Are we asking if more than one Italian American likes cannolis? Are we asking if on average Italian Americans would say they like cannolis? Are we asking if the blended average of cannoli preference on a scale of 1-10 across Italian Americans is over 5 or 6?

Thankfully, it’s pretty easy to “repair” the question. We can ask something like whether or not 50% of Italian Americans have eaten a cannoli in the past year. That is much clearer and more meaningful. An objective answer for this question exists, as long as we have a clear definition of Italian American.

Here, GPT answers yes with 83.5% “certainty.” That seems reasonable enough, but it is not very clear where the number came from. Perhaps there is some underlying probability distribution associated with the quantity of Italian Americans who have eaten cannolis over the past year?

Conventionally, if we were to ask “Have at least n% of Italian Americans eaten at least one cannolo in the past year,” iterated over different values for n, we’d get something like a survival function, with n = 1% having the highest probability and n = 100% having the lowest probability. After all, it’s pretty sound to think that it is more likely that at least 1% of Italian Americans have indulged in the sweet Sicilian treat than it is to think that ALL of us have.

Graph of a reasonable survival function for a beta distribution, which is what you would likely use to map some condition, like cannoli consumption, in a population
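For reference, the survival function of a beta distribution can be sketched with nothing but the standard library. The Beta(2, 5) parameters in the test are an arbitrary illustrative choice, not fitted to any real data.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in [0, 1]."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_sf(x, a, b, steps=10_000):
    """Survival function 1 - CDF, via trapezoidal integration of the pdf.
    Monotonically decreasing: sf(0) = 1, sf(1) = 0."""
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    h = x / steps
    area = 0.0
    for i in range(steps):
        x0, x1 = i * h, (i + 1) * h
        area += 0.5 * (beta_pdf(x0, a, b) + beta_pdf(x1, a, b)) * h
    return 1.0 - area
```

This is the smooth, strictly decreasing curve we would conventionally expect the model’s answers to trace out as n grows.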

That’s not exactly what happens, though. Instead, when we map certainty of “yes” against the percentage of Italian Americans in question, we get a somewhat jagged graph with a notable dip in certainty near the middle.

I averaged the e^logprob for each question across 30 queries using gpt-4o-2024-08-06. Feel free to message me if you’d like the code to recreate the test with a bigger sample size or different model.
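A rough sketch of that averaging procedure follows. The real test queried gpt-4o-2024-08-06; here `query_fn` is a stand-in for whatever function returns the “yes” logprob for a given threshold n, so the sketch stays self-contained.

```python
import math
import statistics

def survival_curve(query_fn, ns=range(10, 101, 10), trials=30):
    """For each threshold n, average e^logprob of a 'yes' answer across
    repeated queries.  query_fn(n) is assumed to return the logprob of
    the model answering yes to: "Have at least n% of Italian Americans
    eaten at least one cannolo in the past year?" (e.g. pulled from the
    logprobs field of a chat completion)."""
    return {
        n: statistics.mean(math.exp(query_fn(n)) for _ in range(trials))
        for n in ns
    }
```

Plotting the resulting dict (n on the x-axis, averaged probability on the y-axis) is what produces the jagged, dipping curve described above, rather than the smooth survival function one might hope for.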

In all honesty, it is pretty impressive that the LLM is as close as it is to having the sort of internal representation we’d expect, and it’s not hard to speculate about why there might be such a dip. If you were able to ask a human with amnesia the same set of queries separately, they might produce an even more jagged graph.

Still, it serves to highlight one of the big shortcomings of these tools: they don’t inherently encode a representation of the world in the way that we believe is optimal. With repeated trials, it impressively approximated the distribution that we would expect to see, but even then it is still not quite there; there is “something else” going on in the model. The survival function appears to be emergent behavior.

On top of that, even if the model’s implied survival function were smooth, we’d still either have to trust it blindly or figure out why it looks the way it does. There is certainly more to be said about how this relates to reasoning models and research on model explainability; perhaps we’ll explore that later.

[Security Through Obscurity]

I don’t believe LLMs are the panacea for building an accurate representation of the world. While some people think that if you just keep making them bigger they will keep getting better and eventually transcend our own ability to model the world, I am not entirely convinced of that.

And, while using LLM systems may make sense when the constraint is human workflows, I’m certainly not convinced that the best way to build a data business whose value driver is an accurate representation of the world is to just bet on bigger and better models eventually “knowing” more of what the world actually looks like. Nor do I believe that the best path is to borrow a world view from LLMs.

Rather, I think it is very important to have your own, explainable model of the world. LLMs are a tool that gives you leverage; they are not a replacement for building your own representation in such a way that you can explain it and iterate on it over time.

I have a longer version of this blog post that more specifically outlines the bets that BirdDog is making and why we believe we will continue to outperform an LLM-first system for our use case. Regrettably, we really do have an increasingly large number of competitors, and I would prefer not to tip them off.

So, we’re left with what I’ve said before. The best validation will be proven out over time via success in the market.

Live Deeply,