On Iterative Design

How our scraper went from a 50% success rate to a 99% success rate

2025.09.07

CXVI

[Rough Outlines, Not Plans; Scraping 101; Scraper of Theseus; Augment, Don’t Replace; A Boat In Water Shows its Leaks; Just Start]

Thesis: To build something useful, you don’t need plans, you just need to start and keep going.

[Rough Outlines, Not Plans]

Whenever you start building anything interesting, it is impossible to know what it will look like one year later, let alone what it will look like when it's "done."

You can go ahead and make granular plans if you want, but you can't predict the future.

As DHH & Jason Fried point out in Rework, you will be learning & gaining a lot of information as you build your thing. If that information is important enough to impact your plans (some of it will be), it would be foolish not to update your plans accordingly.

So, even if you think you know what you're doing at the start, it's going to change and adapt a lot over time. If you're receptive to reality, the thing you're building may actually end up better than you thought was possible.

This is true whether you're writing a book, starting a tech company, building a sales process, building spaceships or steel beams, or even launching a feature for a SaaS product.

An example of this iterative design process I really like is BirdDog's Scraper. It has a lot in common with most things that become valuable:

  • People told us not to build it

  • It was very bad at first (50% success rate)

  • Over time, it's gotten to be very good (99% success rate)

What's even more interesting, and perhaps more important, than the way the project itself compounds is the way your understanding of it compounds.

[Scraping 101]

A scraper is a piece of software that analyzes & extracts data from some digital source.

Say you want to know all of the Fortune 500 companies that have the phrase "AI Agents" on their website. You could check yourself, or pay someone to check. Or, you could use a scraper to programmatically check all of their websites for you.
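
If you want a feel for what that looks like in practice, here's a rough Python sketch; the companies and URLs are placeholders, and a real scraper needs far more care than this:

```python
# Rough sketch only: check a handful of sites for a phrase. The companies
# and URLs below are placeholders, not anything real.
import requests

COMPANIES = {
    "Acme Corp": "https://www.example.com",
    "Globex": "https://www.example.org",
}


def mentions_phrase(url: str, phrase: str) -> bool:
    """Fetch a page and check whether the phrase appears anywhere in its HTML."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return False  # treat network errors as "not found" for this sketch
    return phrase.lower() in response.text.lower()


matches = [name for name, url in COMPANIES.items() if mentions_phrase(url, "AI Agents")]
print(matches)
```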

Scrapers are cool. A few of the first useful projects I wrote when teaching myself how to code were scrapers.

They're very commoditized & productized, though. You can pay a lot of companies to do it on demand for you.

So, when we were starting BirdDog, a lot of technical people said we should just pay such a company to scrape for us. Why would we invest any time or effort into something that was more or less ‘solved’?

The obvious ‘why’ is that we didn't have any money, but we did have a lot of time and effort.

Still, if we were to listen to orthodox tech company doctrine, those people had a point—the productized services are not terribly expensive, so we likely could have used one in a limited fashion for an MVP. Then, we could have used that MVP to raise money and just kept paying for the service.

But, again, I think scrapers are cool. And, I thought being able to collect a lot of data inexpensively in one place would give us a lot of surface area to extract valuable insights from.

On top of that, so many of our competitors have been farming scraping out to service providers or, more commonly, skipping that step entirely & just buying data sets from other companies, then upselling them with good marketing and a fun UI.

The truth is, when you farm out scraping, you can’t do as much of it because it costs you more*, and when you buy commoditized datasets, your data is inherently not unique.

So, I thought we might be able to get an edge by not only producing our data sets ourselves, but also by making scraping one of the core capabilities we invested engineering time in.

*If you can do something for even ½ the cost of your competitor, you can give the customer 2x more and still make money. Relates to BirdDog’s efficiency obsession.

[Scraper of Theseus]

Consequently, one of the first parts of BirdDog we built was our scraper.*

To begin with, it was very bad--the first drafts failed 50% of the time.

Now, it is quite good--over a year later, the success rate is north of 99%.

This did not happen overnight. Rather, it happened through a lot of small iterations and tweaks.

The world breaks everyone and afterward many are strong at the broken places. But those that will not break it kills.

Ernest Hemingway

Something would break, and then we would make it stronger at the broken place. As an example, there are these things called vCards that lawyers like to use to share their contact info on a website. To a human, they make it pretty efficient to call the cellphones of many lawyers (trust me, I've had to). To a scraper, though, they are a bunch of incoherent characters.

Early on, we noticed that this would cause problems in our system. So, we put in safeguards against it and against other similar nonsense text issues. And now, we don't have that problem anymore.
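
For a flavor of what those safeguards can look like, here's an illustrative Python sketch; the heuristics and thresholds are made up for this post, not lifted from our pipeline:

```python
# Illustrative safeguards, not the real ones: drop vCard payloads and other
# non-prose blobs before they pollute the extracted text.

def looks_like_vcard(text: str) -> bool:
    """vCards are easy to spot: the payload starts with BEGIN:VCARD."""
    return text.lstrip().upper().startswith("BEGIN:VCARD")


def looks_like_prose(text: str, min_alpha_ratio: float = 0.6) -> bool:
    """Reject blobs that are mostly non-alphabetic 'incoherent characters'."""
    if not text:
        return False
    readable = sum(ch.isalpha() or ch.isspace() for ch in text)
    return readable / len(text) >= min_alpha_ratio


def clean_chunk(text: str) -> str | None:
    """Keep a chunk only if it looks like usable prose."""
    if looks_like_vcard(text) or not looks_like_prose(text):
        return None
    return text
```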

Or, another example is that the scraper would fail to read PDFs. Eventually, we added a few lines that let it handle PDFs & the text therein just fine.
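
Here's roughly what a few lines of PDF handling can look like, sketched with the pypdf library (a library chosen just for illustration, not necessarily what our crawler actually uses):

```python
# Sketch of PDF handling with pypdf -- an arbitrary choice for illustration.
from io import BytesIO

import requests
from pypdf import PdfReader


def extract_pdf_text(url: str) -> str:
    """Download a PDF and return the concatenated text of its pages."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    reader = PdfReader(BytesIO(response.content))
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```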

I wouldn’t have thought of these things or the dozens of other small & large tweaks we’ve made when we were just getting started.

But, 5% here and 1% there add up over time and help a number like 50% grow towards 99%.

Now, the most recent change I’ve made was so small that it only yielded a 0.1% improvement! This happens when you take care of the biggest, most impactful changes first.
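
To make that arithmetic concrete, here's a toy illustration with made-up numbers; the point is just that modest, independent gains sum toward 99%:

```python
# Toy numbers, not our real changelog: fixes worth a few percentage points
# each (and, lately, 0.1%) are how 50% creeps up to 99%.
gains = [20, 15, 5, 5, 2, 1, 0.5, 0.4, 0.1]  # percentage points per fix

success = 50.0
for gain in gains:
    success += gain
    print(f"success rate: {success:.1f}%")  # ends at 99.0%
```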

*We have a lot of things that could be called scrapers. Here, I am referring to our large scale website crawler, Shelob.

[Augment, Don’t Replace]

To be clear, as a fallback, our scrapers use other scrapers.

In the case of the above scraper, under certain conditions, we’ll send an API call to another service provider. This happens less than 10% of the time but is an incredible boost to our success rate.
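
In rough Python, the shape of it looks something like this; the function names and the condition for falling back are stand-ins, not our actual logic:

```python
# The shape of the fallback, with hypothetical function names -- not the
# actual conditions for when we call the vendor.

def fetch_page(url: str) -> str | None:
    """Try the in-house crawler first; only pay for the vendor API when it fails."""
    html = fetch_with_own_crawler(url)
    if html is not None:
        return html
    # Fewer than 10% of requests reach this line, but it lifts the overall
    # success rate considerably.
    return fetch_with_vendor_api(url)


def fetch_with_own_crawler(url: str) -> str | None:
    """Placeholder for the in-house crawler."""
    raise NotImplementedError


def fetch_with_vendor_api(url: str) -> str | None:
    """Placeholder for the third-party scraping API."""
    raise NotImplementedError
```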

The nuance to point out here is that we decided which sites to do this on only after trying to handle them ourselves and realizing where a solve would require “too much” computational resource & complexity to work.

In other words, there is still some line we won’t cross where even we don’t think it’s worth it to fully do it ourselves right now.

But, using this as a fallback, not the primary method, makes it more or less a feature of our own scraper rather than a replacement for it. Put differently, incorporating the third party was treated as an iterative part of the engineering process, not an end-to-end solution in itself.

[A Boat In Water Shows its Leaks]

As you compound and iterate on these designs, you're also compounding and iterating on your own understanding of them as well.

Yesterday, I did one of the most complex refactors I have ever done in my life. It is crazy to say, but going into it, I literally felt the same as I do before a competitive jiu jitsu match or a consequential negotiation.

The problem I was solving with this refactor is a scaling issue for our system (we are getting customers too quickly). One of my mentors has taught me that this is what you call a champagne problem.

The best textbook on training neural nets basically says it’s trial and error.

I had tried solving it about a month ago as it was just starting to appear from the fog, but my solution was suboptimal. And as of yesterday, I estimate we had 2-3 weeks before it would negatively impact users.

The crazy part is, I did not know the exact solution… until yesterday morning!

I was pacing around thinking about it, and finally, the answer was quite obvious.

Implementing it took 490 minutes of focused work, and I estimate I have another 90-120 minutes of spillover & cleanup today.

The thing is, if another very competent engineer were to do this, even if I gave them the rough solution, it would've taken them at least a week of time.

That does not mean I am 7x faster than they are--it just means I know the BirdDog codebase like the back of my hand. If I were to do a refactor of the same complexity on another codebase that I was not intimately familiar with, it would take me at least a week to do it myself, probably 2 or 3.

I have been in this one boat for the last year--I have seen it under pressure and know where the water likes to come in. So, I’m uniquely positioned to patch those leaks, just as you will be once you work on one thing for a long time.

All of this is to say that not only does your product get better through aggressive iterations, but you know your product better, too. And that matters.

Thanks for the read—if you enjoyed it, give it a subscribe; I’ve been around for 116 weeks and don’t plan on leaving anytime soon.

[Just Start]

When you're going to start building anything interesting, you'll be told that you shouldn't build it.

Initially, you might wonder if they're right--after all, your first drafts will be quite bad.

But, if you keep building the same thing, it will keep compounding. Your understanding of it will compound, too.

And, one day it might just compound so much that you’re the one who’s right.

Live Deeply,