Part 2: The Living Checklist: When Gawande Meets Karpathy’s Software 3.0

Written by Steve Hackbarth | August 5, 2025

Atul Gawande made a deceptively simple argument in The Checklist Manifesto: avoidable failures are common not because we lack knowledge, but because we fail to apply what we already know — consistently, correctly, and at the right time. The humble checklist, he argued, offers a lightweight way to encode and enforce best practices, even among experts.

Andrej Karpathy, in his “Software 3.0” framing, makes a similarly powerful claim: natural language is no longer just a medium for documentation — it’s now a programming interface. In this world, the line between describing a task and executing it starts to blur.

At Ursa Health, we’re fusing these two perspectives to rethink how we manage one of the thorniest parts of our work: the integration of healthcare data. The result is what we call the living checklist.

From Checklist to Executable Protocol

In Gawande’s world, a checklist might say: confirm that the correct leg is marked before surgery. Gawande calls this kind of check “stupid but critical.” The point isn’t to remind someone of what they don’t know — it’s to create structure around what they already do.

In our world, a checklist item might say: confirm that the claim ID field uniquely identifies each claim. Also stupid, but also critical.

The traditional fix is to write documentation and hope employees keep the relevant training top-of-mind. Gawande’s checklist brings that training directly into the workflow. What LLMs — and the eager intern LLM in particular — make possible is to go even further and fuse these best practices directly into our engineering work.

Where Gawande's checklists were static, ours are dynamic. Each item in the checklist is an English-language prompt that triggers a tightly scoped analysis. The AI writes the SQL, runs the query, and presents results with justifications. The engineer doesn't just get a green checkmark — they get an assertion, with evidence, ready for scrutiny.
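To make that concrete, here is a minimal sketch of the kind of query an item like "confirm that the claim ID field uniquely identifies each claim" might produce. The table and column names (raw_claims, claim_id) are placeholders for illustration; the actual SQL the AI writes depends on the shape of the source data.

    -- Checklist item: "Confirm that the claim ID field uniquely identifies each claim."
    -- raw_claims and claim_id are hypothetical names used only for illustration.
    SELECT
        COUNT(*)                            AS total_rows,
        COUNT(DISTINCT claim_id)            AS distinct_claim_ids,
        COUNT(*) - COUNT(DISTINCT claim_id) AS excess_rows  -- rows beyond one per distinct ID; NULL IDs count as excess
    FROM raw_claims;
    -- Assertion presented for signoff: excess_rows = 0. Anything else gets flagged for review.

The query itself is trivial; the value is that it runs on every feed, every time, with the result and its justification waiting for a human to accept or challenge.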

Capturing What the Senior Engineer Knows

Data integration work is necessarily artisanal because the quirks that manifest themselves in one project might not be the same in the next, even if it’s the same type of file from the same health plan. Institutional knowledge sits in Slack threads, archived emails, and the heads of the people who’ve been burned before.

Checklists are an elegant way to pull that knowledge out of people’s heads and make it durable. The living checklist allows us to take the next step and make it operational. Instead of asking, “Did you remember to check whether the claim type code harmonizes with other claim type indicators?” we encode that knowledge as a series of checks: “Check presence and population of place_of_service codes (indicative of Professional)”, “Check for presence of revenue codes (indicative of Institutional)”, “Verify harmonization between different claim type indicators if multiple exist.”
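As an illustration of what those items might translate into, here is a sketch against a hypothetical raw_claims table. The column names (claim_type_code, place_of_service, revenue_code) are stand-ins; the generated checks adapt to whichever indicators the feed actually carries.

    -- Checklist items: presence of place_of_service (a Professional signal), presence of
    -- revenue codes (an Institutional signal), and harmonization across the indicators.
    -- All table and column names are illustrative.
    SELECT
        claim_type_code,
        COUNT(*)                                                       AS claim_lines,
        SUM(CASE WHEN place_of_service IS NOT NULL THEN 1 ELSE 0 END)  AS with_place_of_service,
        SUM(CASE WHEN revenue_code IS NOT NULL THEN 1 ELSE 0 END)      AS with_revenue_code
    FROM raw_claims
    GROUP BY claim_type_code
    ORDER BY claim_type_code;
    -- Assertion: claim types flagged as Professional travel with place_of_service, types flagged
    -- as Institutional travel with revenue codes, and no type mixes the two unexpectedly.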

When we encounter a new failure mode, we update the checklist, ensuring the system evolves and improves with every insight. The next time, that nuance doesn’t rely on someone’s memory; it’s baked into the prompt.

Building a Feedback Loop That Gets Smarter

What makes this setup even more powerful isn’t just that we’ve automated checklist execution — it’s that the checklist can learn. This is the promise of Karpathy’s Software 3.0. If, as Karpathy puts it, “the hottest new programming language is English,” then we can blur the line between the engineer and the software that engineer is using.

Each time the AI surfaces something unexpected, the engineer can refine the prompt, add an exception, or expand the criteria. These changes are immediately readable by anyone on the team. We don’t need to write a new script or deploy new code — we just revise the English.
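To invent a small example of what revising the English looks like: suppose a check keeps tripping on a quirk the team has since learned is legitimate. The fix is written into the checklist item itself, in plain language.

    Before: "Confirm that member_id is populated on every claim line."
    After:  "Confirm that member_id is populated on every claim line, except on capitation
            records, where this plan intentionally sends a blank member_id."

The field name and the capitation exception are invented for this sketch, but the mechanism is the point: the next run picks up the revised wording, and the SQL the AI writes follows suit.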

The result is software that responds faster, iteration cycles that are immediate, and engineers who are empowered to build their best practices directly into their tooling.

Why It Works

Gawande showed that checklists work not by dumbing down the work, but by letting experts focus on what actually requires expertise. Our goal is the same.

The engineer no longer has to check each date field for unrealistic (looking at you, 1753-01-01) values. That check now runs automatically every time, with the results reported in such a way that they don’t require a deep dive unless something is wrong. The living checklist allows the engineer to keep their cadre of eager interns aligned and on point.
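For illustration, that date check might boil down to something like the sketch below, run against a hypothetical raw_claims table with a service_date column. 1753-01-01 is the minimum value of SQL Server's datetime type, which makes it a popular stand-in for "date unknown."

    -- Checklist item: "Flag date fields containing sentinel or implausible values."
    -- Table and column names are placeholders; the real check covers every date field in the feed.
    SELECT
        COUNT(*)                                                      AS total_rows,
        SUM(CASE WHEN service_date = '1753-01-01' THEN 1 ELSE 0 END)  AS sentinel_dates,
        MIN(service_date)                                             AS earliest_service_date,
        MAX(service_date)                                             AS latest_service_date
    FROM raw_claims;
    -- Assertion: sentinel_dates = 0, and the min/max fall inside the expected coverage period.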

Every data integration teaches us something. Every lesson becomes a prompt. Every prompt is a testable assertion. Every assertion waits for human signoff. The result is a system where hard-won knowledge doesn’t fade — it compounds. In the final piece of this series, I’ll describe how we’re turning both the eager intern and living checklist philosophies into software through Ursa Compass.

 

About the Author 

Steve Hackbarth, Chief Technology Officer 

Before joining Ursa Health, Steve was head of development at xTuple, the world's #1 open-source ERP software company, managing a diverse development team that spanned four continents. Before that, he founded Speak Logistics, a tech startup born out of his experience in the transportation industry, which introduced the user-friendly and modern sensibilities of the consumer Internet into the enterprise space. His professional passions include JavaScript, open-source software, continuous integration, and scalable code.

Steve holds a bachelor's degree in computer science from Harvard University and an MBA from William and Mary. He is a frequent speaker on subjects such as JavaScript, modular architecture, git, open source, and asynchronous programming.