Production-ready is a feeling, not a checkbox

Tests green and types passing aren't enough. Production-ready is the gut feeling you get after running the thing yourself on a clean box.

I shipped the same feature three times last month before it was actually shipped.

First ship: tests green, types passing, PR merged, deploy button clicked, Slack message sent. Classic. Two hours later a user pinged us because the timestamps on their dashboard were eight hours off. Turns out the worker process ran in UTC but the formatter inherited the container's TZ env var, which a teammate had set to America/Los_Angeles in a Dockerfile six months ago for a totally different reason.

Second ship: fixed the timezone, redeployed, felt great for about forty minutes. Then the queue worker started silently dying every time a webhook sent us a payload with a null in a field we assumed was always a string. No alerts, because the worker was crashing cleanly and the supervisor was happily restarting it. We only noticed because the dashboards stopped updating.

Third ship: defensive parsing, retries, a dead-letter queue, an alert wired to the restart count. That one actually held.
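For the timezone bug in the first ship, here is a minimal sketch of a formatter that cannot inherit anything from the container. It is not the fix from the story, just one way to do it: the function name `format_for_viewer` is made up, and it assumes timestamps are stored as timezone-aware UTC datetimes and that the caller passes the viewer's zone explicitly.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def format_for_viewer(ts: datetime, viewer_tz: tzinfo) -> str:
    """Render a timestamp in an explicitly supplied zone.

    Hypothetical helper: nothing here reads the TZ env var or the
    machine's local time, so the container's settings cannot leak in.
    """
    if ts.tzinfo is None:
        # A naive datetime would be interpreted using the process's local
        # zone, which is exactly the ambiguity we want to refuse.
        raise ValueError("expected a timezone-aware datetime")
    return ts.astimezone(viewer_tz).strftime("%Y-%m-%d %H:%M %z")


stored = datetime(2024, 1, 15, 20, 0, tzinfo=timezone.utc)
pacific_winter = timezone(timedelta(hours=-8))  # fixed offset, for illustration only
print(format_for_viewer(stored, pacific_winter))
```

In real code you would more likely pass `zoneinfo.ZoneInfo("America/Los_Angeles")` so daylight saving is handled; the fixed offset above just keeps the sketch free of any dependency on the system's timezone database.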

So I have been thinking about what it means to call something production-ready, and I keep coming back to this: it is not a checkbox. It is a feeling.

The gap between green and live

Tests passing tells you the code does what you told it to do. That is a useful thing. It is also a small thing. The list of stuff tests do not catch is long and embarrassing once you start writing it down:

Timezone bugs that only show up on a server in a different region than your laptop.
The queue worker that handles the happy path beautifully and then explodes on a single malformed message and takes the whole pipeline with it.
The migration that runs in 200ms on your local 50-row table and locks production for 90 seconds because nobody told it to build the index concurrently.
The env var nobody added to the deploy script, so the staging release worked and prod fell back to a default that points at the wrong S3 bucket.
The dependency that pulls a different minor version in CI versus prod because someone forgot to commit the lockfile.
The cron job that runs at 3am when you are asleep and the team is asleep and nobody finds out until customers do.

None of these show up in a unit test. Most of them do not even show up in an integration test. They show up when the code meets a real environment with real users at a real time of day. And the only way to feel ready for that is to put yourself in something close to that environment before you ship.
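Some of the items on that list can at least be made loud. Take the missing env var: a small startup check, sketched below, refuses to boot instead of quietly falling back to a default. The helper name `load_required_env` and the variable names in the usage line are assumptions for illustration, not anything from a real deploy script.

```python
import os


class MissingConfig(RuntimeError):
    """Raised at startup when required settings are absent."""


def load_required_env(names, environ=None):
    """Return the named settings, or fail fast listing every missing one.

    Hypothetical helper. `environ` defaults to the real process
    environment but can be any mapping, which keeps it easy to test.
    """
    environ = os.environ if environ is None else environ
    # Treat empty strings as missing: an empty bucket name is no better
    # than an absent one.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingConfig("missing required env vars: " + ", ".join(sorted(missing)))
    return {name: environ[name] for name in names}


# Example call with a fake environment; the names are illustrative.
settings = load_required_env(
    ["UPLOAD_BUCKET", "QUEUE_URL"],
    {"UPLOAD_BUCKET": "example-bucket", "QUEUE_URL": "https://queue.example.invalid"},
)
```

The point of the design is the absence of defaults. A crash on boot in staging is annoying; a silent fallback in prod is the wrong-bucket story.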

The small ritual

Here is what I do now, and it has saved me from a lot of late-night panic:

Run it on a fresh box. Not your laptop, where every tool is already installed and every env var is already set. A clean VM, a fresh container, a teammate's machine, whatever. If the README does not get you from zero to running in fifteen minutes, the README is wrong and so is your deploy.
Watch the logs for ten minutes. Not five. Ten. Boring minutes. With nothing happening. Just watch what the app does when it is idle. You will be surprised how much chatter, how many warnings, how many retries are hiding in a healthy-looking system.
Kill it and restart it. Does it come back cleanly? Does it lose in-flight work? Does it double-process anything? You will learn more in this one step than in a week of code review.
Fill out the runbook before you merge. Not after. Before. If you cannot explain in three bullet points how to roll this back, what to check if it breaks, and who owns it when you are on vacation, it is not ready.
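The "does it double-process anything?" question from the kill-and-restart step is easier to answer when the worker keeps a record of what it has already handled. Below is a rough sketch using SQLite as that record. The table name, the `process_once` helper, and the idea that every message carries a stable ID are all assumptions for illustration.

```python
import sqlite3


def open_ledger(path=":memory:"):
    """Open (or create) a ledger of already-processed message IDs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def process_once(conn, message_id, handler):
    """Run handler unless message_id is already recorded; return True if it ran.

    The insert and the handler share one transaction: if the handler
    raises, the insert is rolled back and the message can be retried.
    """
    with conn:  # commits on success, rolls back on an exception
        try:
            conn.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
        except sqlite3.IntegrityError:
            return False  # seen before, e.g. redelivered after a restart
        handler()
    return True


ledger = open_ledger()
sent = []
process_once(ledger, "msg-1", lambda: sent.append("msg-1"))
process_once(ledger, "msg-1", lambda: sent.append("msg-1"))  # redelivery is skipped
```

One caveat worth stating plainly: the rollback only covers the ledger. If the handler sends an email and the process dies before the commit, the email goes out again on retry, so side effects outside the database still need to tolerate repeats.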

None of this is exotic. It is just slow. And slow is the opposite of how most of us want to ship.

The vibecoder's trap

If you build with AI tools the way I do, there is a specific trap waiting for you. The models are excellent at writing the happy path. Ask for a function that processes a webhook and you will get a beautiful, idiomatic, tested-on-the-good-case function in fifteen seconds. Ask the same model what happens if the payload is empty, or the upstream service is slow, or the JSON is technically valid but semantically nonsense, and you will often get a polite shrug dressed up as a try/except block.

The happy path is the easy 60 percent. The other 40 is what determines whether you sleep tonight. AI tools, in my experience, will write that 40 only if you specifically ask for it, by name, with examples. Otherwise they will give you a function that looks production-ready and behaves like a demo.
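To make "the other 40" concrete, here is roughly what asking for it by name might produce for a webhook handler. This is a sketch under assumptions: the payload shape (`id` and `note` fields), the `Event` type, and the `dead_letter` callback are invented for the example and stand in for whatever your real schema and queue look like.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    note: str


class BadPayload(ValueError):
    """The payload was rejected on purpose, with a reason attached."""


def parse_event(raw: bytes) -> Event:
    """Validate a webhook body instead of trusting its shape."""
    if not raw or not raw.strip():
        raise BadPayload("empty body")
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise BadPayload(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise BadPayload("expected a JSON object")
    event_id = data.get("id")
    if not isinstance(event_id, str) or not event_id:
        raise BadPayload("id must be a non-empty string")
    note = data.get("note")
    if note is None:
        note = ""  # the "always a string" field that sometimes arrives as null
    elif not isinstance(note, str):
        raise BadPayload("note must be a string or null")
    return Event(event_id, note)


def handle(raw: bytes, process, dead_letter) -> bool:
    """Process good payloads; park bad ones with a reason instead of crashing."""
    try:
        event = parse_event(raw)
    except BadPayload as exc:
        dead_letter(raw, str(exc))
        return False
    process(event)
    return True
```

The difference from the polite-shrug version is that every rejection has a named reason and a destination. One malformed message ends up in the dead-letter pile with an explanation, and the worker keeps going.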

So I keep a little one-pager. It lives in a note on my desktop and I open it before any deploy that matters. It asks me boring questions like: what does this do if the input is empty? What does it do if the upstream is down? What does it do if it is called twice? Where does it log? Who gets paged? How do I roll it back? I do not always have great answers. But asking the questions is the work. The questions are the ritual. The ritual is what turns a green build into something I would actually defend.
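A question like "what does it do if the upstream is down?" usually has a small, boring answer in code. One possible shape, sketched here with made-up names and defaults: retry a bounded number of times with exponential backoff, then give up loudly so someone finds out.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call fn, retrying on the listed errors with exponential backoff.

    Hypothetical helper. After the final attempt the original exception
    is re-raised, so a dead upstream surfaces as a failure, not a hang.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            # Waits base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage with a fake upstream that fails twice before answering.
calls = {"n": 0}

def flaky_upstream():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream unavailable")
    return "ok"

waits = []
result = call_with_retries(flaky_upstream, sleep=waits.append)
```

Passing `sleep` in as a parameter is only there so the behaviour can be checked without actually waiting. Errors that are not in `retry_on` are deliberately not retried, since retrying a bug just repeats it.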

The honest test

The way I check whether something is really production-ready is simple. After I deploy, would I be comfortable closing my laptop and taking the rest of the day off?

If the answer is yes, it is ready. If the answer is some version of "probably, but let me just keep an eye on it for a few hours," it is not ready. That hedge is your gut telling you something the checklist did not catch. Listen to it.

Tests green is a start. The feeling is the finish.
