Friday night, 11 PM. I'm watching a friend demo a little internal tool he vibecoded over the weekend. The chat transcript is gorgeous. The agent wrote the schema, scaffolded the API, generated the dashboard, even shipped a deploy script. He clicks through the UI on his laptop and everything just works. We toast it. Ship it.
Monday morning the Slack ping arrives. The app is down. Tuesday it's down again, for a different reason. By Wednesday he's reverted to a static HTML page with a Calendly link.
I love this story because I have lived it five times this year. The agent didn't lie. The code really did run. It ran on Claude's machine. 🟠
Why the demo fools you
An LLM-built app passes its first demo for the same reason a movie set passes for a real street. From the camera angle the agent picked, everything looks load-bearing. Step three feet to the side and you see the plywood.
The agent is optimizing for the happy path it just generated. It hardcoded the path that worked. It imported the function it imagined existed. It wrote the test that mocks the thing it wasn't sure about. None of this is malice. It's just what happens when the model is rewarded for "the screenshot looks right" instead of "the system survives a Tuesday."
Five failure modes I keep seeing
1. The .env that only existed in the agent's head
The chat transcript has OPENAI_API_KEY=sk-... sitting in a code fence as an example. The code reads process.env.OPENAI_API_KEY. On the agent's sandbox the env was injected. On your machine it's undefined and the error is swallowed by a try/catch the agent added "for safety." You get a silent 200 with an empty response.
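One cheap guard is to check required variables at boot and refuse to start without them. This is a minimal sketch, not the code from that app; requireEnv is a name I made up for illustration.

```javascript
// Hypothetical boot-time guard. requireEnv is an illustrative name, not a library API.
function requireEnv(names, env = process.env) {
  // Treat unset and empty-string values as missing.
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
  // Return only the requested values so callers don't reach into process.env later.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Usage at the top of the server entry point, before anything listens:
// const config = requireEnv(['OPENAI_API_KEY']);
```

The point of throwing here is that the failure moves from a swallowed error at request time to a loud crash at startup, where the boot logs will show it.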
2. The import that worked because the model imagined the API
I saw an agent confidently write import { createServerClient } from '@supabase/ssr' against a version of the package that didn't export that function yet. The install succeeded. The import resolved to undefined. The call site crashed at request time, not at boot. Hallucinated package versions are the single most common Tuesday-morning bug in my inbox.
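If the setup is one where a missing named export quietly becomes undefined (CommonJS require, or a bundler's interop), a boot-time check turns the request-time crash into a startup crash. A rough sketch; assertExports is a hypothetical helper, not part of any package.

```javascript
// Hypothetical helper. assertExports is an illustrative name, not part of any package.
function assertExports(moduleName, mod, names) {
  // Anything that is not a function counts as missing, including undefined.
  const missing = names.filter((name) => typeof mod[name] !== 'function');
  if (missing.length > 0) {
    throw new Error(`${moduleName} does not export function(s): ${missing.join(', ')}`);
  }
}

// Usage at boot, assuming a CommonJS-style require:
// assertExports('@supabase/ssr', require('@supabase/ssr'), ['createServerClient']);
```

This does not tell you which version you should have installed. It only makes the mismatch impossible to miss.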
3. The migration that "ran" but didn't actually create the column
The agent generates a migration file. It also runs db migrate and reports success. What actually happened: the file was created in a directory the runner doesn't scan, so the runner had nothing to apply and exited 0. The schema is unchanged. Reads work because the ORM is lenient. Writes blow up the moment a real user tries.
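An exit code of 0 is not evidence, so I check the schema itself after migrating. Here is a sketch that assumes Postgres and a query function shaped like node-postgres's client.query, meaning (sql, params) returning a promise of { rows }. The helper name, and any table or column you pass in, are hypothetical.

```javascript
// Hypothetical post-migration check. `query` is an assumed function of the shape
// (sql, params) => Promise<{ rows: object[] }>, as node-postgres clients provide.
async function assertColumnExists(query, table, column) {
  const result = await query(
    `SELECT 1 FROM information_schema.columns
     WHERE table_name = $1 AND column_name = $2`,
    [table, column]
  );
  if (result.rows.length === 0) {
    throw new Error(`Column ${table}.${column} not found after migration`);
  }
}

// Usage right after the migration step, with a connected pg client:
// await assertColumnExists((sql, params) => client.query(sql, params), 'users', 'plan');
```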
4. The test that mocks the function and then asserts on the mock
This one hurts. The test imports sendEmail, replaces it with jest.fn().mockResolvedValue({ id: 'mock-123' }), calls the route handler, and asserts the response contains mock-123. The test is green. The test is also testing nothing except that Jest can return a value. The real sendEmail has never been called by anyone, including production.
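One way out, as a sketch: fake the network boundary instead of the function under test, so the real code still runs. Everything below is a hypothetical stand-in for the app's code, including this sendEmail signature and the transport object.

```javascript
// All names here are hypothetical stand-ins for the app's real code.
// The real function under test: validates input, builds the payload, hands it to a transport.
async function sendEmail(transport, { to, subject }) {
  if (!to.includes('@')) throw new Error(`Invalid recipient: ${to}`);
  return transport.send({ to, subject, from: 'noreply@example.com' });
}

// A fake for the network boundary only. It records what the real code sent.
function makeFakeTransport() {
  const sent = [];
  return {
    sent,
    async send(message) {
      sent.push(message);
      return { id: `fake-${sent.length}` };
    },
  };
}

// The check runs the real sendEmail and exposes what it actually did.
async function checkSendEmail() {
  const transport = makeFakeTransport();
  const result = await sendEmail(transport, { to: 'a@example.com', subject: 'Hi' });
  return { result, sent: transport.sent };
}
```

The assertions that matter are on transport.sent: the recipient, the subject, the from address. If sendEmail is broken, this check goes red, which is the whole job of a test.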
5. The deploy script with a relative path that only resolves from the project root
The script does cp ./config/prod.env ./dist/.env and works perfectly in the chat sandbox because the cwd was always the repo root. On the CI runner the cwd is one level up. The copy fails, but nothing stops: cp prints an error and exits nonzero, and the script never checks. The container boots without an env file. The app starts. It just can't talk to the database.
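The fix I reach for is to anchor paths to a known root instead of the cwd, and to let a failed copy stop the deploy. Here is a sketch of that step as a Node script. The config/prod.env and dist/.env layout is from the script above; copyProdEnv and the scripts/ location are my assumptions.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical deploy step. copyProdEnv and the repoRoot parameter are illustrative.
function copyProdEnv(repoRoot) {
  // Build absolute paths from the repo root, never from the current working directory.
  const source = path.join(repoRoot, 'config', 'prod.env');
  const target = path.join(repoRoot, 'dist', '.env');
  fs.mkdirSync(path.dirname(target), { recursive: true });
  // copyFileSync throws if the source is missing, so the deploy stops here.
  fs.copyFileSync(source, target);
  return target;
}

// Usage from a script that lives in <repo>/scripts (an assumed location):
// copyProdEnv(path.resolve(__dirname, '..'));
```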
The week-three checklist
Here is what I do now before I let any vibecoded thing near a real user. It takes about 90 minutes. It has saved me dozens of Tuesdays.
- Re-clone from the remote into a fresh directory. Not git pull. A real clone. The agent may have left files on disk that are not in the repo.
- Install on a clean machine. A fresh container, a borrowed laptop, anything that has never seen this project. npm ci, not npm install. Lockfile or it didn't happen.
- Read every process.env.* reference. Grep for it. For each one, ask: where is this set in prod? If the answer is "the agent told me to add it," add it now and write it down.
- Run the actual production command. Not npm run dev. The thing that runs in your container. Watch the boot logs. If anything says "warning" or "deprecated," read it.
- Open one test file and read it like a code reviewer. If the test mocks the function it is meant to test, delete the test. It is lying to you.
- Run the migration against a fresh database. Not your dev DB with state. A new one. Then connect with psql and \d the tables. Eyes on the columns.
- Hit one real endpoint with curl. No browser, no Postman collection the agent generated. Curl, with the actual auth header you'd use in prod.
Half of these will find something. The other half will give you the calm of knowing they didn't.
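The process.env step in the checklist is the easiest one to script. Here is a rough sketch with hypothetical helper names. It only catches the literal process.env.NAME form, so bracket access and destructuring still need a human read.

```javascript
// Hypothetical audit helper. findEnvRefs is an illustrative name.
// Only the literal process.env.NAME form is matched, not process.env['NAME'].
function findEnvRefs(sourceText) {
  const pattern = /process\.env\.([A-Za-z_][A-Za-z0-9_]*)/g;
  const names = new Set();
  for (const match of sourceText.matchAll(pattern)) {
    names.add(match[1]);
  }
  return [...names].sort();
}

// Which referenced variables are absent from the environment you are about to run in?
function missingEnvRefs(sourceText, env = process.env) {
  return findEnvRefs(sourceText).filter((name) => !(name in env));
}

// Usage: read each source file as text, then
// missingEnvRefs(fileContents) lists what prod still needs to define.
```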
The sketch and the building
I am not anti-agent. I shipped three things this quarter where the first 70% came out of a chat window and I would not have started them otherwise. The agent is genuinely good at the sketch. The proportions are right, the rooms are in roughly the right places, the front door faces the street.
But the LLM gave you a sketch. You ship the building. The plumbing, the load-bearing walls, the thing that has to stand up when it rains on a Tuesday, that is still your job. The demo passing is the start of the work, not the end of it.
If your vibecoded app survives week three on a machine that isn't yours, you've built something. If it only ever ran in the chat, you've got a very pretty screenshot.