From prototype to production: 5 things every AI-built MVP misses

You shipped your MVP. Sign-ups are coming in. The first paying customer wired you money. Then someone uploads a 4 MB CSV, the API returns a 500, a Stripe webhook is missed, and you spend the next six hours figuring out what happened — and whether you owe a refund.

This is the gap that almost no AI-built MVP closes on its own: the gap between shipped and ready.

We see it constantly. The AI tools you used to build the thing are optimised for generating code that works in the happy path. They're not optimised for the boring 20% of the codebase that decides whether you make it past your first ten paying customers. That 20% almost never gets generated, because the prompts that ask for it don't exist in the training data — nobody asks an AI assistant "give me a request-cancellation-aware queue worker with idempotency keys," because the people who know to ask for that already know how to write it.

Here are the five things we run through on every architecture review for an AI-built app. None of them are exotic. All of them are the difference between an app that survives the next 90 days and an app that quietly dies of a thousand papercuts.

1. Error handling that reaches further than the happy path

The shape of an AI-generated function is reliably this: do the work, return the result, wrap the whole thing in a try/catch that logs and re-throws. That's fine. What's missing is the "what next."

When your Stripe webhook handler fails, what happens? When your file-upload endpoint times out at 28 seconds because the model is rate-limiting you, does the user see a clear "try again in a minute," or do they see a 504 from the edge? When your database connection pool exhausts at 9 pm, does your dashboard tell you, or do you find out when the founder of your second-biggest customer DMs you?

What to add:

  • An error-classification layer that distinguishes user errors (4xx, show a clear message), retryable system errors (5xx, retry with exponential backoff), and unrecoverable errors (alert immediately, surface to user with a graceful fallback). A sketch follows this list.
  • Idempotency keys on every write endpoint that touches money, mutations, or external APIs. AI codegen almost never adds these by default; there's a sketch at the end of this section.
  • A real timeout budget per request, ideally with a hard cap that's lower than your hosting platform's hard cap, so you can return a graceful response instead of getting killed mid-write.
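To make the classification, backoff, and timeout items concrete, here's a minimal sketch in TypeScript. The names (AppError, withRetry, withBudget) are ours for illustration, not from any library:

```ts
// Classify errors once, at the point where you know what kind they are.
export class AppError extends Error {
  constructor(
    message: string,
    public readonly kind: "user" | "retryable" | "fatal",
    public readonly status: number,
  ) {
    super(message);
  }
}

// Retry only errors classified as transient, with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      const transient = err instanceof AppError && err.kind === "retryable";
      if (!transient || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 250 * 2 ** i)); // 250 ms, 500 ms, 1 s
    }
  }
}

// A hard timeout budget below the platform's cap (here 25 s under a 30 s edge limit),
// so you return a graceful response instead of getting killed mid-write.
async function withBudget<T>(fn: (signal: AbortSignal) => Promise<T>, ms = 25_000) {
  return fn(AbortSignal.timeout(ms));
}
```

Pass the signal down to fetch and database calls so a blown budget cancels the work instead of orphaning it.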

This isn't a refactor. It's a pass through your top ten endpoints — usually a day's work for a senior, and it's the single highest-leverage hardening change you can make.
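The idempotency-key item deserves its own sketch. This is the minimal shape, with a Prisma-style client declared as a placeholder and doCharge standing in for the real side effect:

```ts
// Placeholders for illustration; swap in your real data layer and charge logic.
declare const db: {
  idempotencyKey: {
    findUnique(args: { where: { key: string } }): Promise<{ response: string } | null>;
    create(args: { data: { key: string; response: string } }): Promise<unknown>;
  };
};
declare function doCharge(input: unknown): Promise<{ ok: boolean; chargeId: string }>;

export async function chargeCustomer(req: Request): Promise<Response> {
  // The client sends the same key on every retry of one logical charge.
  const key = req.headers.get("idempotency-key");
  if (!key) return new Response("Idempotency-Key header required", { status: 400 });

  // Seen this key before? Return the stored result instead of charging twice.
  const existing = await db.idempotencyKey.findUnique({ where: { key } });
  if (existing) return Response.json(JSON.parse(existing.response));

  const result = await doCharge(await req.json());
  await db.idempotencyKey.create({ data: { key, response: JSON.stringify(result) } });
  return Response.json(result);
}
```

A production version inserts the key first, under a unique constraint, so two concurrent retries can't both reach the charge.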

2. Observability you can use at 11 pm

When something breaks, you have two tools: logs and metrics. Most AI-built MVPs have neither in a useful state.

The default output is console.log (or print) statements scattered through the code, written into stdout, captured by whatever platform you're hosting on, and effectively unsearchable. When your customer says "the booking failed at 4:13 pm," you can't find the request. You don't know which user. You don't know what the input was.

What to add:

  • Structured logging at every API boundary (sketched after this list). Each log line is JSON with at least: timestamp, request ID, user ID (if authenticated), endpoint, and the relevant business identifiers (booking ID, file ID, tenant ID).
  • A request ID that flows through the whole stack — generated at the edge, passed in headers, included in every log line and every database query comment. This is the single thing you'll thank yourself for the next time something breaks.
  • One alert that pages a real human, set to an SLO you've actually thought about (e.g. "p95 latency on /book greater than 2 seconds for 5 minutes"). Until you have one alert, you have no SLO.
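Here's roughly what that looks like in TypeScript, using the web-standard Request and Response types; the header name and log fields are illustrative:

```ts
import { randomUUID } from "node:crypto";

type LogFields = Record<string, string | number | undefined>;

// One JSON object per line: searchable by request ID, user ID, booking ID.
function log(fields: LogFields) {
  console.log(JSON.stringify({ ts: new Date().toISOString(), ...fields }));
}

export async function handle(req: Request): Promise<Response> {
  // Reuse an upstream request ID if the edge already set one; otherwise mint it.
  const requestId = req.headers.get("x-request-id") ?? randomUUID();
  log({ requestId, endpoint: new URL(req.url).pathname, event: "request.start" });
  try {
    // ...do the work, passing requestId down to services and query comments...
    return new Response("ok", { headers: { "x-request-id": requestId } });
  } catch (err) {
    log({ requestId, event: "request.error", message: String(err) });
    throw err;
  }
}
```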

This is half a day of work. It is also the difference between debugging in 4 minutes versus 4 hours.

3. Auth, secrets, and the trust boundary

This is where the most common — and most embarrassing — failures live. We've seen production deployments with the OpenAI API key in the client bundle. We've seen "magic link" auth flows where the email link was the literal user ID. We've seen admin endpoints protected by a checkbox in the request body.

The pattern: AI tools are good at generating what looks like auth. They wire up Clerk, Auth0, Supabase Auth — that part is usually fine. What they miss is the boundary between authenticated and authorised.

Authentication says "this person logged in." Authorisation says "this person can do this specific thing." AI codegen will reliably generate the first and very unreliably generate the second. The result is endpoints that check "is logged in," not "owns this resource."

What to add:

  • A check at the start of every mutation endpoint that the requesting user owns the resource being mutated. Yes, every one. Yes, it feels repetitive. There's no clever abstraction that's worth a leaked customer's data. A sketch follows this list.
  • All secrets out of the client bundle and into server-side environment variables. If you're on Vercel, run vercel env pull and audit what's there.
  • One round of re-keying any secret that has ever been pasted into an AI tool, written into a .env.example, or committed to git history. Yes, even if you "deleted" it.
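Here's the shape of that ownership check, assuming a Prisma client with an illustrative booking model that has an ownerId column:

```ts
import { PrismaClient } from "@prisma/client";

const db = new PrismaClient();

export async function cancelBooking(userId: string, bookingId: string) {
  const booking = await db.booking.findUnique({ where: { id: bookingId } });

  // Authentication established who userId is. This line is the authorisation.
  if (!booking || booking.ownerId !== userId) {
    // 404 rather than 403, so the response doesn't confirm the resource exists.
    throw Object.assign(new Error("Not found"), { status: 404 });
  }

  return db.booking.update({
    where: { id: bookingId },
    data: { status: "cancelled" },
  });
}
```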

Two days of work for a small app. Non-negotiable.

4. The database is the part that survives

Your app code can be regenerated tomorrow. Your data can't. And yet the database is consistently the part of an AI-built app that gets the least attention.

The patterns we see:

  • No database backups, or backups configured but never tested.
  • No migrations — schemas modified by hand in production, no record of what changed when.
  • No constraints — every column nullable, no foreign keys, no unique indexes — leading to slow corruption (the row that "shouldn't exist").
  • No transactions wrapping multi-step writes; half-completed states all over the place.
  • No tenant isolation in a multi-tenant app, where one customer's row is one bug away from being visible to another.

What to add:

  • Automated daily backups, plus a recovery test — restore yesterday's backup into a staging environment and verify it works. A backup you've never restored is not a backup.
  • A migration tool — Prisma, Drizzle, Supabase migrations, whatever fits the stack. Every schema change goes through it. No more manual ALTER TABLE in the production console.
  • The four constraints worth adding immediately (sketched after this list): NOT NULL on anything that should never be null, UNIQUE on identifiers, FOREIGN KEY on relationships, and a CHECK on enums.
  • For multi-tenant apps: row-level security (Postgres RLS, or equivalent) so the rule is enforced at the database, not in the app code.
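For the constraints, here's what the schema can look like in Drizzle (CHECK support assumes a recent drizzle-orm release; table and column names are illustrative):

```ts
import { pgTable, uuid, text, timestamp, check } from "drizzle-orm/pg-core";
import { sql } from "drizzle-orm";

export const tenants = pgTable("tenants", {
  id: uuid("id").primaryKey().defaultRandom(),
  name: text("name").notNull(), // NOT NULL: a tenant without a name shouldn't exist
});

export const bookings = pgTable(
  "bookings",
  {
    id: uuid("id").primaryKey().defaultRandom(),
    tenantId: uuid("tenant_id")
      .notNull()
      .references(() => tenants.id), // FOREIGN KEY: no orphaned bookings
    reference: text("reference").notNull().unique(), // UNIQUE: no duplicate identifiers
    status: text("status").notNull(),
    createdAt: timestamp("created_at").notNull().defaultNow(),
  },
  (t) => [
    // CHECK: enforce the enum at the database, not just in app code.
    check("bookings_status_check", sql`${t.status} in ('pending', 'confirmed', 'cancelled')`),
  ],
);
```

The row-level-security policies for the last item live in plain SQL migrations rather than the schema file, but the principle is the same: the rule is enforced by the database.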

Half a day for the migration tool, half a day for the constraints, ongoing discipline for the rest.

5. Deploys you can roll back

The deploy pipeline is the production app's nervous system. If you can't ship a fix in five minutes when something breaks, the breakage compounds.

Most AI-built MVPs are deployed by running git push and letting Vercel, Render, or Railway pick it up. That's fine for the happy path. What's missing:

  • A way to roll back fast. "Promote previous deployment" on Vercel is the right pattern, but you have to know it exists and have practised it.
  • A staging environment that runs the same build pipeline as production, with realistic data shapes. Without this, every deploy is a roll of the dice.
  • Database migrations run as a deploy step, not a manual thing the founder remembers to do at 2 am.

What to add:

  • One staging environment that auto-deploys from a staging branch. Anything that goes to production goes through here first, even if the smoke test is a 30-second click-through.
  • Practised rollback. Roll a deployment back at least once, on purpose, in staging, before you need to do it in production.
  • Database migrations gated behind a deploy step that fails the deploy if they fail. A sketch follows.
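The migration gate can be as small as a script the deploy pipeline runs before the app goes live, assuming Prisma here (swap the command for your migration tool):

```ts
// scripts/run-migrations.ts: a non-zero exit fails the whole deploy,
// so a broken migration never reaches traffic.
import { execSync } from "node:child_process";

try {
  execSync("npx prisma migrate deploy", { stdio: "inherit" });
} catch {
  console.error("Migrations failed; aborting deploy.");
  process.exit(1);
}
```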

Half a day to set up. Existential when something breaks.

What the production-readiness pass actually looks like

If you read all five and thought "we have most of this," congratulations — you're in the top 20% of AI-built MVPs we audit.

If you read all five and thought "we have none of this," that's also normal, and not a moral failing. The tools you built with optimise for the demo, not the on-call. Closing this gap is most of what a senior engineer earns their fee for. It's also why AI hasn't replaced senior engineers, even though it's done a fair impression of replacing junior throughput — the work above isn't typing, it's judgement about what to type and when.

A typical production-readiness pass we run at Think and Form is one or two weeks. The deliverable is a written report with each gap, the fix, the priority, and a one-line rationale. We do the highest-priority items ourselves; the rest go into a roadmap your team can work through.

If you're staring at an AI-built MVP and wondering whether you're ready for your first hundred customers, that's the conversation we have on a discovery call. Reach out at admin@thinkandform.co.nz.