
Things That Broke (And What They Cost)

What went wrong, what it cost, and what systems were built to prevent it from happening again.


Part of the ForkIt! Case Study. Read the full story →


The Code Failures

Bugs that shipped. Builds that broke. Things that should have been caught before anyone saw them.

The iOS Rejection

What happened

Apple rejected the v2 submission for missing subscription compliance. Auto-renewal disclosure text was missing. There was no Restore Purchases button. Terms of Service and Privacy Policy links were not accessible from the paywall screen. No LLM tutorial, no human tutorial, no documentation of any kind mentioned these requirements.

What it cost

Two days of rework. A full resubmission cycle. The realization that the LLM had confidently built an IAP flow that would never pass review.

What fixed it

Added all three compliance elements. Created a pre-submission checklist. The 31-review suite now includes Review 16 (IAP/Subscription Compliance) that checks for every requirement Apple and Google enforce.

The Refactor That Broke Production

What happened

During a code cleanup, Claude renamed API endpoints for consistency. Clean code. Better naming. Deployed to Vercel. The problem: every app already installed on users' phones was still calling the old endpoint names. Live users started getting errors. The old URLs returned 404s.

What it cost

Live users hit errors until backward-compatibility rewrites were added. Those rewrites still exist in the Vercel config today as tech debt (#115).

What fixed it

Added rewrites in vercel.json to map old paths to new endpoints. Established a rule: never rename a live API endpoint without a redirect. The review suite now includes Review 21 (API Endpoint Hygiene) and Review 30 (Migration and Upgrade Safety).
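A backward-compatibility rewrite of this kind in vercel.json has roughly the following shape. The path names here are illustrative placeholders, not the actual ForkIt! endpoints:

```json
{
  "rewrites": [
    { "source": "/api/getRestaurants", "destination": "/api/restaurants/search" },
    { "source": "/api/saveFavorite", "destination": "/api/favorites/save" }
  ]
}
```

The old URLs keep working for every installed app, while new builds call the new paths directly. The cost, as noted, is that the mapping lives in the config forever unless the old clients are forced to upgrade.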

ANR on Android (App Not Responding)

What happened

Android users with non-Chrome default browsers (DuckDuckGo, Firefox) experienced hangs during Clerk SSO sign-in. The app froze completely. Chrome Custom Tabs opened in the wrong browser, which couldn't handle the redirect back. The main thread blocked. Android killed the app.

What it cost

Hours of debugging a platform-level limitation that has no real fix. Users with non-Chrome defaults still see friction.

What fixed it

Added a hint toast on Android sign-in failure explaining the Chrome requirement. Documented it as a known limitation in the project guide. This is a platform constraint, not something app code can solve. The fix was transparency, not engineering.

The Demo

The Demo Failure

The stakeholder engagement was right. The preparation was wrong.

The Annapolis Demo

What happened

Work trip to Annapolis. Three colleagues who had heard about the app wanted to try it. Perfect opportunity: real users, real context, at a restaurant, ready to pick a place for dinner. Opened the app. It didn't work.

The backend had been updated during development without a matching app build. The dev build on the phone was calling endpoints that had changed. The app couldn't fetch restaurants. At a restaurant. In front of three potential users.

What it cost

Three users. Not hypothetical users. Three real people who were interested, present, and ready to try it. They watched it fail and moved on. They never came back to it.

Every demo is a stakeholder touchpoint. This one failed because the build wasn't tested on the device, in the environment, within the hour. The engagement instinct was right. The preparation discipline didn't exist yet.

What fixed it

Demo prep protocol: before any stakeholder touchpoint, verify the build is production (not dev), test on the actual device, confirm the backend matches, and have a fallback plan if it crashes. This became part of the project's pre-push checklist.

The LLM

The LLM Failures

This is the centerpiece. Five failures with the same root cause: the LLM gave a confident answer, I trusted it, and the answer was wrong. Each time, the cost compounded because the wrong answer became the basis for the next decision.

The Build Credit Burn

What happened

Asked Claude how many EAS builds were included on the Starter plan. Answer: "Unlimited!" Planned accordingly. Weeks later, asked again. Claude read its own previous memory file. Same answer: "Unlimited!" Felt confirmed.

Then an email arrived from Expo: 80% of build credits consumed with two weeks left in the billing cycle.

The LLM had been wrong the first time. Then it cited itself as a source the second time. A hallucination, reinforced by its own memory, presented as verified information.

What it cost

Nearly burned through a paid plan's allocation on builds that were never used. Scrambled to cancel the unnecessary ones, and established a rule to cancel superseded builds immediately.

What fixed it

New rule: never trust LLM pricing knowledge. Always verify vendor pricing pages directly before making cost-driven decisions. Added to project memory as a permanent instruction.

The Google API Pricing Lie

What happened

Claude stated that Google Maps Platform included a $200/month free credit. Architecture decisions, cost projections, and the free tier design were all built on this assumption. The entire "as free as possible" pricing model assumed that credit existed.

It didn't. Google had eliminated the $200 monthly credit over a year before the project started. The LLM's training data was stale. Every cost calculation based on that number was wrong.

What it cost

Rearchitecting the API call strategy. Adding pool caching, client-side filtering, and aggressive deduplication to reduce API spend. The free tier limits (20 searches/month) exist partly because the expected credit never materialized.
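The pool caching and deduplication mentioned above can be sketched as a small in-memory cache: repeat searches near the same location are served from a cached "pool" of results instead of triggering a new paid Places call. The names, TTL, and key granularity here are illustrative, not ForkIt!'s actual values:

```typescript
// Sketch of the pool-cache idea: serve repeat searches from a cached pool
// instead of making a new Places API call each time. Illustrative only.
type Place = { id: string; name: string };

const pool = new Map<string, { places: Place[]; fetchedAt: number }>();
const TTL_MS = 24 * 60 * 60 * 1000; // assume day-old results are still usable

// Round coordinates so nearby searches share one cache key and one API call.
function poolKey(lat: number, lng: number, query: string): string {
  return `${lat.toFixed(2)},${lng.toFixed(2)}:${query}`;
}

// Aggressive deduplication: drop repeated place IDs before they enter the pool.
function dedupe(places: Place[]): Place[] {
  const seen = new Set<string>();
  return places.filter((p) => {
    if (seen.has(p.id)) return false;
    seen.add(p.id);
    return true;
  });
}

function search(
  lat: number,
  lng: number,
  query: string,
  fetchFromApi: () => Place[], // stands in for the paid Places call
): Place[] {
  const key = poolKey(lat, lng, query);
  const hit = pool.get(key);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) return hit.places; // no spend
  const places = dedupe(fetchFromApi());
  pool.set(key, { places, fetchedAt: Date.now() });
  return places;
}
```

Two nearby searches round to the same key, so only the first one costs money; everything after that within the TTL is free.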

What fixed it

Verified Google's actual pricing page. Rebuilt cost projections from real numbers. Added "verify vendor pricing" as a permanent memory instruction. Every cost-driven decision now requires a primary source, not an LLM assertion.

The Billing Exposure

What happened

The backend API had no rate limiting or origin checking for weeks. No user data was at risk (the search endpoint doesn't store or transmit personal data), but anyone who discovered the URL could have run up the Google Places API bill on my account.

The LLM built it. The LLM didn't flag the gap. I didn't know to check. It was discovered during a code review, not because anyone exploited it.

What it cost

Weeks of financial exposure. Nobody found it. The risk was to my billing account, not to user data.

What fixed it

Added rate limiting (30 req/min per IP), origin checking, and security middleware. The review suite includes Review 7 (Security), Review 18 (Operational Readiness), and Review 21 (API Endpoint Hygiene). Every endpoint now validates before processing.
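The core of a per-IP fixed-window limiter and an origin allowlist is small. This is a sketch of the idea, not the actual ForkIt! middleware; the origin value is a placeholder:

```typescript
// Minimal fixed-window rate limiter: 30 requests per IP per minute.
const WINDOW_MS = 60_000;
const LIMIT = 30;

const hits = new Map<string, { count: number; windowStart: number }>();

function allowRequest(ip: string, now: number = Date.now()): boolean {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now }); // start a fresh window
    return true;
  }
  entry.count += 1;
  return entry.count <= LIMIT; // the 31st request in a window is rejected
}

// Origin check: browsers must match the allowlist. Placeholder origin below.
const ALLOWED_ORIGINS = new Set(["https://example-app.vercel.app"]);

function allowOrigin(origin: string | undefined): boolean {
  // Native app requests may carry no Origin header at all.
  return origin === undefined || ALLOWED_ORIGINS.has(origin);
}
```

Neither check protects user data by itself; what they protect is the billing account, by capping how fast an anonymous caller can run up paid API calls.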

The Auth Gap

What happened

Separate from the billing exposure: API endpoints that touch user data (favorites, history, sync) accepted a user ID in the request body but never verified it belonged to the person making the request. Someone could have read or written another user's data by guessing their ID.

What it cost

Discovered during a manual code review session. Patched immediately. No exploitation occurred. But the gap was real: user data endpoints without identity verification.

What fixed it

Added Clerk JWT verification on all protected endpoints. Established Review 19 (Auth and Identity) in the review suite. Auth is now verified server-side on every request that touches user data.
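The shape of that fix can be sketched as follows: the caller's identity comes from the verified session token, never from the request body. The `Session` type here stands in for whatever server-side verification returns (in ForkIt!'s case, Clerk's); the function and status codes are illustrative:

```typescript
// Sketch: authorize a request against the identity the token actually proves.
// `Session` stands in for the result of server-side token verification.
type Session = { userId: string } | null;

function authorize(
  session: Session,        // null if the token was missing or invalid
  requestedUserId: string, // the user ID the client asked to read or write
): { ok: boolean; status: number } {
  if (!session) return { ok: false, status: 401 };   // not signed in
  if (session.userId !== requestedUserId) {
    return { ok: false, status: 403 };               // someone else's data
  }
  return { ok: true, status: 200 };
}
```

The original bug was skipping straight to trusting `requestedUserId`; the fix is that the client-supplied ID is only ever compared against the verified one, never used on its own.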

The Cross-Project Contamination

What happened

Claude picked up stray files from a different project (HabitCoach, a blue and gold themed app) and built the entire onboarding tour in the wrong brand colors. The tour was fully functional, well-structured, and completely off-brand. Blue and gold instead of orange and teal.

What it cost

Multiple design reviews missed it. I didn't catch it until later, when the gold "felt off." The LLM cross-contaminated two projects, and the human reviewing it didn't notice because the tour worked. Functional correctness masked visual wrongness.

What fixed it

Rebuilt the tour in the correct brand. Established the color theory doc (orange = problem, teal = solution) and the brand style guide in CLAUDE.md so the LLM has an authoritative reference. Review 22 (Brand Continuity) now checks every screen for theme consistency.

The pattern across all five: the LLM gave a confident output. I had no framework to evaluate it. The output became infrastructure. When it turned out to be wrong, the fix was expensive because decisions had been built on top of it.

The Process

The Process Failures

Not code bugs. Judgment bugs. Decisions that cost time because the framework for evaluating them didn't exist yet.

The Mom Group Trap

What happened

Weeks of design work on features driven by feedback from a mom group on social media. The feedback was enthusiastic. The problem: the people giving it weren't users. They were repeating what sounded interesting, filtered through their own context, with no connection to the actual use case.

The features they described (social check-ins, recipe integration, gamification) sounded reasonable in isolation. They were completely wrong for the product. The app picks a random restaurant. It doesn't need a social feed.

What it cost

Weeks of design iteration on the wrong mental model. Time that could have been spent on features actual users needed (like the exclude filter, or walk mode).

What fixed it

Built a feedback taxonomy with five types (behavioral, voiced pain, feature requests, ambient noise, market signal). Mom group feedback falls under "ambient noise," which gets the lowest weight for feature direction. The signal-to-noise rubric now requires decomposing every request into the underlying need before evaluation.
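The weighting idea reduces to a small lookup. The weight values below are illustrative, not the rubric's actual numbers; the point is the ordering, with ambient noise at the bottom:

```typescript
// Sketch of the feedback taxonomy: each type carries a weight, and requests
// are scored before they can drive feature direction. Weights are illustrative.
type FeedbackType =
  | "behavioral"      // what users actually did
  | "voiced-pain"     // problems users described unprompted
  | "feature-request" // decompose into the underlying need first
  | "ambient-noise"   // enthusiasm from non-users
  | "market-signal";

const WEIGHT: Record<FeedbackType, number> = {
  "behavioral": 1.0,
  "voiced-pain": 0.8,
  "market-signal": 0.5,
  "feature-request": 0.4,
  "ambient-noise": 0.1, // lowest weight: the mom group lands here
};

function signalScore(items: { type: FeedbackType; count: number }[]): number {
  return items.reduce((sum, i) => sum + WEIGHT[i.type] * i.count, 0);
}
```

Under this scheme, ten enthusiastic suggestions from non-users score lower than two observed behaviors from real ones, which is exactly the inversion the mom group trap needed.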

The Promo Code Gap

What happened

Built a promo code system for early adopters and testers. Had codes ready to distribute. The problem: the highest-value testers were already gone. The hostel strangers who gave the exclude-filter feedback, the walker who surfaced the walk-mode gap, the colleague who couldn't demo to a friend. No contact information was captured during those interactions.

What it cost

Lost the ability to close the loop with the people who shaped the product most. Could not reward the testers who mattered. Could not get follow-up feedback from the users whose input was most valuable.

What fixed it

Created STAKEHOLDERS.md in the repo root: a lightweight register tracking who gave feedback, what channel, what type, what it became, and whether the loop was closed. Includes a capture checklist for future sessions. Not a formal tool; just a file that gets updated. The hostel testers are still lost, but the next round won't be.
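A register of that shape might look like the following; the rows are illustrative placeholders, not the file's actual entries:

```markdown
| Who              | Channel   | Feedback type | What it became     | Loop closed?         |
|------------------|-----------|---------------|--------------------|----------------------|
| Hostel traveler  | In person | Voiced pain   | Exclude filter     | No (no contact info) |
| Colleague (demo) | Work trip | Behavioral    | Demo prep protocol | Pending              |
```

The columns mirror the five things the register tracks: who, channel, type, outcome, and loop status.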

The Pattern

When Each Failure Happened

Early failures were code. Middle failures were LLM trust. Late failures were process. The progression tells the real story: what you don't know shifts as you learn.

v1.0 Launch
Open Backend
No auth, no rate limiting. Weeks of exposure. Nobody found it.
v1.0 Launch
Google API Pricing Lie
$200/month credit didn't exist. Cost model built on false data.
v2.0 Development
iOS Rejection
Missing subscription compliance. Two-day rework.
v2.0 Development
Build Credit Burn
LLM said "unlimited." Then cited itself. Email said 80% used.
v2.0 Development
Mom Group Trap
Weeks of design on features from non-users.
v3.0 Development
Refactor Break
Renamed endpoints. Live apps still called old names. Errors.
v3.0 Development
Auth Gap
Endpoints accepted any user ID without verification.
v3.0 Development
ANR on Android
Clerk SSO hung on non-Chrome browsers. Platform limitation.
v3.0 Testing
Annapolis Demo
App broke at a restaurant. Three users, gone.
v3.0 Testing
Promo Code Gap
Codes built. Best testers already gone. No contact info captured.
Post v3.0
31-Review Suite Built
Every failure above now has a corresponding review that catches it.

The Response

What Got Built in Response

For every category of failure, a system was built to prevent recurrence. The failures are orange. The responses are teal.

Code Failures

31-review suite with automated checks (ESLint, Prettier, Secretlint, Knip, npm audit) plus 26 manual deep-dive reviews. Runs before every deploy.

  • Review 16: IAP/Subscription Compliance
  • Review 21: API Endpoint Hygiene
  • Review 30: Migration and Upgrade Safety

Demo Failures

Demo prep protocol. Before any stakeholder touchpoint: verify the build is production, test on the actual device, confirm the backend matches, have a fallback.

  • Pre-push impact check (all platforms)
  • Never push untested builds
  • Verify on physical device via USB

LLM Failures

Verify-before-trust protocol. Never trust LLM assertions about pricing, platform limits, or compliance requirements. Always check the primary source.

  • Verify vendor pricing pages directly
  • Cancel superseded builds immediately
  • Security review on every endpoint
  • Review 7: Security (hardcoded secrets, eval, injection)

Process Failures

Feedback taxonomy and stakeholder map. Classify every piece of feedback before acting on it. Capture contact info during engagement.

  • Five feedback types with weighted signals
  • Signal-to-noise rubric for requests
  • Lightweight stakeholder register
  • Loop-closing checklist

The Common Thread

Most of these failures would have been caught in five minutes by someone with the right experience. The iOS rejection, the billing exposure, the endpoint rename, the pricing assumption. None of them were hard problems. They were gaps in knowledge that a mentor, a code reviewer, or a senior engineer would have flagged instantly.

A solo builder doesn't have that person. An LLM is not that person. The LLM will confidently fill the gap with wrong answers that feel right.

The fix was not "be smarter." The fix was building systems that catch what I can't see. The 31-review suite, the verify-before-trust rule, the demo prep protocol. These are the scar tissue. They exist because something broke.