
Things That Broke (And What They Cost)

What went wrong, what it cost, and what systems were built to prevent it from happening again.




The Code Failures

Bugs that shipped. Builds that broke. Things that should have been caught before anyone saw them.

The iOS Rejection

What happened

Apple rejected the v2 submission for missing subscription compliance. Auto-renewal disclosure text was missing. There was no Restore Purchases button. Terms of Service and Privacy Policy links were not accessible from the paywall screen. No LLM tutorial, no human tutorial, no documentation of any kind mentioned these requirements.

What it cost

Two days of rework. A full resubmission cycle. The realization that the LLM had confidently built an IAP flow that would never pass review.

What fixed it

Added all three compliance elements. Created a pre-submission checklist. The 31-review suite now includes Review 16 (IAP/Subscription Compliance), which checks every requirement Apple and Google enforce.
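The three missing pieces are concrete UI elements, not paperwork. A minimal sketch of how they can sit on a paywall screen in React Native; the component name, the onRestore handler, and the URLs are placeholders, not the app's actual code:

```jsx
// Hypothetical paywall footer covering the three elements Apple flagged.
// Names and URLs are placeholders, not ForkIt!'s real implementation.
import React from 'react';
import { View, Text, TouchableOpacity, Linking } from 'react-native';

export function PaywallFooter({ onRestore }) {
  return (
    <View>
      {/* 1. Auto-renewal disclosure text */}
      <Text>
        Subscription renews automatically unless cancelled at least 24 hours
        before the end of the current period. Manage or cancel in your store
        account settings.
      </Text>

      {/* 2. Restore Purchases button */}
      <TouchableOpacity onPress={onRestore}>
        <Text>Restore Purchases</Text>
      </TouchableOpacity>

      {/* 3. Terms of Service and Privacy Policy reachable from the paywall */}
      <TouchableOpacity onPress={() => Linking.openURL('https://example.com/terms')}>
        <Text>Terms of Service</Text>
      </TouchableOpacity>
      <TouchableOpacity onPress={() => Linking.openURL('https://example.com/privacy')}>
        <Text>Privacy Policy</Text>
      </TouchableOpacity>
    </View>
  );
}
```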

The Refactor That Broke Production

What happened

During a code cleanup, Claude renamed API endpoints for consistency. Clean code. Better naming. Deployed to Vercel. The problem: every app already installed on users' phones was still calling the old endpoint names. Live users started getting errors. The old URLs returned 404s.

What it cost

Live users hit errors until backward-compatibility rewrites were added. Those rewrites still exist in the Vercel config today as tech debt.

What fixed it

Added rewrites in vercel.json to map old paths to new endpoints. Established a rule: never rename a live API endpoint without a redirect. The review suite now includes Review 21 (API Endpoint Hygiene) and Review 30 (Migration and Upgrade Safety).
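A sketch of what those backward-compatibility rewrites look like in vercel.json; the paths here are invented placeholders, not the project's real endpoint names:

```json
{
  "rewrites": [
    { "source": "/api/getRestaurants", "destination": "/api/restaurants/search" },
    { "source": "/api/saveFavorite", "destination": "/api/favorites/save" }
  ]
}
```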

The Dependabot Trap (RN/SDK Mismatch)

What happened

A "minor-and-patch group" Dependabot PR included react-native: 0.83.2 → 0.85.1 alongside 28 other safe-looking bumps. Verification on expo run:ios --device passed because that path is lenient and skips strict codegen. The mismatch only surfaced when expo export:embed (the eager bundle path) ran, which broke BOTH eas update and eas build --local. SDK 55's babel-preset-expo@55.0.17 ships @react-native/codegen@0.83.4, which can't parse RN 0.85.1's VirtualViewNativeComponent onModeChange event. RN minor bumps inside an Expo SDK major aren't safe; the constraint isn't visible to Dependabot.

What it cost

OTA pipeline blocked for a session. Reverted to the prior RN minor.

What fixed it

Pinned RN to the SDK-supported minor and added npx expo install --check as a custom CI step. Lesson: different toolchain paths exercise different code paths. A passing verification on one path can hide a different path's break.
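One way the check might be wired in, assuming a package.json script that CI runs (the script name is an assumption). expo install --check validates installed packages against the versions the current Expo SDK expects, which is exactly the mismatch the Dependabot PR introduced:

```json
{
  "scripts": {
    "ci:sdk-check": "npx expo install --check"
  }
}
```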

The OTA Orphan Trap

What happened

The project uses runtimeVersion.policy: "appVersion", so the binary version in app.json IS the OTA runtime version. Bumping app.json version to mark a new OTA release feels like a clean version bump. It actually orphans every existing OTA bundle: the new value doesn't match any deployed binary's runtime, so the OTA never reaches a user. Discovered when an OTA published successfully and never showed up on devices.

What it cost

One published OTA that reached zero users.

What fixed it

Two-version convention in the project guide. app.json version (binary, three-segment MAJOR.MINOR.PATCH) bumps only immediately before a new EAS build. constants/config.js APP_VERSION (header value, two-segment MAJOR.MINOR) bumps on every OTA. The lesson: when "what version is this?" has two correct answers, write down which one bumps when.
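What the two sources of truth look like under this convention, with illustrative values. The binary/runtime side lives in app.json, where the appVersion policy makes the version field double as the OTA runtime version:

```json
{
  "expo": {
    "version": "4.2.0",
    "runtimeVersion": { "policy": "appVersion" }
  }
}
```

The OTA-facing header version lives in constants/config.js and is the one that moves on every update:

```js
// constants/config.js (sketch): two-segment version sent as a header value.
// Bumps on every OTA; the app.json version above bumps only before a new binary build.
export const APP_VERSION = '4.2';
```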

ANR on Android (App Not Responding)

What happened

Android users with non-Chrome default browsers (DuckDuckGo, Firefox) experienced hangs during Clerk SSO sign-in. The app froze completely. Chrome Custom Tabs opened in the wrong browser, which couldn't handle the redirect back. The main thread blocked. Android killed the app.

What it cost

Hours of debugging a platform-level limitation that has no real fix. Users with non-Chrome defaults still see friction.

What fixed it

Added a hint toast on Android sign-in failure explaining the Chrome requirement. Documented it as a known limitation in the project guide. This is a platform constraint, not something app code can solve. The fix was transparency, not engineering.

Postscript

V4 removed Clerk entirely. No sign-in, no SSO, no browser redirect. The problem was eliminated by removing the dependency, not by fixing it.


The Demo Failure

The stakeholder engagement was right. The preparation was wrong.

The Annapolis Demo

What happened

Work trip to Annapolis. Three colleagues who had heard about the app wanted to try it. Perfect opportunity: real users, real context, at a restaurant, ready to pick a place for dinner. Opened the app. It didn't work.

The backend had been updated during development without a matching app build. The dev build on the phone was calling endpoints that had changed. The app couldn't fetch restaurants. At a restaurant. In front of three potential users.

What it cost

Three users. Not hypothetical users. Three real people who were interested, present, and ready to try it. They watched it fail and moved on. They never came back to it.

Every demo is a stakeholder touchpoint. This one failed because the build wasn't tested on the device, in the environment, within the hour. The engagement instinct was right. The preparation discipline didn't exist yet.

What fixed it

Demo prep protocol: before any stakeholder touchpoint, verify the build is production (not dev), test on the actual device, confirm the backend matches, and have a fallback plan if it crashes. This became part of the project's pre-push checklist.


The LLM Failures

This is the centerpiece. These failures share the same root cause: the LLM gave a confident answer, I trusted it, and the answer was wrong. Each time, the cost compounded because the wrong answer became the basis for the next decision.

The Build Credit Burn

What happened

Asked Claude how many EAS builds were included on the Starter plan. Answer: "Unlimited!" Planned accordingly. Weeks later, asked again. Claude read its own previous memory file. Same answer: "Unlimited!" Felt confirmed.

Then an email arrived from Expo: 80% of build credits consumed with two weeks left in the billing cycle.

The LLM had been wrong the first time. Then it cited itself as a source the second time. A hallucination, reinforced by its own memory, presented as verified information.

What it cost

Scrambled to cancel unnecessary builds. Established a rule to cancel superseded builds immediately. Nearly burned through a paid plan's allocation on builds that were never used.

What fixed it

New rule: never trust LLM pricing knowledge. Always verify vendor pricing pages directly before making cost-driven decisions. Added to project memory as a permanent instruction.

The Google API Pricing Lie

What happened

Claude stated that Google Maps Platform included a $200/month free credit. Architecture decisions, cost projections, and the free tier design were all built on this assumption. The entire "as free as possible" pricing model assumed that credit existed.

It didn't. Google had eliminated the $200 monthly credit over a year before the project started. The LLM's training data was stale. Every cost calculation based on that number was wrong.

What it cost

Rearchitecting the API call strategy. Adding pool caching, client-side filtering, and aggressive deduplication to reduce API spend. The free tier limits (20 searches/month) exist partly because the expected credit never materialized.

What fixed it

Verified Google's actual pricing page. Rebuilt cost projections from real numbers. Added "verify vendor pricing" as a permanent memory instruction. Every cost-driven decision now requires a primary source, not an LLM assertion.

The Billing Exposure

What happened

The backend API had no rate limiting or origin checking for weeks. No user data was at risk (the search endpoint doesn't store or transmit personal data), but anyone who discovered the URL could have run up the Google Places API bill on my account.

The LLM built it. The LLM didn't flag the gap. I didn't know to check. It was discovered during a code review, not because anyone exploited it.

What it cost

Weeks of financial exposure. Nobody found it. The risk was to my billing account, not to user data.

What fixed it

Added rate limiting (30 req/min per IP), origin checking, and security middleware. The review suite includes Review 7 (Security), Review 18 (Operational Readiness), and Review 21 (API Endpoint Hygiene). Every endpoint now validates before processing.
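A sketch of the shape of those protections, assuming an Express-style backend. The 30 req/min limit matches the text; the allowed origin, route, and overall structure are placeholders, not the project's actual middleware:

```js
// Sketch only: rate limiting plus a basic origin check in front of every endpoint.
const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Rate limit: 30 requests per minute per IP.
app.use(rateLimit({ windowMs: 60 * 1000, max: 30 }));

// Origin check: reject browser requests from origins the app doesn't use.
app.use((req, res, next) => {
  const allowedOrigins = ['https://forkit.example.com']; // placeholder origin
  const origin = req.get('origin');
  if (origin && !allowedOrigins.includes(origin)) {
    return res.status(403).json({ error: 'Forbidden origin' });
  }
  next();
});

// Every endpoint sits behind the checks above before doing any Places work.
app.get('/api/restaurants/search', (req, res) => {
  res.json({ ok: true });
});

module.exports = app; // exported for the serverless runtime in this sketch
```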

The Auth Gap

What happened

Separate from the billing exposure: API endpoints that touch user data (favorites, history, sync) accepted a user ID in the request body but never verified it belonged to the person making the request. Someone could have read or written another user's data by guessing their ID.

What it cost

Discovered during a manual code review session. Patched immediately. No exploitation occurred. But the gap was real: user data endpoints without identity verification.

What fixed it

Added Clerk JWT verification on all protected endpoints. Established Review 19 (Auth and Identity) in the review suite. Auth is now verified server-side on every request that touches user data.
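A sketch of what server-side verification looks like, assuming Clerk's Express middleware (ClerkExpressRequireAuth from @clerk/clerk-sdk-node). The route, field names, and structure are placeholders, and the postscript below explains why this surface no longer exists in V4:

```js
// Sketch only: the user ID comes from the verified session token, never the body.
const express = require('express');
const { ClerkExpressRequireAuth } = require('@clerk/clerk-sdk-node');

const app = express();
app.use(express.json());

app.post('/api/favorites', ClerkExpressRequireAuth(), (req, res) => {
  const userId = req.auth.userId; // set by the verified Clerk session

  // A client-supplied ID that disagrees with the token is rejected outright.
  if (req.body.userId && req.body.userId !== userId) {
    return res.status(403).json({ error: 'User ID mismatch' });
  }

  // ...write the favorite under the verified userId, not the claimed one
  res.json({ ok: true, userId });
});

module.exports = app;
```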

Postscript

V4 removed all user data endpoints entirely (no accounts, no server-side history, no sync). The attack surface was eliminated, not patched. The auth gap can't exist if user data never leaves the device.

The Cross-Project Contamination

What happened

Claude picked up stray files from a different project (HabitCoach, a blue and gold themed app) and built the entire onboarding tour in the wrong brand colors. The tour was fully functional, well-structured, and completely off-brand. Blue and gold instead of orange and teal.

What it cost

Multiple design reviews missed it. I didn't catch it until later, when the gold "felt off." The LLM cross-contaminated two projects, and the human reviewing it didn't notice because the tour worked. Functional correctness masked visual wrongness.

What fixed it

Rebuilt the tour in the correct brand. Established the color theory doc (orange = problem, teal = solution) and the brand style guide in CLAUDE.md so the LLM has an authoritative reference. Review 22 (Brand Continuity) now checks every screen for theme consistency.

Premature Surrender

What happened

LLMs will say "I can't do that" or "you'll need to do this manually" about things they absolutely can do. The first time it happened, I accepted it and spent time on manual work that wasn't necessary. It kept happening. Each time, the cost was small: a few minutes here, a workaround there. But it accumulated.

What it cost

No dramatic failure. No outage. Just accumulated time lost to unnecessary manual work, spread across dozens of interactions. The kind of cost that doesn't show up in a postmortem because no single instance was big enough to flag.

What fixed it

A permanent instruction in the project memory: don't say "I can't" without trying workarounds first. Push back, ask it to try anyway, or rephrase the request. The lesson: don't accept the first "no" from an LLM. The default is surrender, not effort.

The What's New Ironic Gap

What happened

Built a What's New modal that shows release notes when users open the app on a new OTA version. Shipped in v4.2. Realized after publishing: the feature has no lastSeenVersion to compare against on the OTA that introduces it, so every existing user sees nothing. The modal that announces what's new can't announce itself.

What it cost

The launch was silent. Users on v4.2 got the feature but no announcement of it.

What got built

The modal becomes useful from the next OTA forward. Documented in code so the silence is the design, not a bug. The lesson: any "show me what's new since last time" feature has a bootstrap problem on the release that introduces it. Account for it in the OTA copy.
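A sketch of the bootstrap problem, assuming AsyncStorage and the APP_VERSION constant described elsewhere in this case study; names and import paths are illustrative, not the real modal's code:

```js
// Sketch only: "show what's new since last time" has nothing to compare against
// on the very release that introduces it.
import AsyncStorage from '@react-native-async-storage/async-storage';
import { APP_VERSION } from '../constants/config';

const KEY = 'lastSeenVersion';

export async function shouldShowWhatsNew() {
  const lastSeen = await AsyncStorage.getItem(KEY);

  // First launch on the OTA that ships the feature: no stored value exists,
  // so seed it and stay silent; this branch is the "silent launch" described above.
  if (lastSeen === null) {
    await AsyncStorage.setItem(KEY, APP_VERSION);
    return false;
  }

  const changed = lastSeen !== APP_VERSION;
  if (changed) {
    await AsyncStorage.setItem(KEY, APP_VERSION);
  }
  return changed;
}
```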

The Silent API Rename (V4)

What happened

V4 switched from RevenueCat to react-native-iap for direct store communication. The LLM wrote the integration using v14 of the library. What nobody caught: v14 silently renamed its core APIs. requestSubscription became requestPurchase. getSubscriptions became fetchProducts. The old names still imported without error — they just resolved to undefined at runtime.

What it cost

No build error. No lint error. Tests passed because they mocked the module. The validate-vibe script checked code patterns, not import resolution. Only a real device test caught it — the purchase button did nothing. The failure was invisible to every automated check in the pipeline.

What got built

A new validate-vibe check that reads the installed module's actual exports and verifies every import name resolves. If react-native-iap renames another function, the check catches it before the code ships. The lesson: when a library changes its API surface, your mocks hide the breakage.
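A simplified sketch of an import-resolution check in the spirit of that rule. The real validate-vibe script may work differently; the package-layout lookup and the substring match here are deliberate simplifications:

```js
// Sketch only: verify that every name the app imports still exists in the
// installed package's type declarations. Layout assumptions are guesses.
const fs = require('fs');
const path = require('path');

// Names the app imports from the library (normally scraped from the app's source).
const importedNames = ['requestPurchase', 'fetchProducts'];

// Locate the installed package and read its declared types.
const pkgJsonPath = require.resolve('react-native-iap/package.json');
const pkg = JSON.parse(fs.readFileSync(pkgJsonPath, 'utf8'));
const typesFile = path.join(path.dirname(pkgJsonPath), pkg.types || pkg.typings || 'index.d.ts');
const declarations = fs.existsSync(typesFile) ? fs.readFileSync(typesFile, 'utf8') : '';

// A silent rename (requestSubscription -> requestPurchase) shows up as an
// import name that no longer appears anywhere in the installed declarations.
const missing = importedNames.filter((name) => !declarations.includes(name));
if (missing.length > 0) {
  console.error(`Unresolved react-native-iap imports: ${missing.join(', ')}`);
  process.exit(1);
}
console.log('All react-native-iap imports resolve against the installed version.');
```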

The pattern across all of them: the LLM gave a confident output. I had no framework to evaluate it. The output became infrastructure. When it turned out to be wrong, the fix was expensive because decisions had been built on top of it.

The Process Failures

Not code bugs. Judgment bugs. Decisions that cost time because the framework for evaluating them didn't exist yet.

The Mom Group Trap

What happened

Weeks of design work on features driven by feedback from a mom group on social media. The feedback was enthusiastic. The problem: the people giving it weren't users. They were repeating what sounded interesting, filtered through their own context, with no connection to the actual use case.

The features they described (social check-ins, recipe integration, gamification) sounded reasonable in isolation. They were completely wrong for the product. The app picks a random restaurant. It doesn't need a social feed.

What it cost

Weeks of design iteration on the wrong mental model. Time that could have been spent on features actual users needed (like the exclude filter, or walk mode).

What fixed it

Built a feedback taxonomy with five types (behavioral, voiced pain, feature requests, ambient noise, market signal). The mom group feedback was genuine; the mistake was in how I weighted it. The signal-to-noise rubric now requires decomposing every request into the underlying need before evaluation.

The Promo Code Gap

What happened

Built a promo code system for early adopters and testers. Had codes ready to distribute. The problem: the highest-value testers were already gone. The hostel strangers who gave the exclude-filter feedback, the walker who surfaced the walk-mode gap, the colleague who couldn't demo to a friend. No contact information was captured during those interactions.

What it cost

Lost the ability to close the loop with the people who shaped the product most. Could not reward the testers who mattered. Could not get follow-up feedback from the users whose input was most valuable.

What fixed it

Created STAKEHOLDERS.md in the repo root: a lightweight register tracking who gave feedback, what channel, what type, what it became, and whether the loop was closed. Includes a capture checklist for future sessions. Not a formal tool; just a file that gets updated. The hostel testers are still lost, but the next round won't be.


Same Bug, Two Platforms

The same mistakes shipped to both app stores. What happened next depended on which platform caught them first.

The Clerk Key Crash

What happened

A missing Clerk publishableKey caused a fatal crash on launch. The EAS build injected environment variables differently than local builds, so the crash never appeared in testing. The local build worked. The store build didn't.

Android

Submitted, accepted, promoted to production. Real users downloaded versionCode 19. The app crashed on launch. Users saw a broken app. Rolled back to versionCode 18, but the damage was done.

iOS

Submitted the same build. Apple's review cycle meant it hadn't been approved yet. Found and fixed the crash before it ever reached a user. The review delay that felt like friction was protection.

What it proved

Apple's gatekeeping caught what Android's speed let through. The $99/year doesn't just buy a developer account. It buys time between "submitted" and "live" where mistakes can still be caught.

The IAP Gauntlet

What happened

Setting up in-app purchases on both platforms. Same products, same prices, same subscription tiers. Completely different experiences.

Google

Products created via API. Immediately active. No review of product metadata. Subscriptions working within the hour.

Apple

Manual metadata entry for each product. Review screenshots required. Terms of Use link required. Two rejections before approval. Products stuck in MISSING_METADATA for days. But every requirement Apple enforced was something users actually need to see.

What it proved

Google's speed meant subscriptions shipped fast. Apple's friction meant they shipped correctly: with disclosure text, restore buttons, and legal links that protect both the user and the developer.

The Endpoint Rename (Platform Angle)

What happened

The refactor that renamed API endpoints hit both platforms differently because of where each was in its release cycle.

Android

Already live. Real users on the old app version hit 404 errors immediately when the backend changed. The fast release cycle meant users were on the broken path before anyone noticed.

iOS

Still in Apple's review queue. The app version calling old endpoints was never approved, so no iOS user experienced the break. Review delay, again, acted as a buffer.

Paying for Builds We Could Run for Free

What happened

Every EAS build ran on Expo's cloud infrastructure by default. The eas build command, without any flags, sends your code to remote servers that compile, sign, and return the binary. This costs real money per build. The alternative — eas build --local — produces the exact same .ipa and .aab files on the developer's own Mac, for free. The flag existed from day one. It was never surfaced by the AI assistant, even during explicit conversations about build costs.

What it cost

Dozens of unnecessary cloud builds over months of development. Each one could have been local. The cost wasn't catastrophic, but it was entirely avoidable — the kind of silent leak that compounds when you're not watching.

What got built

A permanent rule in the project guide: always use --local for all EAS builds. Cloud builds are never triggered unless there's a specific reason the local machine can't produce the artifact. The default assumption flipped: if the developer has a Mac, local builds are the default.
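One way to make the flipped default hard to forget, assuming package.json scripts (the script names are assumptions) so the --local flag never has to be remembered at the command line:

```json
{
  "scripts": {
    "build:android": "eas build --platform android --profile production --local",
    "build:ios": "eas build --platform ios --profile production --local"
  }
}
```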

The lesson

"Free" tools with paid tiers optimize for their paid path. The default eas build command runs in the cloud. The free alternative is a flag you have to know to add. AI assistants trained on documentation default to the documented happy path — which is often the paid one. Always ask: "Can I do this locally?"

The patterns behind these incidents (subscription group level ordering, replacement params, sales-report lag, the OTA contract) live on the new Platform Engineering page. The stories above are the incidents; the contract is what made them inevitable.

The pattern: Android's speed is real, but it shifts all quality responsibility to the developer. Apple's friction is real, but it's friction that catches things a solo developer can't. Both platforms have tradeoffs. The indie developer needs to understand what each one costs and what each one protects.

When Each Failure Happened

Early failures were code. Middle failures were LLM trust. Late failures were process. The progression tells the real story: what you don't know shifts as you learn.

  • v1.0 Launch: Open Backend. No auth, no rate limiting. Weeks of exposure. Nobody found it.
  • v1.0 Launch: Google API Pricing Lie. $200/month credit didn't exist. Cost model built on false data.
  • v2.0 Development: iOS Rejection. Missing subscription compliance. Two-day rework.
  • v2.0 Development: Build Credit Burn. LLM said "unlimited." Then cited itself. Email said 80% used.
  • v2.0 Development: Mom Group Trap. Weeks of design on features from non-users.
  • v3.0 Development: Refactor Break. Renamed endpoints. Live apps still called old names. Errors.
  • v3.0 Development: Security Exploit. Endpoints accepted any user ID without verification.
  • v3.0 Development: ANR on Android. Clerk SSO hung on non-Chrome browsers. Platform limitation.
  • v3.0 Deployment: Clerk Key Crash. Missing env var crashed Android in production. Apple's review cycle blocked the same bug from reaching iOS users.
  • v3.0 Deployment: IAP Rejection Cycle. Google had products active in an hour. Apple issued 2 rejections and days of metadata fixes. Apple's requirements were all things users need.
  • v3.0 Testing: Annapolis Demo. App broke at a restaurant. Three users, gone.
  • v3.0 Testing: Promo Code Gap. Codes built. Best testers already gone. No contact info captured.
  • Post v3.0: 31-Review Suite Built. Every failure above now has a corresponding review that catches it.
  • v4.0 Development (March 2026): Cloud Build Cost Leak Discovered. Realized eas build --local produces identical artifacts for free. All prior cloud builds were unnecessary spend.
  • v4.0 Development (March 2026): Local Build Policy Enforced. CLAUDE.md updated: --local flag required on all EAS builds. Cloud builds prohibited unless justified.
  • v4.2 OTA (April 2026): Dependabot Trap. RN minor bumped inside an SDK major broke the OTA pipeline. expo run passed; expo export:embed didn't.
  • v4.2 OTA (April 2026): OTA Orphan. Bumped binary version with policy=appVersion. OTA never reached a user.
  • v4.2 OTA (April 2026): Two-Version Convention Documented. MAJOR.MINOR for OTAs (APP_VERSION); MAJOR.MINOR.PATCH for binary (app.json). Trap closed.

What Got Built in Response

For every category of failure, a system was built to prevent recurrence. The failures are orange. The responses are teal.

Code Failures

31-review suite with automated checks (ESLint, Prettier, Secretlint, Knip, npm audit) plus 26 manual deep-dive reviews. Runs before every deploy.

  • Review 16: IAP/Subscription Compliance
  • Review 21: API Endpoint Hygiene
  • Review 30: Migration and Upgrade Safety
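A sketch of how the automated layer might be wired into package.json scripts. The script names are assumptions, but each CLI (eslint, prettier, secretlint, knip, npm audit) is the tool named above:

```json
{
  "scripts": {
    "lint": "eslint .",
    "format:check": "prettier --check .",
    "secrets": "secretlint \"**/*\"",
    "deadcode": "knip",
    "audit": "npm audit",
    "review:auto": "npm run lint && npm run format:check && npm run secrets && npm run deadcode && npm run audit"
  }
}
```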

Demo Failures

Demo prep protocol. Before any stakeholder touchpoint: verify the build is production, test on the actual device, confirm the backend matches, have a fallback.

  • Pre-push impact check (all platforms)
  • Never push untested builds
  • Verify on physical device via USB

LLM Failures

Verify-before-trust protocol. Never trust LLM assertions about pricing, platform limits, or compliance requirements. Always check the primary source.

  • Verify vendor pricing pages directly
  • Cancel superseded builds immediately
  • Security review on every endpoint
  • Review 7: Security (hardcoded secrets, eval, injection)

Process Failures

Feedback taxonomy and stakeholder map. Classify every piece of feedback before acting on it. Capture contact info during engagement.

  • Five feedback types with weighted signals
  • Signal-to-noise rubric for requests
  • Lightweight stakeholder register
  • Loop-closing checklist

Platform Failures

Test the store build, not the local build. EAS builds inject env vars differently than local. A passing local build proves nothing about the production build.

  • Never push untested builds to production track
  • Download and test from the store/TestFlight
  • Ship Android first (cheapest to iterate)
  • Use Apple's review cycle as a safety net, not an obstacle

The Common Thread

Most of these failures would have been caught in five minutes by someone with the right experience. The iOS rejection, the billing exposure, the endpoint rename, the pricing assumption. None of them were hard problems. They were gaps in knowledge that a mentor, a code reviewer, or a senior engineer would have flagged instantly.

A solo builder doesn't have that person. An LLM is not that person. The LLM will confidently fill the gap with wrong answers that feel right.

The fix was not "be smarter." The fix was building systems that catch what I can't see. The 31-review suite, the verify-before-trust rule, the demo prep protocol. These are the scar tissue. They exist because something broke.