Post-Migration Production Instability And Architectural Mitigation Via Nitro
Sources: 1 • Confidence: Medium • Updated: 2026-03-02 20:04
Key takeaways
- There is an active investigation involving Pouya and others to determine the true root cause of the TanStack route-loading failures.
- T3Chat was migrated off Next.js to TanStack Start.
- Cloudflare Workers bundle-size limits (about 3MB free and 10MB paid, per the speaker) were exceeded by the team's server-side code, making Cloudflare impractical without significant deployment complexity.
- TanStack Start's routing approach uses code generation plus TypeScript inference to provide end-to-end type-safe route parameters and loaders that gate rendering on required data being fetched.
- On Vercel, the hard maximum request duration was 800 seconds; setting the chat API route's maxDuration to 799 seconds in Next.js forced Fluid compute to provision that route separately from the 800-second routes, isolating long-running chat requests from short ones.
Sections
Post-Migration Production Instability And Architectural Mitigation Via Nitro
- There is an active investigation involving Pouya and others to determine the true root cause of the TanStack route-loading failures.
- The solution required patching TanStack Start server core to expose a Nitro/H3 event binding (getH3EventBinding) so the team could bind and handle Nitro events directly for mixed Nitro and TanStack routes.
- Deployments hit a recurring root-route error in which Node's internal undici-based fetch failed at roughly 60% rollout, producing server errors and clients that failed to render.
- The team suspects that TanStack's lazy route bundling retained many route bundles in memory and, under high concurrency, caused overload consistent with a 'too many open files' (EMFILE) failure mode.
- As a mitigation, the team removed API endpoints from TanStack routing and instead relied on Nitro's route handling for API resolution while keeping TanStack for browser route experience.
- Configuring Nitro with a serverDir in Vite enables defining server functions in a Nitro-preferred structure outside TanStack route definitions while still operating within TanStack Start.
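The serverDir setup described above can be sketched roughly as follows. This is a hypothetical illustration based on the description, not the team's actual configuration; the plugin import path and the exact shape of the `nitro`/`serverDir` option are assumptions:

```typescript
// vite.config.ts -- hypothetical sketch, not confirmed T3Chat config.
// Assumes TanStack Start's Vite plugin accepts a Nitro section whose
// serverDir points at a Nitro-style server/ directory.
import { defineConfig } from 'vite'
import { tanstackStart } from '@tanstack/react-start/plugin/vite'

export default defineConfig({
  plugins: [
    tanstackStart({
      // Server functions and API handlers live in ./server (Nitro layout),
      // outside the TanStack route definitions under ./src/routes, while
      // the app still runs inside TanStack Start.
      nitro: { serverDir: './server' },
    }),
  ],
})
```

The design intent is separation of concerns: Nitro resolves API handlers from its own directory convention, while TanStack Start keeps ownership of browser route definitions.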
Framework-Router Misfit As Primary Migration Driver
- T3Chat was migrated off Next.js to TanStack Start.
- At launch, T3Chat used Next.js but replaced the Next router with a hacked-in React Router setup using rewrites to a static app shell.
- The team intentionally used Next.js in a client-first way, avoided Server Components, and targeted SPA-like navigation speed after the initial JS load.
- tRPC traffic initially broke under the rewrite strategy and required a custom header to keep tRPC requests from being routed to the static app shell.
- After migrating the data layer to Convex, moving auth to WorkOS, and rewriting backend logic with Effect for observability, the speaker attributes remaining major bugs largely to the Next.js-plus-React-Router integration.
- The move off Next.js was driven by a desire for a better SPA experience and a framework that keeps front end and back end deployed together under one package.json, not by a belief that Next.js or Vercel are inherently bad.
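The rewrite-to-app-shell setup with a header escape hatch for tRPC can be sketched as below. The header name, paths, and shell destination are illustrative assumptions, not the actual T3Chat configuration; the sketch only shows the mechanism Next.js provides (`rewrites` with `missing` header conditions):

```javascript
// next.config.js -- hypothetical sketch of the rewrite strategy described above.
module.exports = {
  async rewrites() {
    return {
      beforeFiles: [
        {
          // Send navigations to the static app shell so React Router
          // takes over on the client...
          source: '/:path*',
          destination: '/app-shell.html',
          // ...but skip the rewrite when the custom tRPC marker header
          // is present, so tRPC requests still reach their real API route.
          missing: [{ type: 'header', key: 'x-trpc-source' }],
        },
      ],
    }
  },
}
```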
Platform And Org Constraints Shaping Viable Architectures
- Cloudflare Workers bundle-size limits (about 3MB free and 10MB paid, per the speaker) were exceeded by the team's server-side code, making Cloudflare impractical without significant deployment complexity.
- The team explored exits from Next.js including Remix, React Router's server approach, and a Vite+Hono rewrite targeting Cloudflare, but encountered blockers related to platform and documentation complexity.
- Vercel's Fluid compute changed the economics of long-running AI generation requests by making scaling cheaper than a prior model where each user chat effectively consumed a dedicated Lambda.
- Because the team was very small and lacked dedicated infrastructure staff, managed deployment via Vercel was a practical necessity to preserve engineering velocity.
TanStack Start Routing Guarantees And DX Improvements
- TanStack Start's routing approach uses code generation plus TypeScript inference to provide end-to-end type-safe route parameters and loaders that gate rendering on required data being fetched.
- TanStack Router provides type-safe route parameters by inferring parameter names from the route path string and propagating them into loader/component types.
- The app uses a generated and committed route tree file (*.gen.ts) that should not be manually edited because it will be overwritten by code generation.
- Post-migration, the codebase is described as easier to understand and debug, routing is significantly improved, and implementation ownership shifted from the speaker's personal hacks to the team (notably Mark and Julius).
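The path-string inference described above can be illustrated with a simplified, self-contained sketch. This is not TanStack Router's actual implementation; `ExtractParams` and `extractParams` are invented names showing how TypeScript template literal types can derive `$param` names from a route path:

```typescript
// Compile-time: derive a params object type from a route path string,
// in the spirit of TanStack Router inferring `$postId` -> { postId: string }.
type ExtractParams<Path extends string> =
  Path extends `${string}$${infer Param}/${infer Rest}`
    ? { [K in Param]: string } & ExtractParams<Rest>
    : Path extends `${string}$${infer Param}`
      ? { [K in Param]: string }
      : {};

// '/posts/$postId/comments/$commentId' -> { postId: string; commentId: string }
type Demo = ExtractParams<'/posts/$postId/comments/$commentId'>;

// Runtime counterpart: match URL segments against `$`-prefixed path segments.
function extractParams<P extends string>(path: P, url: string): ExtractParams<P> {
  const params: Record<string, string> = {};
  const pathSegs = path.split('/');
  const urlSegs = url.split('/');
  pathSegs.forEach((seg, i) => {
    if (seg.startsWith('$')) params[seg.slice(1)] = urlSegs[i] ?? '';
  });
  return params as unknown as ExtractParams<P>;
}
```

Because the params type is derived from the same path string the router matches at runtime, a loader or component receiving `ExtractParams<P>` cannot reference a parameter the path does not define, which is the end-to-end guarantee the bullet points describe.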
Request-Duration Isolation And Cost/Perf Tuning Regressions
- On Vercel, the hard maximum request duration was 800 seconds; setting the chat API route's maxDuration to 799 seconds in Next.js forced Fluid compute to provision that route separately from the 800-second routes, isolating long-running chat requests from short ones.
- Vercel's Fluid compute changed the economics of long-running AI generation requests by making scaling cheaper than a prior model where each user chat effectively consumed a dedicated Lambda.
- In the current TanStack/Nitro setup, maxDuration appears to be configurable only at a broader level that groups multiple API routes together, removing the prior ability to split the chat endpoint from other API endpoints.
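In Next.js App Router terms, the prior isolation tactic looked roughly like the following route segment config; the file path is illustrative, and this is a sketch of the technique rather than T3Chat's actual handler:

```typescript
// app/api/chat/route.ts -- sketch of the per-route isolation tactic.
// 799s sits just under the 800s plan maximum, so Fluid compute provisions
// this route in its own function instead of pooling it with 800s routes.
export const maxDuration = 799;

export async function POST(req: Request): Promise<Response> {
  // Long-running AI generation/streaming would happen here.
  return new Response('ok');
}
```

It is this per-file `maxDuration` export, evaluated per route, that the current TanStack/Nitro setup reportedly lacks an equivalent for.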
Watchlist
- There is an active investigation involving Pouya and others to determine the true root cause of the TanStack route-loading failures.
Unknowns
- What objective reliability changes occurred after the migration (incident rate, error budgets, rollback frequency), especially for routing and API transport issues previously attributed to Next.js+React Router?
- What is the confirmed root cause of the rollout-correlated undici/fetch failures and route-loading instability, and what upstream or local fix resolves it without ongoing patch maintenance?
- Does the TanStack/Nitro-on-Vercel deployment support per-route (or per-handler) duration/concurrency configuration comparable to the prior Next.js maxDuration isolation tactic for chat generation?
- What are the measured performance outcomes of the new architecture (TTFB, LCP, click-to-render navigation latency), and how do SSR and loader-gated rendering contribute to or harm those metrics?
- What is the long-term plan for mixed data access (Convex hydration plus the retained tRPC client), and what portion of server-side needs still require tRPC?