Framework Migration And True Motivation
Sources: 1 • Confidence: Medium • Updated: 2026-04-12 10:35
Key takeaways
- The move away from Next.js was not driven by a recent Next.js exploit or security scare.
- There was an active investigation, including a call with Pouya and others, to determine the root cause of the TanStack route-loading failures.
- In the TanStack-based architecture, the app integrates Convex hydration and retains a TRPC client for certain server-side data needs.
- TanStack Start’s routing approach was described as combining code generation with TypeScript inference to provide end-to-end type-safe route parameters and loaders that ensure required data is fetched before a route component renders.
- Cloudflare Workers bundle-size limits (about 3MB free and 10MB paid) were exceeded by the team’s server-side code, making Cloudflare impractical without significant deployment complexity.
Sections
Framework Migration And True Motivation
- The move away from Next.js was not driven by a recent Next.js exploit or security scare.
- T3Chat was migrated off Next.js to TanStack Start.
- At launch, T3Chat used Next.js while replacing the Next.js router with a hacked-in React Router setup that used rewrites to serve a static app shell.
- A primary original reason for using Next.js was to deploy frontend and backend together from one codebase to keep versions synchronized.
- The team intentionally used Next.js in a client-first way and avoided Server Components to target SPA-like navigation speed after initial JS load.
- The rewrite strategy initially broke TRPC traffic and required a custom header to keep TRPC requests from being routed to the static app shell.
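The rewrite-plus-header hack above can be sketched with Next.js's rewrite conditions. This is a hypothetical reconstruction: the shell destination `/shell` and the header name `x-trpc-source` are assumptions, since the source only says a custom header exempted TRPC traffic from the rewrite.

```typescript
// next.config.ts — minimal sketch of the rewrite-to-static-shell approach.
// ASSUMPTIONS: `/shell` as the static app-shell route and `x-trpc-source`
// as the custom header name; neither is confirmed by the source.
import type { NextConfig } from 'next';

const config: NextConfig = {
  async rewrites() {
    return {
      beforeFiles: [
        {
          // Any request WITHOUT the custom TRPC header is rewritten to the
          // static app shell, where the client-side router takes over.
          // TRPC requests carry the header, so they pass through untouched.
          source: '/:path*',
          missing: [{ type: 'header', key: 'x-trpc-source' }],
          destination: '/shell',
        },
      ],
      afterFiles: [],
      fallback: [],
    };
  },
};

export default config;
```

The `missing` condition is what makes this work: without it, the catch-all `source: '/:path*'` would swallow TRPC requests too, which is exactly the breakage the bullet above describes.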
Production Failures Root Cause Uncertainty And Mitigation
- There was an active investigation, including a call with Pouya and others, to determine the root cause of the TanStack route-loading failures.
- The solution required patching TanStack Start server core to expose a Nitro/H3 event binding so Nitro events could be handled directly for mixed Nitro and TanStack routes.
- Deployments encountered a recurring root-route error in which Node's undici-based fetch failed at roughly 60% of rollout, causing server errors and leaving clients unable to render.
- A suspected cause of the TanStack route-loading failures was lazy bundling: route-specific JavaScript was loaded on demand but many bundles were retained in memory, and under high concurrency deduplication or cleanup could fail, leading to overload and EMFILE-like behavior.
- As a mitigation, API endpoints were removed from TanStack routing and Nitro route handling was used for API resolution while TanStack was kept for browser routes.
- Configuring Nitro with a serverDir in Vite was described as enabling server functions to be defined outside TanStack route definitions while still using TanStack Start.
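The mitigation above (API resolution through Nitro, browser routes through TanStack) can be sketched as a Nitro file-based handler living outside the TanStack route tree. The file path follows Nitro's real `server/api` + method-suffix convention, but the specific endpoint and handler body are assumptions; the source does not show the actual code.

```typescript
// server/api/chat.post.ts — hypothetical Nitro handler. The `.post` suffix
// is Nitro's file-routing convention for restricting the handler to POST.
// Because this file lives under Nitro's server directory, the request is
// resolved by Nitro directly and never touches TanStack Start's route tree
// or its lazy route bundles.
import { defineEventHandler, readBody } from 'h3';

export default defineEventHandler(async (event) => {
  // Illustrative body only — the real chat endpoint's logic is not shown
  // in the source.
  const body = await readBody<{ message: string }>(event);
  return { received: body.message };
});
```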
API Architecture Mixing Isolation And Runtime Tuning
- In the TanStack-based architecture, the app integrates Convex hydration and retains a TRPC client for certain server-side data needs.
- On Vercel, setting the chat API route's maxDuration to 799 seconds in Next.js was described as intentionally forcing Fluid Compute to provision it separately from the other routes configured at 800 seconds, isolating long-running chat requests from short ones.
- As a mitigation, API endpoints were removed from TanStack routing and Nitro route handling was used for API resolution while TanStack was kept for browser routes.
- In the TanStack/Nitro setup, broad maxDuration configuration was described as bundling multiple APIs together and removing the earlier ability to split the chat endpoint from other API endpoints.
- In the TanStack/Nitro setup, APIs were described as being split from TanStack Start routes, and overall API call volume was described as reduced by moving most data access to Convex and the chat endpoint.
- After the migration, the Next.js app directory was described as effectively unused, with backend endpoints living under a separate backend API structure such as backend/api/chat with post/get handlers.
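The 799-second trick described above maps onto Next.js's per-route segment config, where `maxDuration` is exported from the route file. A minimal sketch of the pre-migration setup, assuming the App Router path `app/api/chat/route.ts` (the exact file location is not given in the source):

```typescript
// app/api/chat/route.ts — sketch of the pre-migration Vercel setup.
// Giving this route a maxDuration distinct from every other route (799 vs
// the 800 used elsewhere) forces Fluid Compute to provision it as its own
// function group, isolating long-running chat requests.
export const maxDuration = 799;

export async function POST(req: Request) {
  // Illustrative only — the real handler streams a long-running model
  // response; its contents are not shown in the source.
  const { message } = await req.json();
  return Response.json({ received: message });
}
```

This per-route knob is exactly what the TanStack/Nitro bullet above says was lost: a broad maxDuration applied to a bundled set of APIs cannot single out the chat endpoint.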
TanStack Routing Codegen And Type Safety Mechanisms
- TanStack Start’s routing approach was described as combining code generation with TypeScript inference to provide end-to-end type-safe route parameters and loaders that ensure required data is fetched before a route component renders.
- TanStack Router was described as inferring route parameter names from route path strings and propagating them into loader and component types.
- The app uses a generated and committed route tree file with a .gen.ts suffix that should not be manually edited because it will be overwritten by code generation.
- Route loaders were described as being used to ensure required query data is loaded before the corresponding route component renders.
- Post-migration, the codebase was described as easier to understand and debug and routing was described as significantly improved.
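The inference mechanism described above — deriving typed parameter names from a route path string like `/chat/$chatId` — can be illustrated in plain TypeScript. This is a simplified sketch of the idea, not TanStack Router's actual implementation; `matchRoute` and `PathParams` are names invented for this example.

```typescript
// Type-level: extract `$param` segments of a path pattern into an object
// type, so `/chat/$chatId` yields `{ chatId: string }` at compile time.
type PathParams<P extends string> =
  P extends `${string}$${infer Param}/${infer Rest}`
    ? { [K in Param]: string } & PathParams<`/${Rest}`>
    : P extends `${string}$${infer Param}`
      ? { [K in Param]: string }
      : {};

// Runtime: match a concrete URL against the pattern and collect the
// `$param` segment values; returns null when the URL does not match.
function matchRoute<P extends string>(
  pattern: P,
  url: string,
): PathParams<P> | null {
  const patternParts = pattern.split('/').filter(Boolean);
  const urlParts = url.split('/').filter(Boolean);
  if (patternParts.length !== urlParts.length) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < patternParts.length; i++) {
    const p = patternParts[i];
    if (p.startsWith('$')) params[p.slice(1)] = urlParts[i];
    else if (p !== urlParts[i]) return null;
  }
  return params as PathParams<P>;
}

const match = matchRoute('/chat/$chatId', '/chat/abc123');
if (match) {
  // `match.chatId` typechecks; a typo like `match.chatID` would be a
  // compile-time error, which is the end-to-end safety the bullets describe.
  console.log(match.chatId);
}
```

TanStack Router layers codegen on top of this kind of inference: the committed `.gen.ts` route tree ties each path pattern to its loader and component types, which is why loaders can guarantee required data before render.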
Constraints Platform Ops And Bundle Limits
- Cloudflare Workers bundle-size limits (about 3MB free and 10MB paid) were exceeded by the team’s server-side code, making Cloudflare impractical without significant deployment complexity.
- The team explored exits from Next.js including Remix, React Router’s server approach, and a Vite+Hono rewrite targeting Cloudflare, but these paths were hindered by platform and documentation complexity.
- Vercel’s Fluid compute was described as changing the economics of long-running AI generation requests by making scaling cheaper than a model where each user chat effectively consumed a dedicated Lambda.
- Because the team was very small and lacked dedicated infrastructure staff, managed deployment via Vercel was described as necessary to preserve engineering velocity.
Watchlist
- There was an active investigation, including a call with Pouya and others, to determine the root cause of the TanStack route-loading failures.
Unknowns
- What objective before/after metrics changed following the migration (incident rate, latency/TTFB/LCP, deploy frequency, oncall load, and developer cycle time)?
- What is the confirmed root cause of the rollout-correlated undici fetch failure and the route-loading failures?
- Is the lazy-bundle resource exhaustion hypothesis correct (including any EMFILE-like behavior), and under what traffic patterns does it reproduce?
- Will the TanStack Start patch for Nitro/H3 event binding be upstreamed, and what is the upgrade/maintenance burden until then?
- Does the Nitro/Vercel deployment support per-route maxDuration (or equivalent) to re-establish chat endpoint isolation, and is isolation still needed given reduced API call volume?