
`/notifications` cold-start RCA (perf/260517-notifications-appcode-v2)

Reported on a cold start (`rm -rf .next && pnpm dev`) followed by a single page load:

GET /notifications 200 in 29.5s (next.js: 14.9s, proxy.ts: 428ms, application-code: 14.2s)

The next.js: 14.9s portion is Turbopack compiling the route on first hit; it is tracked separately (see “Deferred”). This RCA covers the 14.2s “application-code” portion.

Evidence-first (no fix before root cause), cross-checked with a Codex gpt-5.5/xhigh sparring pass:

  • EXPLAIN (ANALYZE, BUFFERS) of the exact listNotificationsForUser query against the live cinatra.notifications.
  • A standalone cold-vs-warm timing harness (module-init, one-time ensurePostgresSchema 352-DDL, per-request query).
  • Full data-flow trace: page.tsx → listNotifications, which branches into resolveCurrentUserId → getAuthSession and listNotificationsForUser → host.ensurePostgresSchema.

| Measurement | Result |
| --- | --- |
| cinatra.notifications total rows | 908 (heaviest user 869) |
| listNotificationsForUser query, 869-row user, EXPLAIN ANALYZE | 0.411 ms (seq-scan + top-N heapsort; tiny table) |
| Indexes on notifications | pkey + user_unread (user_id,read_at,created_at DESC) + topic_created + dedupe_job_kind; query is already fast, an index would be useless |
| Heavy module-init (drizzle-store.ts import, proxy) | ~530 ms, one-time/process |
| One-time ensurePostgresSchema (352 DDL no-op round-trips) | ~279 ms, one-time/process (PID-scoped O_EXCL sentinel; acquired-ddl once, then global-hit/sentinel-hit) |
| Per-request query, warm | 0.26–0.39 ms |

The DB query / data shape is categorically not the bottleneck.
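
The cold-vs-warm split in the table can be reproduced with a very small harness. The following is a minimal sketch, not the harness used for this RCA: `measureColdWarm` and its injectable `now` clock are hypothetical names, and `Date.now` is only a coarse default (a sub-millisecond clock would be needed for numbers like 0.26 ms).

```typescript
// Hypothetical helper: times the first ("cold") call of an async operation
// separately from the calls that follow it ("warm").
type Clock = () => number;

interface ColdWarmResult {
  coldMs: number;   // duration of the very first call
  warmMs: number[]; // durations of each subsequent call
}

async function measureColdWarm(
  op: () => Promise<unknown>,
  warmRuns: number,
  now: Clock = Date.now, // assumption: swap in a higher-resolution clock if available
): Promise<ColdWarmResult> {
  const timeOnce = async (): Promise<number> => {
    const start = now();
    await op();
    return now() - start;
  };

  const coldMs = await timeOnce(); // pays module-init / one-time setup
  const warmMs: number[] = [];
  for (let i = 0; i < warmRuns; i++) {
    warmMs.push(await timeOnce()); // should only pay the per-request cost
  }
  return { coldMs, warmMs };
}
```

Passing the page's data-loading call as `op` would separate the one-time costs (module init, schema check) from the per-request query cost in the same way the table does.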

The 14.2s is cold first-hit cost, not a persistent bug and not the query:

  1. First-time Turbopack compile + Node module-eval of a heavy server graph. The src/lib/notifications.ts compat shim transitively pulled @cinatra-ai/llm-orchestration (provider SDKs/MCP) solely for a BullMQ worker fallback the page never uses.
  2. src/lib/auth.ts does a top-level await getGoogleOAuthSettings() — DB work during module evaluation, before the page function starts.
  3. First-call ensurePostgresSchema() = 352 synchronous DDL no-ops (~0.3 s, one-time/process).
  4. Better-Auth cold init, and getAuthSession() runs 2–3× per request (requireAuthSession() + listNotifications() → resolveCurrentUserId()).

Warm: sentinel early-return + cached modules + 0.4 ms query ⇒ sub-second.
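
Items 2 and 3 share a shape: work that should run at most once per process, and ideally not during module evaluation. A minimal sketch of that shape follows, assuming a hypothetical `lazyOnce` helper that is not part of the codebase. It is an in-memory stand-in only; the real ensurePostgresSchema() uses a PID-scoped O_EXCL sentinel, which this sketch does not reproduce.

```typescript
// Hypothetical helper: defers `init` until the first call and shares the
// resulting promise with every later (or concurrent) caller.
function lazyOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;

  return () => {
    if (pending === undefined) {
      pending = init().catch((err) => {
        // Do not cache failures: the next caller gets a fresh attempt.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

Wrapping a loader this way (for example `const getSettings = lazyOnce(() => getGoogleOAuthSettings())`) would move the DB work from import time to first use, while warm calls return the already-resolved promise.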

This is the v2 implementation against the new @cinatra-ai/notifications package surface introduced by #436. PR #438 (against the pre-#436 base) was closed as superseded; the wins and instrumentation are re-ported here.

What this PR ships (safe, persistent, codex-validated)

  • Fix A — route-local session dedupe. New session-free listNotificationsForUserId(userId) in src/lib/notifications.ts; /notifications/page.tsx reuses the session already resolved by requireAuthSession() instead of letting listNotifications() call getAuthSession() a second time. Removes one better-auth round-trip + enrichment pass per request, warm too. listNotifications() is unchanged for API/worker callers.
  • Fix B — lazy worker-fallback import. @cinatra-ai/llm-orchestration is now await import()-ed inside resolveWorkerUserId() only. It leaves the cold static module graph of every page/API route importing the notifications compat shim; workers (the only callers) pay a one-time dynamic-import instead. Genuine import failures stay fail-visible (console.warn), not folded into the silent ALS-not-active catch.
  • Gated RCA instrumentation (CINATRA_PERF_NOTIFICATIONS=1, zero behavior change otherwise) at page / auth-session / database / service boundaries. The helper lives inside the package (@cinatra-ai/notifications/perf-log) so the host (page, auth-session) and the package (service) share one helper without inverting dependency direction.
CINATRA_PERF_NOTIFICATIONS=1 pnpm dev # cold, then reload (warm)
# grep the dev log for [notif-perf]; compare cold vs warm; getAuthSession.call#
# should be 1 per /notifications request after Fix A.
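
The error-handling split described for Fix B (import failures stay visible, the ALS-not-active case stays silent) can be sketched as below. This is an illustration under assumptions, not the shipped code: `OrchestrationModule`, `getActiveWorkerUserId`, and `resolveWorkerUserIdSketch` are hypothetical names, and the loader is injected so the sketch runs on its own. In the real module the loader would be the dynamic `import("@cinatra-ai/llm-orchestration")` call.

```typescript
// Hypothetical shape of the lazily imported module.
interface OrchestrationModule {
  getActiveWorkerUserId(): string | undefined;
}

type Loader = () => Promise<OrchestrationModule>;

async function resolveWorkerUserIdSketch(load: Loader): Promise<string | undefined> {
  let mod: OrchestrationModule;
  try {
    // The heavy dependency is loaded only here, so routes that never reach
    // this function do not pull it into their cold module graph.
    mod = await load();
  } catch (err) {
    // Genuine import failure: keep it visible instead of swallowing it.
    console.warn("[notifications] worker fallback import failed", err);
    return undefined;
  }

  try {
    return mod.getActiveWorkerUserId();
  } catch {
    // Assumed to mean "no active worker context"; expected outside a worker.
    return undefined;
  }
}
```

Keeping the two `try` blocks separate is the point of the design: a broken import and a missing worker context both yield `undefined`, but only the first one is logged.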

Deferred (cold-path follow-ups, not this PR)

  • Move ensurePostgresSchema() off the first request path (setup/startup owns schema readiness).
  • Remove/lazy the auth.ts top-level await getGoogleOAuthSettings().
  • Broader request-scoped getAuthSession() dedupe via React cache() (global, larger blast radius — evaluate separately).
  • The next.js: 14.9s Turbopack route-compile is the already-deferred cold-start lever A/B (scripts/bench-cold-start.mjs / quick 260517-g9j).
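
For the request-scoped getAuthSession() dedupe item, the intended semantics are "resolve once per request, share the result with every caller in that request". The sketch below illustrates those semantics with a WeakMap keyed by a request object. It is a stand-in, not React cache(), and `dedupePerRequest` is a hypothetical name; it assumes the caller has some per-request object to use as a key.

```typescript
// Hypothetical helper: memoizes `load` per request object. Entries disappear
// when the request object is garbage-collected, so nothing leaks across requests.
function dedupePerRequest<Req extends object, T>(
  load: (req: Req) => Promise<T>,
): (req: Req) => Promise<T> {
  const inflight = new WeakMap<Req, Promise<T>>();

  return (req) => {
    let result = inflight.get(req);
    if (result === undefined) {
      result = load(req); // first caller in this request does the work
      inflight.set(req, result);
    }
    return result; // later callers share the same promise
  };
}
```

With this shape, several call sites within one request would trigger a single session lookup, while a new request object always triggers a fresh one.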