`/notifications` cold-start RCA (perf/260517-notifications-appcode-v2)
Symptom
Reported on a cold `rm -rf .next && pnpm dev`, then a single load:

`GET /notifications 200 in 29.5s (next.js: 14.9s, proxy.ts: 428ms, application-code: 14.2s)`

`next.js: 14.9s` is Turbopack compiling the route on first hit. That is a separate issue and is deferred (see “Deferred”). This RCA is about the 14.2s of “application-code”.
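The three components add up to the reported total within display rounding: 14.9 s + 0.428 s + 14.2 s = 29.528 s, printed as 29.5s. When you compare cold and warm loads, it can help to split these lines mechanically.

The sketch below is a hypothetical helper. `parseTimingLine` is not part of the project, and it assumes exactly the line format quoted above.

```typescript
type TimingBreakdown = { total: number; parts: Record<string, number> };

// Convert "14.9s" or "428ms" into milliseconds.
function toMs(value: string): number {
  const m = /^([\d.]+)(ms|s)$/.exec(value.trim());
  if (!m) throw new Error(`unrecognised duration: ${value}`);
  const n = Number(m[1]);
  return m[2] === "s" ? n * 1000 : n;
}

// Hypothetical parser for "GET ... in <total> (<name>: <dur>, ...)" dev-log lines.
function parseTimingLine(line: string): TimingBreakdown {
  const m = /in ([\d.]+m?s) \((.*)\)\s*$/.exec(line);
  if (!m) throw new Error("not a timing line");
  const parts: Record<string, number> = {};
  for (const entry of m[2].split(",")) {
    const idx = entry.lastIndexOf(":"); // "next.js: 14.9s" -> key "next.js"
    parts[entry.slice(0, idx).trim()] = toMs(entry.slice(idx + 1));
  }
  return { total: toMs(m[1]), parts };
}

const line =
  "GET /notifications 200 in 29.5s (next.js: 14.9s, proxy.ts: 428ms, application-code: 14.2s)";
const t = parseTimingLine(line);
const sum = Object.values(t.parts).reduce((a, b) => a + b, 0);
console.log(`total ${t.total} ms, parts sum ${sum} ms`);
```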
Method
Evidence-first (no fix before root cause), cross-checked with a Codex gpt-5.5/xhigh sparring pass:

- `EXPLAIN (ANALYZE, BUFFERS)` of the exact `listNotificationsForUser` query against the live `cinatra.notifications` table.
- A standalone cold-vs-warm timing harness covering three costs: module init, the one-time 352-DDL `ensurePostgresSchema`, and the per-request query.
- A full data-flow trace: `page.tsx → listNotifications → resolveCurrentUserId → getAuthSession` / `→ listNotificationsForUser → host.ensurePostgresSchema`.
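A cold-vs-warm harness in the spirit of the second bullet can be as small as this: time the first call, then time a few repeat calls.

This is a minimal sketch. `coldVsWarm` and the `fakeInit` stand-in are hypothetical. The real harness presumably timed module import, `ensurePostgresSchema` and the query separately.

```typescript
import { performance } from "node:perf_hooks";

// Time a single awaited call, in milliseconds.
async function timeIt(fn: () => Promise<unknown>): Promise<number> {
  const start = performance.now();
  await fn();
  return performance.now() - start;
}

// First call = cold; then `warmRuns` repeat calls, summarised by their median.
async function coldVsWarm(fn: () => Promise<unknown>, warmRuns = 5) {
  const cold = await timeIt(fn);
  const warm: number[] = [];
  for (let i = 0; i < warmRuns; i++) warm.push(await timeIt(fn));
  warm.sort((a, b) => a - b);
  return { cold, warmMedian: warm[Math.floor(warm.length / 2)] };
}

// Hypothetical one-time initializer: the first call waits ~50 ms,
// later calls get the already-resolved promise.
let initPromise: Promise<void> | undefined;
function fakeInit(): Promise<void> {
  initPromise ??= new Promise<void>((resolve) => setTimeout(resolve, 50));
  return initPromise;
}

coldVsWarm(fakeInit).then(({ cold, warmMedian }) =>
  console.log(`cold ${cold.toFixed(1)} ms, warm median ${warmMedian.toFixed(3)} ms`),
);
```

Using the median of the warm runs keeps one slow repeat (GC, a timer tick) from dominating the warm figure.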
Evidence
| Measurement | Result |
|---|---|
| `cinatra.notifications` total rows | 908 (heaviest user: 869) |
| `listNotificationsForUser` query, 869-row user, `EXPLAIN ANALYZE` | 0.411 ms (seq scan + top-N heapsort; tiny table) |
| Indexes on `notifications` | `pkey` + `user_unread (user_id, read_at, created_at DESC)` + `topic_created` + `dedupe_job_kind`; the query is already fast, so another index would not help |
| Heavy module init (`drizzle-store.ts` import, proxy) | ~530 ms, one-time per process |
| One-time `ensurePostgresSchema` (352 no-op DDL round-trips) | ~279 ms, one-time per process (PID-scoped `O_EXCL` sentinel: `acquired-ddl` once, then `global-hit`/`sentinel-hit`) |
| Per-request query, warm | 0.26–0.39 ms |
The DB query and data shape are categorically not the bottleneck. The query takes well under 1 ms, against 14.2 s of application-code time.
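The `ensurePostgresSchema` row mentions a PID-scoped `O_EXCL` sentinel with three outcomes: `acquired-ddl`, `global-hit` and `sentinel-hit`. Below is a minimal sketch of that pattern. It assumes a sentinel file in the OS temp directory named after `process.pid`, plus an in-memory flag for the fast path. The file name, `ensureSchemaOnce` and the `runDdl` callback are all hypothetical.

Node's `"wx"` open flag creates the file exclusively. Exactly one caller succeeds; later callers get `EEXIST`.

```typescript
import { closeSync, openSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type EnsureOutcome = "acquired-ddl" | "global-hit" | "sentinel-hit";

// Fast path: set once this module instance knows the schema is ready.
let schemaReady = false;

// Hypothetical sentinel location: one file per process ID.
const sentinelPath = join(tmpdir(), `schema-ready-${process.pid}.sentinel`);

function ensureSchemaOnce(runDdl: () => void): EnsureOutcome {
  if (schemaReady) return "global-hit";
  try {
    // "wx" = create exclusively; throws EEXIST if the file already exists.
    closeSync(openSync(sentinelPath, "wx"));
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") {
      schemaReady = true;
      return "sentinel-hit";
    }
    throw err;
  }
  try {
    runDdl();
  } catch (err) {
    // Don't leave a "ready" marker behind if the DDL failed.
    rmSync(sentinelPath, { force: true });
    throw err;
  }
  schemaReady = true;
  return "acquired-ddl";
}
```

A file sentinel only adds value if the in-memory flag can be lost while the process lives on. One plausible case is dev-mode bundling evaluating the module more than once; that motivation is an assumption, not something the RCA states.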
Root cause
The 14.2s is a cold first-hit cost. It is not a persistent bug, and it is not the query:

- First-time Turbopack compile plus Node module evaluation of a heavy server graph. The `src/lib/notifications.ts` compat shim transitively pulled in `@cinatra-ai/llm-orchestration` (provider SDKs/MCP) solely for a BullMQ worker fallback the page never uses.
- `src/lib/auth.ts` does a top-level `await getGoogleOAuthSettings()`. That is DB work during module evaluation, before the page function starts.
- The first call to `ensurePostgresSchema()` runs 352 synchronous no-op DDL statements (~0.3 s, one-time per process).
- Better-Auth cold init, and `getAuthSession()` runs 2–3× per request (`requireAuthSession()` + `listNotifications()` → `resolveCurrentUserId()`).

Warm: sentinel early-return + cached modules + 0.4 ms query ⇒ sub-second.
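The `auth.ts` bullet and the warm-path early-return both come down to when one-time work runs. Below is a minimal sketch of the lazy alternative to a top-level `await`. It assumes nothing else needs the settings at import time. `lazyOnce` and `loadSettingsFromDb` are hypothetical names; the latter only stands in for `getGoogleOAuthSettings()`.

```typescript
// Today, per the RCA, auth.ts does roughly:
//   const googleSettings = await getGoogleOAuthSettings(); // runs during import
// The lazy shape below defers that work to the first caller.

type OAuthSettings = { clientId: string };

// Memoize an async loader: the first call starts it, later calls share the promise.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    pending ??= load().catch((err: unknown) => {
      pending = undefined; // let a later call retry after a failure
      throw err;
    });
    return pending;
  };
}

let loads = 0;

// Hypothetical stand-in for getGoogleOAuthSettings().
async function loadSettingsFromDb(): Promise<OAuthSettings> {
  loads++;
  return { clientId: "demo-client" };
}

// Importing this module costs nothing; the DB read happens on first use.
const getSettings = lazyOnce(loadSettingsFromDb);
```

Concurrent first callers share one in-flight promise, so even a burst of cold requests triggers a single load.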
This is the v2 implementation against the new `@cinatra-ai/notifications` package surface introduced by #436. PR #438 (against the pre-#436 base) was closed as superseded; its wins and instrumentation are re-ported here.
What this PR ships (safe, persistent, codex-validated)
- Fix A: route-local session dedupe.
  - Adds a new session-free `listNotificationsForUserId(userId)` in `src/lib/notifications.ts`.
  - `/notifications/page.tsx` now reuses the session already resolved by `requireAuthSession()`. Previously, `listNotifications()` called `getAuthSession()` a second time.
  - This removes one better-auth round-trip plus enrichment pass per request, on warm requests too.
  - `listNotifications()` is unchanged for API/worker callers.
- Fix B: lazy worker-fallback import.
  - `@cinatra-ai/llm-orchestration` is now `await import()`-ed only inside `resolveWorkerUserId()`.
  - It is no longer part of the cold static module graph of every page/API route that imports the notifications compat shim. Workers, the only callers, pay a one-time dynamic import instead.
  - Genuine import failures stay fail-visible (`console.warn`) rather than being folded into the silent ALS-not-active catch.
- Gated RCA instrumentation at the page / auth-session / database / service boundaries.
  - It is enabled with `CINATRA_PERF_NOTIFICATIONS=1`; otherwise there is zero behavior change.
  - The helper lives inside the package (`@cinatra-ai/notifications/perf-log`). This lets the host (page, auth-session) and the package (service) share one helper without inverting dependency direction.
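Fix A amounts to splitting "resolve the user" from "query by user". The sketch below uses stub bodies with the PR's function names. Their real signatures and the session shape are assumptions.

It counts `getAuthSession()` calls to show the page going from two lookups to one. The real `listNotifications()` goes through `resolveCurrentUserId()`, which is collapsed into it here.

```typescript
// Hypothetical stand-ins: names mirror the PR, bodies are stubs.
type Session = { user: { id: string } };
type Notification = { id: string; userId: string };

let getAuthSessionCalls = 0;

async function getAuthSession(): Promise<Session | null> {
  getAuthSessionCalls++; // in the app: one better-auth round-trip + enrichment
  return { user: { id: "u1" } };
}

async function requireAuthSession(): Promise<Session> {
  const session = await getAuthSession();
  if (!session) throw new Error("unauthenticated");
  return session;
}

// Session-free query: the caller supplies the user id.
async function listNotificationsForUserId(userId: string): Promise<Notification[]> {
  return [{ id: "n1", userId }];
}

// Unchanged wrapper for API/worker callers: resolves the session itself.
async function listNotifications(): Promise<Notification[]> {
  const session = await getAuthSession();
  if (!session) return [];
  return listNotificationsForUserId(session.user.id);
}

// Before Fix A: the page resolves the session, then listNotifications() resolves it again.
async function pageBefore(): Promise<Notification[]> {
  await requireAuthSession();
  return listNotifications();
}

// After Fix A: the page reuses the session it already holds.
async function pageAfter(): Promise<Notification[]> {
  const session = await requireAuthSession();
  return listNotificationsForUserId(session.user.id);
}
```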
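Fix B's pattern is a cached dynamic import inside the one function that needs it. Import failures are logged separately from the expected no-context case.

The loader is injected so the example runs on its own; in the app it would be something like `() => import("@cinatra-ai/llm-orchestration")`. `WorkerContextModule`, `currentWorkerUserId` and the AsyncLocalStorage-based fake module are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical surface the worker fallback needs from the heavy package.
type WorkerContextModule = { currentWorkerUserId(): string };
type Loader = () => Promise<WorkerContextModule>;

// Cached dynamic import: paid once per process, and only by callers that need it.
let modulePromise: Promise<WorkerContextModule> | undefined;

async function resolveWorkerUserId(load: Loader): Promise<string | undefined> {
  let mod: WorkerContextModule;
  try {
    mod = await (modulePromise ??= load());
  } catch (err) {
    // Genuine import failure: stay visible, and allow a retry later.
    modulePromise = undefined;
    console.warn("[notifications] worker fallback import failed:", err);
    return undefined;
  }
  try {
    return mod.currentWorkerUserId();
  } catch {
    // Expected when no worker (ALS) context is active: stay silent.
    return undefined;
  }
}

// Fake module for the demo: throws unless an AsyncLocalStorage context is active.
const workerAls = new AsyncLocalStorage<{ userId: string }>();
const fakeModule: WorkerContextModule = {
  currentWorkerUserId() {
    const store = workerAls.getStore();
    if (!store) throw new Error("ALS not active");
    return store.userId;
  },
};
```

Because nothing at module top level references the heavy package, bundlers and Node only load it when `resolveWorkerUserId()` actually runs.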
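A gated timing helper along the lines of `@cinatra-ai/notifications/perf-log` might look like this. It is a hypothetical sketch: `perfLog`, `perfTime` and the log layout are assumptions. Only the `CINATRA_PERF_NOTIFICATIONS=1` gate and the `[notif-perf]` prefix come from this PR.

When the flag is off, it simply calls through, which keeps the instrumented path behaviorally unchanged.

```typescript
import { performance } from "node:perf_hooks";

// Checked on every call, so toggling the env var needs no reload.
function perfEnabled(): boolean {
  return process.env.CINATRA_PERF_NOTIFICATIONS === "1";
}

// Emit one "[notif-perf]" line; no-op unless the flag is set.
function perfLog(label: string, data: Record<string, unknown> = {}): void {
  if (!perfEnabled()) return;
  console.log("[notif-perf]", label, JSON.stringify(data));
}

// Time an async boundary and pass its result (or error) through unchanged.
async function perfTime<T>(label: string, fn: () => Promise<T>): Promise<T> {
  if (!perfEnabled()) return fn();
  const start = performance.now();
  try {
    return await fn();
  } finally {
    perfLog(label, { ms: Number((performance.now() - start).toFixed(2)) });
  }
}

// Usage at a service boundary (label and query are illustrative):
// const rows = await perfTime("service.listNotificationsForUserId", () => query(userId));
```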
Reproduce
```sh
CINATRA_PERF_NOTIFICATIONS=1 pnpm dev   # cold, then reload (warm)
# grep the dev log for [notif-perf]; compare cold vs warm; getAuthSession.call#
# should be 1 per /notifications request after Fix A.
```

Deferred (cold-path follow-ups, not this PR)
- Move `ensurePostgresSchema()` off the first request path, so that setup/startup owns schema readiness.
- Remove, or make lazy, the `auth.ts` top-level `await getGoogleOAuthSettings()`.
- Broader request-scoped `getAuthSession()` dedupe via React `cache()`. This is global with a larger blast radius, so evaluate it separately.
- The `next.js: 14.9s` Turbopack route compile is the already-deferred cold-start lever A/B (`scripts/bench-cold-start.mjs` / quick 260517-g9j).
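The React `cache()` follow-up is about memoizing `getAuthSession()` per request rather than per route. To keep the example self-contained outside React, the sketch swaps in Node's `AsyncLocalStorage` to illustrate the same request-scoped memoization idea. `withRequestScope`, `requestCached` and the stubbed session are hypothetical.

It also shows the blast-radius concern: every caller inside one request shares the first result.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// One memo map per request; entries hold promises so concurrent callers share one call.
const requestMemo = new AsyncLocalStorage<Map<string, Promise<unknown>>>();

// Run a request handler with a fresh, empty memo map.
function withRequestScope<T>(handler: () => Promise<T>): Promise<T> {
  return requestMemo.run(new Map(), handler);
}

// Wrap a zero-arg async function so it runs at most once per request scope.
function requestCached<T>(key: string, fn: () => Promise<T>): () => Promise<T> {
  return () => {
    const memo = requestMemo.getStore();
    if (!memo) return fn(); // outside a request scope: no dedupe
    let hit = memo.get(key) as Promise<T> | undefined;
    if (!hit) {
      hit = fn();
      memo.set(key, hit);
    }
    return hit;
  };
}

let sessionLookups = 0;

// Hypothetical stand-in for getAuthSession(): counts real lookups.
const getAuthSessionCached = requestCached("getAuthSession", async () => {
  sessionLookups++;
  return { userId: "u1" };
});
```

Scoping the map to the request (instead of a module-level cache) is what keeps one user's session from leaking into another request.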