Cookie scoping for cross-subdomain auth: the gotcha that bites everyone
You split a product across portal., admin., api., and mcp. subdomains, write a normal login endpoint, and then nothing works the way you expected. The auth cookie is set by one host and invisible to another. The dev environment behaves nothing like prod. OAuth bounces silently lose the session. This is the cookie-scoping write-up we wish we'd had a month before we needed it.
Every multi-host web product hits the same wall eventually. You start with one origin, ship features, then the inevitable happens: marketing wants its own static site, the dashboard needs its own deploy, the API gets its own subdomain for CORS and rate-limit reasons, and somewhere along the way a third-party integration shows up with strict opinions about where auth state has to live. Now your auth cookie has to be readable on three hosts and invisible on a fourth, dev has to mirror prod closely enough to be useful, and the OAuth redirect dance has to land cookies in the right jar at the right millisecond.
The mechanics of how the browser scopes cookies aren't subtle. They're written down in RFC 6265. But they have enough corners that almost everyone learns them the hard way, mid-incident, with a coworker on a call saying "I'm logged in on this tab but not that one." This post walks through the layout we ended up with, the rules that actually matter, the OAuth-redirect quirk that nearly cost us a day, and the ten-line Python helper we landed on. The lesson generalizes to anything that spans subdomains; pick your stack, the rules don't change.
The setup
Engram runs on four hosts in prod:
- lumetra.io: the marketing site (Astro). No auth state.
- portal.lumetra.io: the customer dashboard (React + Vite, served as a static build behind nginx). Reads the auth cookie to render /auth/me and gate every page.
- api.lumetra.io / admin.lumetra.io: the REST API and OAuth provider (Flask). Sets the auth cookie on signin, reads it on subsequent calls.
- mcp.lumetra.io: the MCP server. Authenticates with Bearer tokens, not cookies, and absolutely should not see the portal/api auth cookie.
That last constraint matters. MCP clients (Claude Code, Cursor, ChatGPT Connectors, whatever's next) authenticate by presenting an OAuth-issued access token in an Authorization: Bearer header. If a stray auth_token cookie shows up on a request to mcp.lumetra.io, our middleware has to ignore it, and worse, we'd be leaking a long-lived session credential to a host whose request shape is wide open by design. Cookies should never reach that host in the first place.
So the goal is: a cookie set by api.lumetra.io needs to be visible on portal.lumetra.io and on itself, and invisible on mcp.lumetra.io. Three out of four hosts share state; one is excluded. The browser is going to enforce this entirely through the Domain, Path, Secure, and SameSite attributes you put on a single Set-Cookie response header. Get those wrong and you spend an afternoon wondering why the cookie shows up in DevTools but never gets sent.
The three rules that bit us
RFC 6265 is short and worth reading, but for this specific layout three behaviors carried all the weight.
First, Domain widens scope, it never narrows it. A cookie set without a Domain attribute is host-only, and the moment you add Domain=.lumetra.io it becomes visible across every subdomain: portal, admin, api, mcp, anything. There is no syntax for "send to portal and api but not mcp."
Second, a Secure cookie set over plain HTTP is silently dropped. The Set-Cookie header is right there in the response, but no subsequent request carries the cookie. This is the single most common dev-vs-prod surprise: the prod helper sets Secure=true, you reuse it on a plain-HTTP dev host, login returns 200, and every request after is unauthenticated. (Recent Chrome and Firefox exempt http://localhost from this rule; don't count on every browser or every dev hostname getting the same treatment.)
Third, SameSite=Strict will not survive an OAuth bounce, because the return leg is a cross-site navigation initiated from the other party's origin, so you want Lax on anything that has to come back through a third party.
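The first rule is easy to check with a simplified model of RFC 6265 domain matching. The sketch below is illustrative only: `cookie_is_sent` is a hypothetical helper, and it ignores Path, Secure, and SameSite entirely.

```python
from typing import Optional


def cookie_is_sent(cookie_domain: Optional[str], set_by_host: str, request_host: str) -> bool:
    """Simplified RFC 6265 domain matching (ignores Path, Secure, SameSite).

    cookie_domain=None models a host-only cookie (no Domain attribute).
    """
    request_host = request_host.lower()
    if cookie_domain is None:
        # Host-only: sent back only to the exact host that set it.
        return request_host == set_by_host.lower()
    # A leading dot in the Domain attribute is ignored by browsers.
    domain = cookie_domain.lower().lstrip(".")
    return request_host == domain or request_host.endswith("." + domain)


HOSTS = ["lumetra.io", "portal.lumetra.io", "api.lumetra.io", "mcp.lumetra.io"]

# Cookie set by api.lumetra.io with no Domain attribute: only api sees it.
host_only = {h: cookie_is_sent(None, "api.lumetra.io", h) for h in HOSTS}

# Same cookie with Domain=.lumetra.io: every host sees it, mcp included.
widened = {h: cookie_is_sent(".lumetra.io", "api.lumetra.io", h) for h in HOSTS}
```

There is no value of `cookie_domain` that returns True for portal and api while returning False for mcp, which is the whole problem.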
The dev/prod gap
Prod is the boring half: portal and API both HTTPS, both under lumetra.io, set Domain=.lumetra.io with Secure and SameSite=Lax, the browser accepts everything and you move on. Dev is where the surprises live.
In dev the portal runs on http://localhost:3001 and the API on http://localhost:5002. Two different ports, plain HTTP. The browser treats those as different origins for CORS but the same host (localhost) for cookies, so a host-only cookie set on localhost is visible across every port. That part is a small mercy.
The trap is the reflex to set Domain=.localhost for symmetry with prod. It does not work. Browsers refuse to honor a Domain attribute on a single-label public suffix; some reject the Set-Cookie outright, others quietly accept it and then never send the cookie back. Either way you get a cookie that doesn't function. The fix is to drop the Domain attribute entirely in dev and let it be host-only on localhost. And because Secure=true over HTTP also gets silently dropped, that one has to go too. SameSite=Lax works fine over HTTP, so it stays.
So dev doesn't just need different values for the cookie attributes; it needs different presence. Domain and Secure are absent in dev and present in prod. Code that just toggles Secure off an env var is doing half the job; you also have to toggle whether Domain exists at all. The cleanest version is a single helper that handles both flips, so no callsite ever has to remember which attributes apply where.
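To make "different presence" concrete, here is a minimal sketch that renders the two Set-Cookie header values using only the standard library. The function name, the `prod` flag, and the cookie value are assumptions for illustration; this is not the Flask code path.

```python
from http.cookies import SimpleCookie


def set_cookie_header(value: str, *, prod: bool) -> str:
    """Render a Set-Cookie header value for an auth_token cookie."""
    jar = SimpleCookie()
    jar["auth_token"] = value
    morsel = jar["auth_token"]
    morsel["path"] = "/"
    morsel["httponly"] = True
    morsel["samesite"] = "Lax"
    if prod:
        # Present only in prod. In dev these attributes are absent,
        # not set to some "false" value.
        morsel["secure"] = True
        morsel["domain"] = ".lumetra.io"
    return morsel.OutputString()


dev_header = set_cookie_header("abc123", prod=False)
prod_header = set_cookie_header("abc123", prod=True)
```

Printing both headers side by side shows that the dev one has no Domain and no Secure token at all, while SameSite=Lax and HttpOnly appear in both.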
The helper
What we landed on is a single function, _cookie_kwargs(), that takes optional extras (like max_age) and returns the right keyword arguments for Flask's response.set_cookie. Every place in our codebase that sets a cookie calls this helper. There is no other place in our code that touches secure, domain, or samesite attributes. Centralizing it was the single highest-leverage change we made.
The dev/prod branch is gated on a derived flag, not on FLASK_ENV or DEBUG or anything else that's prone to drift. We compute _COOKIE_IS_LOCAL_DEV from a COOKIE_DOMAIN env var: if it's empty, or literally localhost, or none/off, we treat it as dev mode. Otherwise we use whatever's set. The default value is .lumetra.io, so prod just works without any env-var babysitting, and dev opts out with COOKIE_DOMAIN= in .env.local.
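As a quick sanity check of that rule, the sketch below wraps the same expression in a function (`is_local_dev` is a name invented here so it can be called with different values) and evaluates it against a few candidate settings of COOKIE_DOMAIN.

```python
def is_local_dev(cookie_domain: str) -> bool:
    """Empty, localhost, none, or off (any casing) means dev mode."""
    return (not cookie_domain) or cookie_domain.lower() in ("localhost", "none", "off")


# Candidate COOKIE_DOMAIN values mapped to whether they should count as dev.
expected = {
    "": True,                           # COOKIE_DOMAIN= in .env.local
    "localhost": True,
    "None": True,                       # comparison is case-insensitive
    "OFF": True,
    ".lumetra.io": False,               # the default, prod
    ".your-tunnel.example.com": False,  # a tunnel counts as prod-shaped
}
actual = {value: is_local_dev(value) for value in expected}
```

Note that the check is an exact match after lowercasing, so a value with stray whitespace would be treated as a real domain.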
There is a sibling helper, _cookie_clear_kwargs(), that returns the right shape for response.delete_cookie. Flask (and the underlying werkzeug) requires that you pass the same path and domain on deletion as you did on creation, or the deletion silently no-ops. We made that mistake exactly once. The dev/prod split applies the same way: dev deletes are host-only, prod deletes need domain=.lumetra.io.
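The post doesn't show `_cookie_clear_kwargs()`, so the following is only a guess at its shape, under the assumption that the cookie was created with path "/" and the domain rule described above. It takes the domain as a parameter (hypothetical, to keep the sketch testable) rather than reading module state.

```python
def cookie_clear_kwargs(cookie_domain: str) -> dict:
    """Kwargs for response.delete_cookie that mirror how the cookie was set.

    Hypothetical sketch: path is always "/", and domain is included only
    when cookie_domain is a real (non-dev) domain.
    """
    is_local_dev = (not cookie_domain) or cookie_domain.lower() in (
        "localhost", "none", "off",
    )
    kw = {"path": "/"}
    if not is_local_dev:
        # Must match the Domain used at creation, or the browser keeps the cookie.
        kw["domain"] = cookie_domain
    return kw


# Assumed usage inside a Flask view:
#   response.delete_cookie("auth_token", **cookie_clear_kwargs(COOKIE_DOMAIN))
```

The deletion works by sending an already-expired cookie with the same name, path, and domain, which is why a mismatch on either attribute leaves the original in place.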
The OAuth redirect that breaks differently every time
The cookie rules above are necessary but not sufficient. The other half of the puzzle is what happens during an OAuth handshake initiated by a third party. In our case, that's Claude.ai web acting as an MCP client and trying to authorize against api.lumetra.io/oauth/authorize.
Originally our /oauth/authorize endpoint, on receiving an unauthenticated request, redirected the user to /auth/google/login?next=.... That forced every customer through Google OAuth, even ones who'd signed up with a password and didn't have Google linked. Wrong default.
The corrected flow looks like this. Claude.ai web bounces the user's browser to https://api.lumetra.io/oauth/authorize?client_id=...&redirect_uri=...&state=.... Our endpoint checks for the auth_token cookie. If it's missing, we 302 to https://portal.lumetra.io/login?next=<absolute return URL>, where the return URL is a signed, server-side-stamped version of the original /oauth/authorize request that captures all the query parameters. The portal's login page reads ?next=, lets the user sign in via password or OAuth provider, and on success navigates the browser to the next URL, which lands back on api.lumetra.io/oauth/authorize?_engram_authstate=<signed state>, this time with the cookie.
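One way to build a signed return URL like the one described is an HMAC over the serialized query parameters. The sketch below is an assumption about the general technique, not the actual Engram implementation: the secret, the token format, and the function names are all hypothetical, and a production version would also carry an expiry.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional
from urllib.parse import urlencode

SECRET = b"dev-only-secret"  # hypothetical; load from config in real code


def sign_state(params: dict, secret: bytes = SECRET) -> str:
    """Serialize the original /oauth/authorize params and append an HMAC."""
    raw = json.dumps(params, sort_keys=True).encode()
    payload = base64.urlsafe_b64encode(raw).decode()
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_state(token: str, secret: bytes = SECRET) -> Optional[dict]:
    """Return the original params, or None if the signature does not match."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


def login_redirect_url(params: dict) -> str:
    """Where an unauthenticated /oauth/authorize request gets 302'd to."""
    return_url = "https://api.lumetra.io/oauth/authorize?" + urlencode(
        {"_engram_authstate": sign_state(params)}
    )
    return "https://portal.lumetra.io/login?" + urlencode({"next": return_url})


original = {"client_id": "abc", "redirect_uri": "https://example.com/cb", "state": "xyz"}
token = sign_state(original)
```

Signing matters because the portal will navigate the browser to whatever `next` says; verifying the state on the way back means the authorize endpoint only resumes requests it stamped itself.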
For this to work, two things have to be true at the moment the browser arrives back at /oauth/authorize:
- The browser has the auth cookie. The portal's POST /api/auth/signin set it. In prod the cookie was set with Domain=.lumetra.io so it's visible at api.lumetra.io. Check.
- The browser is willing to send the cookie on this navigation. The navigation is a top-level window.location.href = nextUrl followed by a 302 to api.lumetra.io. SameSite=Lax permits cookies on top-level navigations. Check.
If you'd used SameSite=Strict here, the second condition becomes fragile. portal.lumetra.io and api.lumetra.io share a registrable domain, so the hop between them is same-site, but the flow as a whole starts with a cross-site navigation from Claude.ai, and Strict withholds cookies on cross-site navigations. A user who is already signed in would arrive at /oauth/authorize without a cookie and be sent to log in again. Lax sidesteps the whole question. We use Lax universally.
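A simplified model of the SameSite decision helps when reasoning about which hop carries the cookie. This is a sketch under loud assumptions: `site_of` approximates the registrable domain as the last two labels (real browsers use the Public Suffix List), and the model ignores redirect-chain subtleties that differ between browsers.

```python
def site_of(host: str) -> str:
    """Naive registrable domain: the last two labels of the host."""
    return ".".join(host.lower().split(".")[-2:])


def samesite_allows(
    mode: str,
    initiator_host: str,
    target_host: str,
    *,
    top_level_navigation: bool,
    method: str = "GET",
) -> bool:
    """Would a cookie with this SameSite mode be attached to the request?"""
    if site_of(initiator_host) == site_of(target_host):
        return True  # same-site requests always carry the cookie
    if mode == "None":
        return True  # cross-site allowed (browsers also require Secure)
    if mode == "Lax":
        # Cross-site: only top-level navigations with a safe method.
        return top_level_navigation and method.upper() in ("GET", "HEAD")
    return False  # Strict: never on cross-site requests


# Portal to API after login: same site, so either mode sends the cookie.
portal_to_api = samesite_allows(
    "Strict", "portal.lumetra.io", "api.lumetra.io", top_level_navigation=True
)
# Claude.ai to API at the start of the handshake: cross-site navigation.
strict_from_client = samesite_allows(
    "Strict", "claude.ai", "api.lumetra.io", top_level_navigation=True
)
lax_from_client = samesite_allows(
    "Lax", "claude.ai", "api.lumetra.io", top_level_navigation=True
)
```

Under this model the only hop where Strict and Lax disagree is the cross-site entry from the OAuth client, which is exactly the hop an OAuth provider cannot avoid.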
The other failure mode here is forgetting the Domain attribute in prod. The cookie ends up host-only on portal.lumetra.io, the request to api.lumetra.io carries nothing, and the flow loops: /oauth/authorize sees no cookie, redirects to /login, login was already done so the portal redirects to next, /oauth/authorize still sees no cookie, lather rinse repeat. That exact loop was the first symptom we hit before we tracked the Domain attribute down.
The dev-mode proxy gotcha
In dev, Vite serves the portal at localhost:3001 and proxies /api/*, /auth/*, /oauth/* (and a couple of others) to the Flask server at localhost:5002 with changeOrigin: true. From the browser's perspective, every request looks like it's hitting localhost:3001. The response, including any Set-Cookie headers, appears to come from localhost:3001. The cookie gets scoped to localhost:3001.
For everyday dev use this is fine. You hit POST /api/auth/signin via the proxy, the cookie lands on localhost:3001 (which is your host for everything via the proxy), and every subsequent /api/* call carries it. You can develop the entire portal experience and most of the API behavior end-to-end without thinking about cookies.
The moment you try to test the full OAuth handshake from Claude.ai web against your local stack, it falls apart. Claude.ai web doesn't know your portal is proxying things; it talks directly to whatever URL you configured your MCP server at. That has to be a direct http://localhost:5002/oauth/authorize URL, because the OAuth provider is the API host, not the portal. So the browser is sitting on localhost:5002, the cookie is scoped to localhost:3001, and the request goes out without auth.
You can't fix this with cookie attributes. We said earlier that cookies don't care about port, and calling the port a different host is not quite right. What matters is which URL the browser saw the Set-Cookie response come from: changeOrigin: true rewrites the Host header on the proxied request, but the browser still attributes the response to the URL it actually requested, localhost:3001. The only real fixes are:
- Tunnel both hosts to the same parent domain (ngrok, cloudflared, or any equivalent) so portal and API both live under *.your-tunnel.example.com, and you can set Domain=.your-tunnel.example.com for the duration of the test.
- Accept that the OAuth handshake from Claude.ai web specifically can't be tested against bare localhost. Test the rest of the API directly, and use a staging deploy for OAuth-flow end-to-end tests.
We do both. There's a make target that spins up a cloudflared tunnel with a fixed hostname for our team, and we mostly use staging for OAuth-flow rehearsals. The "dev experience for OAuth specifically is annoying" tax is real and we haven't solved it. It's the one part of this stack where we'd genuinely like to do better.
Why mcp.lumetra.io is the exception
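Earlier we said the auth cookie should never reach mcp.lumetra.io in the first place. In prod it does: Domain=.lumetra.io is a subtree-wide setting, so the browser sends the cookie to mcp.lumetra.io like any other subdomain. We decided to accept that, and here is why.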
The MCP server treats incoming requests as authenticated only if they carry an Authorization: Bearer <access_token> header. Cookies on those requests are read by Flask, sure, but our middleware never consults request.cookies for authentication decisions on the MCP server's route map. The cookie is a no-op there.
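The shape of that check, reduced to a framework-free sketch: authentication looks at the Authorization header and nothing else. The function names are hypothetical, and the plain-dict header lookup is an assumption (Flask's request.headers is case-insensitive, a plain dict is not).

```python
from typing import Mapping, Optional


def bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' header."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authenticate_mcp_request(
    headers: Mapping[str, str], cookies: Mapping[str, str]
) -> Optional[str]:
    """Return the access token to validate, or None if unauthenticated.

    cookies is accepted only to make the point explicit: it is never read.
    """
    return bearer_token(headers)


# A request that carries only the session cookie is unauthenticated here.
cookie_only = authenticate_mcp_request({}, {"auth_token": "session-value"})
with_bearer = authenticate_mcp_request(
    {"Authorization": "Bearer tok_123"}, {"auth_token": "session-value"}
)
```

Keeping the cookie jar out of the decision entirely, rather than checking it and rejecting, means a stray auth_token cannot change the outcome in either direction.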
You could imagine wanting to be belt-and-suspenders strict: never let the cookie reach mcp.lumetra.io at all. That would require scoping the cookie to a different parent. Domain=portal.lumetra.io won't work (Domain must be a suffix of the setting host; api.lumetra.io can't set a cookie for portal.lumetra.io), but you could split the auth surface across a separate auth.lumetra.io host and only set Domain=auth.lumetra.io. We chose not to. The marginal security benefit was small relative to the architectural complexity of adding another host, and our access-token surface on mcp.lumetra.io is the actual control. Cookies that no code reads aren't a risk; they're just bytes.
The general principle: pick the smallest Domain attribute that satisfies your real cross-host needs, then enforce single-purpose auth at the application layer for any host that happens to fall inside that domain. Don't lean on cookie scoping as your only line of defense; use it as the routing rule, and have application-layer checks for whether a credential is appropriate for the requested route.
What we'd keep
Two decisions paid back the most. One: centralize cookie attribute construction in a single helper. With three engineers each setting cookies in two places, you end up with six configurations within a quarter and one of them is wrong. Twenty minutes of work, pays back in not-spent-debugging time approximately forever. Two: gate the dev/prod branch on a single env var with an obvious default. COOKIE_DOMAIN=.lumetra.io is right for every prod-shaped environment, COOKIE_DOMAIN= empty is right for every localhost-shaped one. Don't infer it from NODE_ENV or FLASK_ENV; those drift between staging configs, and you do not want cookie behavior to silently change because a deploy script flipped DEBUG=true. The one we'd skip on a redo is the half-day we spent trying to make the cross-localhost-port OAuth flow work without a tunnel. The browser is enforcing rules that exist for good reasons, the workarounds are worse than just running a tunnel, and the OAuth handshake genuinely needs a real domain. Plan accordingly.
The one open problem is dev ergonomics for that OAuth handshake. We have a make target and a doc, and that is the state of the art. If there is a clever way to make Claude.ai-web work against bare localhost without a tunnel, we have not found it.
The helper, in full
Ten lines of Python, give or take. This is the actual function we ship, lightly trimmed for inline display:
```python
import os

# Defaults to the prod parent domain; dev opts out with COOKIE_DOMAIN= in .env.local.
COOKIE_DOMAIN = os.getenv("COOKIE_DOMAIN", ".lumetra.io")
_COOKIE_IS_LOCAL_DEV = (
    (not COOKIE_DOMAIN) or COOKIE_DOMAIN.lower() in ("localhost", "none", "off")
)


def _cookie_kwargs(extra: dict | None = None) -> dict:
    """Base kwargs for set_cookie / delete_cookie.

    Prod: HttpOnly + Secure + SameSite=Lax + Domain=.lumetra.io.
    Dev: HttpOnly + SameSite=Lax (no Secure, no Domain; host-only on localhost).
    """
    kw: dict = {"httponly": True, "samesite": "Lax", "path": "/"}
    if not _COOKIE_IS_LOCAL_DEV:
        # Both attributes are added together; in dev neither key exists.
        kw["secure"] = True
        kw["domain"] = COOKIE_DOMAIN
    if extra:
        # Caller-supplied extras such as max_age.
        kw.update(extra)
    return kw
```
Every response.set_cookie(...) in our codebase passes **_cookie_kwargs(). Every response.delete_cookie(...) passes **_cookie_clear_kwargs() (same idea, but only path + domain need to match for deletion). No route handler touches the attributes directly. The dev/prod difference exists in exactly one place. When a new engineer asks "why isn't my cookie being sent on this request," the answer is always one of three things: wrong path, missed the helper, or the dev/prod env var is set wrong. That's the whole search space.
The lesson generalizes well beyond Engram. If your product spans subdomains and you have auth, you will hit some version of this. Pick the parent domain you want to share auth across, set Domain exactly to that, use SameSite=Lax unless you have a very specific reason not to, gate Secure on HTTPS-ness, and put the whole thing behind a helper so the rest of your code never has to think about it.