
The 200ms auth floor: replacing bcrypt with HMAC for API keys

Every authenticated request to Engram had a ~200ms floor. Not the work. The auth. Cost-factor-12 bcrypt verify on every incoming key. We replaced it with HMAC-SHA256 keyed by a server-side pepper, shipped the migration with bcrypt as a fallback, and dropped per-request auth from ~200ms to ~4µs. Roughly a 50,000× speedup on the hash itself, and roughly 9–51× on full request latency, depending on how much real work the endpoint does.

Published January 27, 2026 · By Jacob Davis and Ben Meyerson

This is a small post about a small change. The change itself is mostly a one-liner with a migration around it. The reason it’s worth writing up is that the pattern we replaced (bcrypt on API keys) is the default in a lot of codebases, and it’s wrong for reasons that aren’t obvious until you measure it. We ran on the broken version for months without noticing. A quick smoke test surfaced it in about ten minutes.

The smoke test that surfaced it

We have a smoke-test script that exercises the public REST endpoints against a local server: create a bucket, store a memory, query, list, delete. Trivial calls. Nothing that should be expensive. We ran it after a routine deploy and the timing column looked like this:

  • POST /v1/buckets: 213ms
  • POST /v1/buckets/{id}/memories: 224ms
  • POST /v1/query: 207ms
  • GET /v1/buckets: 209ms
  • GET /health (authenticated variant): 205ms

The numbers are too tight. Every endpoint, regardless of the work it does, lands in the 205–225ms band. GET /v1/buckets is one indexed SELECT against a small table. There’s no universe where the database work for that is anywhere near 200ms. When unrelated endpoints all share the same floor, the floor is upstream of any of them. Authentication runs on every request. That was the suspect.

We added a timer around verify_api_key and re-ran. Five back-to-back calls:

  • verify_api_key #1: 201.4ms
  • verify_api_key #2: 197.8ms
  • verify_api_key #3: 204.9ms
  • verify_api_key #4: 199.3ms
  • verify_api_key #5: 202.1ms

That’s the floor. Roughly 200ms ± 4ms, every call. The variance is too low for I/O or contention. It looks like a CPU-bound function with deterministic cost. Which is exactly what bcrypt at cost factor 12 is.
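The timer itself was nothing fancy. A minimal sketch of the shape, with a hypothetical stand-in for the real verify function:

```python
import time

def timed(fn):
    """Wrap a function and print its wall-clock duration per call."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{fn.__name__}: {elapsed_ms:.1f}ms")
    return wrapper

@timed
def verify_api_key(raw_key):
    # Hypothetical stand-in: sleep in place of the real bcrypt work.
    time.sleep(0.01)
    return True
```

Wrapping the real function the same way is what produced the five ~200ms readings above.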

Why bcrypt was there in the first place

The original code did the textbook thing. When we created an API key we’d generate a random token, bcrypt-hash it, store the hash. When a request came in we’d pull candidate rows by the visible key prefix and call bcrypt.checkpw against each one. If any matched, the request was authenticated. Same pattern we use for user passwords.
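The shape of that legacy path, sketched with stdlib `pbkdf2_hmac` standing in for bcrypt so the example runs without the bcrypt package (names hypothetical; the real code called `bcrypt.hashpw`/`bcrypt.checkpw`):

```python
import hashlib
import os
import secrets

def slow_hash(secret: bytes, salt: bytes) -> bytes:
    # Stand-in for bcrypt: a salted, deliberately slow KDF.
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 200_000)

def create_api_key(store: dict) -> str:
    raw = secrets.token_urlsafe(32)
    prefix = raw[:8]                       # non-secret, visible in logs
    salt = os.urandom(16)
    store.setdefault(prefix, []).append((salt, slow_hash(raw.encode(), salt)))
    return raw

def verify_api_key(store: dict, raw: str) -> bool:
    # Pull candidate rows by prefix, re-run the slow hash against each.
    # With bcrypt at cost 12, each candidate costs ~200ms.
    for salt, stored in store.get(raw[:8], []):
        if slow_hash(raw.encode(), salt) == stored:
            return True
    return False
```

Note the salted hash is non-deterministic across rows, which is why the lookup has to scan candidates by prefix instead of resolving with one indexed query.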

Bcrypt at cost factor 12 takes roughly 200–300ms on modern x86. That’s the whole point of bcrypt. The cost is a feature. Whoever picked cost 12 picked it specifically because they wanted hash verification to be expensive. The bcrypt designers are not confused. If you have a low-entropy input that an attacker is going to guess, a 200ms verify is a 200ms-per-attempt brute-force ceiling, and that ceiling is the security property.

Which is correct for passwords. Users pick "summer2024!" or some shape of that. The brute-force search space is tractable; bcrypt slows it down to the point where it isn’t. If a dump of the password table leaks, you want the attacker to spend hardware-decades enumerating, not hardware-minutes.

The mistake is applying that same reasoning to API keys.

Why bcrypt is the wrong primitive for API keys

We were paying ~200ms per request, every request, to defend against an attack that the entropy of the key already prevented. The threat model and the primitive were mismatched. Bcrypt is a defense against low-entropy guessing. The thing we were protecting wasn’t low-entropy.

Engram API keys are generated server-side with secrets.token_urlsafe(32). That’s 32 bytes of cryptographically-random input, URL-encoded. 256 bits of entropy in the secret part of the key, plus a non-secret prefix for visibility in logs. The entire keyspace is 2^256.
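Concretely, what that generator produces:

```python
import secrets

raw = secrets.token_urlsafe(32)   # 32 random bytes, base64url-encoded
print(len(raw))                   # 43 characters, no padding
keyspace = 2 ** 256               # size of the secret space
```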

Brute force is not a threat against 256-bit secrets. No choice of hash cost makes guessing feasible, and none is needed to keep it infeasible; the keyspace is large enough that the question doesn’t arise. An attacker against an Engram API key is not going to guess it. They’re going to steal it from a logging system, a leaked .env file, a compromised dev machine, an intercepted webhook header. The way to defend against those is not "make the hash slow." It’s "don’t leak the key" and "rotate quickly when you do." Hash speed is irrelevant to any of it.

What hash speed does affect: every single authenticated request. The user, who is not the attacker, pays the bcrypt cost on every call. The defense was free of value and full of cost.

What we actually needed from the primitive

Before picking a replacement, we wrote down the properties we wanted from API-key verification. They look different from password verification:

  • Fast. Microseconds, not milliseconds. Auth runs on every request and shouldn’t be on the critical path’s budget.
  • Deterministic. The same input should produce the same hash every time. That lets us index the hash column and resolve a key with one SELECT, instead of pulling N candidate rows by prefix and trying each.
  • One-way. A leaked database dump shouldn’t reveal raw keys. The stored value should not, by itself, be a valid credential.
  • Pepper-keyed. A dump shouldn’t even reveal the hash function output in a useful way. If an attacker reads the table, they should still need a server-side secret to compute matching hashes. Otherwise they can grind a wordlist offline against the column if the keys ever became low-entropy for some reason (e.g., a future generator change).
  • Standard. Boring primitive, in the standard library, no crypto-roulette.

HMAC-SHA256 hits all five. Computed in microseconds. Same input always produces the same 256-bit output. Not reversible. Keyed by a server-side secret (the "pepper"). Sitting in hmac in the Python standard library since forever. The one design choice is where the pepper comes from.
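The core of the replacement is a few lines. A minimal sketch (function name hypothetical):

```python
import hashlib
import hmac

def key_hmac(raw_key: str, pepper: bytes) -> str:
    # Deterministic: the same key + pepper always yields the same hex
    # digest, so the stored column can carry a unique index and a key
    # resolves with one SELECT.
    return hmac.new(pepper, raw_key.encode(), hashlib.sha256).hexdigest()
```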

Resolving the pepper

The pepper is just a server-side secret used to key the HMAC. It does not need to be rotated on every key; it’s a per-server value, not a per-key value. We wanted the lookup to be ergonomic across environments, including the developer machine that doesn’t have a separate API-key pepper configured.

The resolution order in shared_utils.py is:

  1. API_KEY_PEPPER environment variable, if set. This is the preferred production configuration.
  2. Fall back to JWT_SECRET. Every deployment already has this set, because JWT signing requires it. Reusing it means the HMAC fix works without ops setup. Both values protect server-issued credentials, so the threat model overlaps cleanly.

Reusing JWT_SECRET is a deliberate compromise. The strictly-cleanest design has a dedicated pepper. We can migrate to one later by rotating API_KEY_PEPPER independently; nothing about today’s code prevents that. The benefit of the fallback is that turning on HMAC verification on day one didn’t require any deployment to add a new secret. The two-line change in our Python code shipped and started working everywhere, immediately.

If neither env var is set, the function falls back to a constant string so single-process dev doesn't crash. Production deploys always set JWT_SECRET (we require it elsewhere), so the dev-fallback branch is unreachable in any real deployment. We considered raising instead, but the dev ergonomics of "you ran the server with no env and it crashed" were worse than the theoretical risk of a constant pepper that prod will never hit.
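The whole resolution chain fits in one function. A sketch of the shape (the constant and function name here are illustrative, not the real identifiers in shared_utils.py):

```python
import os

_DEV_FALLBACK_PEPPER = "dev-only-pepper"  # unreachable in real deployments

def resolve_pepper() -> bytes:
    # Preferred: a dedicated API_KEY_PEPPER. Fallback: JWT_SECRET, which
    # every deployment already sets. Last resort: a constant, so a bare
    # dev machine with no env configured still starts.
    value = (os.environ.get("API_KEY_PEPPER")
             or os.environ.get("JWT_SECRET")
             or _DEV_FALLBACK_PEPPER)
    return value.encode()
```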

The migration: HMAC fast path, bcrypt slow path, opportunistic backfill

The migration had to keep existing keys working. We had real customers with real keys in the bcrypt-only column, and we didn’t want to invalidate them. Anything that requires customers to rotate their keys on our schedule isn’t a fix. It’s a problem we’re asking them to solve for us. The plan:

  1. Add a new column, api_keys.key_hmac, nullable.
  2. Add a partial unique index, WHERE key_hmac IS NOT NULL. Existing rows have NULL key_hmac and stay out of the index until backfilled.
  3. create_api_key() writes both key_hash (bcrypt) and key_hmac (HMAC) for any new key.
  4. verify_api_key() tries HMAC first (one indexed SELECT). If that misses, falls back to the legacy bcrypt path: scan by prefix, checkpw each row whose key_hmac is NULL.
  5. On a successful bcrypt verify, opportunistically write key_hmac for that row so the next call hits the fast path.

The partial index is the load-bearing piece. A regular unique index on key_hmac would forbid multiple NULLs, which is exactly the state every pre-migration row is in. The partial index lets us treat "not yet migrated" as the default and "migrated" as the indexed special case. New rows show up in the index immediately because they’re written with both columns populated.
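Engram runs on Postgres, but SQLite supports the same partial-index syntax, so the schema behavior is easy to demonstrate in a runnable sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE api_keys (
        id        INTEGER PRIMARY KEY,
        key_hash  TEXT NOT NULL,   -- legacy bcrypt hash
        key_hmac  TEXT             -- nullable until backfilled
    );
    -- Partial unique index: pre-migration NULL rows stay out entirely.
    CREATE UNIQUE INDEX idx_api_keys_hmac
        ON api_keys (key_hmac) WHERE key_hmac IS NOT NULL;
""")
# Any number of un-migrated rows can coexist with NULL key_hmac.
conn.execute("INSERT INTO api_keys (key_hash) VALUES ('$2b$12$aaa')")
conn.execute("INSERT INTO api_keys (key_hash) VALUES ('$2b$12$bbb')")
```

A full unique index would reject the second NULL row; the `WHERE` clause is what makes "not yet migrated" the unindexed default.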

The opportunistic backfill on the slow path means we never need a separate backfill job. Every legacy key that gets used will migrate itself the first time it’s verified. Keys that are never used stay NULL forever. That’s fine: they cost nothing, and if they’re ever used they’ll migrate then. After a few weeks of normal traffic, effectively every active key is migrated and the bcrypt path becomes dead code we can remove in a future release.
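The two-path verify with opportunistic backfill, sketched with in-memory rows and a fast stand-in (plain salted SHA-256, clearly not the real bcrypt call) for the legacy check so the example runs anywhere:

```python
import hashlib
import hmac

PEPPER = b"server-side-pepper"   # resolved from the environment in real code

def key_hmac(raw: str) -> str:
    return hmac.new(PEPPER, raw.encode(), hashlib.sha256).hexdigest()

def legacy_check(raw: str, row: dict) -> bool:
    # Stand-in for bcrypt.checkpw against row["key_hash"]; a salted
    # SHA-256 here so the sketch runs without the bcrypt package.
    digest = hashlib.sha256(row["salt"] + raw.encode()).hexdigest()
    return hmac.compare_digest(digest, row["key_hash"])

def verify_api_key(rows: list, raw: str) -> bool:
    # Fast path: deterministic HMAC, one indexed lookup in the real DB.
    target = key_hmac(raw)
    for row in rows:
        if row["key_hmac"] is not None and hmac.compare_digest(row["key_hmac"], target):
            return True
    # Slow path: legacy rows (key_hmac IS NULL), checked one by one.
    for row in rows:
        if row["key_hmac"] is None and legacy_check(raw, row):
            row["key_hmac"] = target       # opportunistic backfill
            return True
    return False
```

The first successful call for a legacy key takes the slow path and writes `key_hmac`; every later call for that key short-circuits on the fast path.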

Zero downtime, zero customer action. The deploy is a code change plus a schema change. Customers don’t know it happened.

The numbers after

Right after the deploy, we ran the same smoke-test script. The first authenticated call hit the bcrypt slow path (the key existed in the database with NULL key_hmac) and verified in 205ms, then backfilled. Every subsequent call to the same key hit the HMAC fast path: 0.08–0.12ms, including the database round-trip.

The smoke test’s REST timings, before and after:

  • GET /health (auth): 205ms → 4ms (~51×)
  • GET /v1/buckets: 209ms → 5ms (~42×)
  • POST /v1/buckets: 213ms → 6ms (~36×)
  • POST /v1/buckets/{id}/memories: 224ms → 16ms (~14×)
  • POST /v1/query: 207ms → 22ms (~9×)

The endpoints that do real work (POST /v1/buckets/{id}/memories writes an embedding, extracts triples, indexes for BM25; POST /v1/query runs hybrid retrieval) get a smaller relative speedup because their non-auth cost is meaningful. The endpoints that mostly authenticated-then-returned-quickly speed up almost by the full bcrypt delta.

On the hash function in isolation, the speedup is roughly 50,000×. Bcrypt verify was around 200ms; HMAC-SHA256 over a 43-byte input on the same machine is in the 2–4µs range. The 200ms-vs-4µs ratio is what was sitting under every single authenticated request.
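The microsecond figure is easy to reproduce on any machine (the pepper and input here are placeholders; the exact number varies by CPU):

```python
import hashlib
import hmac
import timeit

pepper = b"example-pepper"
raw = b"x" * 43                  # same length as a token_urlsafe(32) key

n = 100_000
total = timeit.timeit(
    lambda: hmac.new(pepper, raw, hashlib.sha256).hexdigest(), number=n
)
print(f"{total / n * 1e6:.2f}µs per HMAC")
```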

All 12 smoke tests still pass. The two paths produce the same verification result for any given key, by construction. They’re different ways of asking the same question. The only observable change in behavior is that the answer comes back two orders of magnitude faster.

What this does and doesn’t change about security

A few properties worth being explicit about, since "we made auth faster" is the kind of headline that worries people.

The keys themselves didn’t get weaker. They’re still 256 bits of secrets.token_urlsafe(32). Brute-force was infeasible before and is infeasible now. Nothing about the hash function affects the entropy of the secret it’s hashing.

A leaked DB dump is no worse than before, and arguably slightly better. Before, an attacker who got the database had bcrypt hashes, useless for replay (you can’t bcrypt-verify against a service that expects raw keys) but a long-term grind target if some future key generator ever produced low-entropy values. Now, an attacker who gets only the database has HMAC outputs they can’t replay either, and can’t even compute matching HMACs offline against a wordlist without also stealing the pepper from the server environment. The pepper raises the bar from "DB compromise" to "DB compromise plus env compromise."

The bcrypt fallback is still there. Until we run a one-time backfill or simply let unused keys age out, some rows still have bcrypt hashes. Those keys verify slowly on first use and then upgrade. Anything we’d say about bcrypt’s properties still holds for those rows.

Timing leakage. HMAC comparison in Python’s hmac.compare_digest is constant-time on the bytes it compares, which is what we use for the equality check at the application level. The database lookup is an indexed equality on a fixed-size hex string. No string-prefix shortcut, no leakage of which bytes matched. Bcrypt was also constant-time. Net: no new timing oracle.
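The comparison in question, for concreteness:

```python
import hmac

stored = "a" * 64   # a stored 64-hex-char HMAC digest
assert hmac.compare_digest(stored, "a" * 64)
# compare_digest examines the full input regardless of where the first
# mismatch falls, so timing doesn't reveal which bytes matched.
assert not hmac.compare_digest(stored, "b" + "a" * 63)
```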

Pepper rotation. Rotating the pepper invalidates the HMAC lookup for every existing key. In practice this means a pepper rotation requires either a planned re-issue of keys or a dual-pepper grace period in verify_api_key. We haven’t built the dual-pepper window yet because we don’t need it yet; the path is sketched in a comment in shared_utils.py for whoever does need it.

The generalizable lesson

The right cryptographic primitive depends on the entropy of the thing you’re protecting.

User passwords are low-entropy. The attacker’s strategy is guessing. The defense is to make each guess expensive. Bcrypt, argon2, scrypt. All good. The slowness is the security property.

Server-issued tokens are high-entropy. The attacker’s strategy is theft, not guessing. The defense is to limit blast radius (rotation, scopes, revocation, audit logs) and to make stored copies non-replayable (HMAC under a server pepper, or encryption-at-rest of the raw value). Hash speed is not part of the defense. Slow hashes just tax every legitimate request.

The trap is that "hash before storing secrets" is a true, useful, often-repeated principle, and bcrypt is the canonical primitive that gets reached for. Applying it everywhere looks responsible from one foot away. From two feet away you’re paying 200ms per request for nothing. The principle is right. The primitive needs to match the entropy of the input.

Checklist we’re now using on internal code review:

  • Is this secret user-chosen or server-generated? If server-generated, what’s the entropy?
  • What’s the threat model: guessing, theft, or both?
  • Does the storage primitive need to defend against guessing? If not, it shouldn’t be slow.
  • Does it need to be deterministic so we can index it? Then we want a keyed hash, not a salted one.
  • Is there a server-side secret we can fold in to protect against DB-only compromise?

None of this is novel if you work on auth professionally. The reason we’re publishing anyway: "200ms bcrypt on API keys" is a default that ships in a lot of frameworks and tutorials. A fair number of services running today have the same floor under every authenticated call and haven’t measured for it. If you run one of them, go time verify_api_key. If it’s flat at ~200ms, that’s your half-day of work.
