Notes on agent memory.
Guides, references, and field notes on building memory for AI agents.
Memory agents, audit log, and rollback
Two things shipped this month: a memory-agents framework for scheduled background workers (Watchdog, Logger, Janitor, Consolidator, Bucket Profiler), and full reversibility on every memory mutation with a 90-day rollback window and point-in-time reads.
Opinion | Why we open-sourced the composer prompt
We published the v44 composer prompt from our 91.6% LongMemEval-S run under MIT. The reasoning — what we released, what we kept closed, and why the asymmetry is the point.
Engineering | Why text_hash beats embedding-based dedup for agent memory
Dedup is the under-appreciated job of a memory product. Why we lead with a deterministic text_hash and a partial unique index, and use embedding similarity only as the secondary lane.
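As a rough illustration of the hash-first lane described above, here is a minimal sketch using SQLite from the Python standard library. The table layout, the `deleted_at` predicate on the partial index, and the normalization rule (lowercase, collapsed whitespace) are all assumptions made for the example, not Engram's actual schema.

```python
import hashlib
import sqlite3


def text_hash(text: str) -> str:
    """Deterministic hash of the text after a simple normalization (assumed rule)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns needed to show the idea.
conn.execute(
    """
    CREATE TABLE memories (
        id INTEGER PRIMARY KEY,
        bucket TEXT NOT NULL,
        text TEXT NOT NULL,
        text_hash TEXT NOT NULL,
        deleted_at TEXT
    )
    """
)
# Partial unique index: uniqueness is enforced only among live (non-deleted) rows.
conn.execute(
    """
    CREATE UNIQUE INDEX memories_live_hash
    ON memories (bucket, text_hash)
    WHERE deleted_at IS NULL
    """
)


def remember(bucket: str, text: str) -> bool:
    """Insert a memory; return False when a live duplicate already exists."""
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO memories (bucket, text, text_hash) VALUES (?, ?, ?)",
        (bucket, text, text_hash(text)),
    )
    # total_changes only grows when a row was actually written.
    return conn.total_changes > before
```

In this sketch the database rejects exact duplicates on its own, so a similarity check would only need to run on rows that pass the hash lane.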
Engineering | Iterating the extraction prompt: 28 versions and what each one fixed
Our triple-extraction prompt went through 28+ versions over the last year. What broke, what we tried, what stuck — plus the full current prompt, MIT-licensed.
Security | "We never train on customer data" — what that actually requires
"We never train on customer data" is on every AI vendor's marketing page. The architecture, contracts, and audit posture that make it true are not — here's what to ask for.
Benchmark | Best-of-N on agent-memory queries: the regression check most people skip
Best-of-N looks like a free accuracy boost until you measure the win side. On 191 sampled wins from LongMemEval-S, the regression rate cancelled the gain at 3x the cost.
Tutorial | Engram + Vercel AI SDK: memory-aware chat in a Next.js app
Wire Engram's HTTP API into a Next.js + Vercel AI SDK chat app. Tool definitions, route handler, the system prompt that makes the agent actually call them, and production caveats.
Engineering | When pgvector slowed down past 500 buckets per tenant
Mid-benchmark our pgvector queries got 3x slower past ~500 buckets per tenant. The fix wasn't index tuning — it was a per-bucket fan-out loop spinning over 350 empty buckets.
Opinion | Patterns from agent papers that didn't work for us
Critic-and-retry, better extractors, date pre-passes, and prompt iteration past v44. Four patterns from agent papers we tried on LongMemEval, measured against baseline, and either dropped or shelved.
Pricing | Pricing for memory: why we rejected MAU, per-project meters, and a Scale tier
Every pricing model we considered for Engram, why each looked attractive, and the specific failure mode that made us drop it — including the $499 Scale tier that took three months to admit was friction, not revenue.
Engineering | Zero-downtime backfill migrations: the HMAC rollout in detail
How we migrated every API key from bcrypt to HMAC with zero downtime and zero revocations — opportunistic backfill on the verify path, a partial unique index, and a two-phase deploy.
Engineering | Cookie scoping for cross-subdomain auth: the gotcha that bites everyone
A cookie set on api.lumetra.io should be visible on portal.lumetra.io but not on mcp.lumetra.io, and dev on localhost behaves nothing like prod. The rules, the OAuth dance, and the ten-line helper we landed on.
Tutorial | Add Engram memory to ChatGPT as a custom connector
ChatGPT accepts custom MCP connectors on supported plans. Ten-minute walkthrough to point it at Engram, complete OAuth, and the system prompt that makes the model actually use memory.
Benchmark | Reproducing the 91.6%: a step-by-step from the LongMemEval-S run
A direct follow-up to our 91.6% on LongMemEval-S: the exact stack, v44 composer prompt, profile schema, judge config, and retrieval knobs you need to verify the number end-to-end.
Engineering | Designing the memories table for a system you can't easily migrate
An annotated walkthrough of Engram's memories table — every column, why it exists, what we'd have added on day one, and what each migration actually cost in production.
Engineering | Building a 22-second deploy smoke that catches real bugs
A deploy smoke that always passes isn't a smoke — it's a status page. Ours runs 93 checks across REST, OAuth, MCP, and admin in 22 seconds, and caught 6 real bugs while we built it.
Opinion | Hosted inference vs BYOK: the unit economics of agent memory
A memory product fans out 3-5 LLM calls per ingest and 2-3 per query. With math, this is why hosted inference can't price stably for agent memory — and why Engram is BYOK.
Tutorial | Add Engram memory to Windsurf in three minutes
One config edit, a restart, and the system prompt that makes Windsurf actually call query_memory before answering — durable memory across sessions in three minutes.
Engineering | The 200ms auth floor: replacing bcrypt with HMAC for API keys
Every authenticated request was paying ~200ms for a cost-12 bcrypt verify on a 256-bit random key. We measured it, swapped to HMAC-SHA256 with a server-side pepper, and shipped a zero-downtime migration.
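For readers unfamiliar with the technique, a minimal sketch of HMAC-SHA256 key verification with a server-side pepper might look like the following. The pepper value and the function names are hypothetical, and this is not the code from the post; a real deployment would load the pepper from a secret store.

```python
import hashlib
import hmac
import secrets

# Hypothetical pepper for illustration only; never hard-code a real one.
PEPPER = b"example-server-side-pepper"


def key_digest(api_key: str, pepper: bytes = PEPPER) -> str:
    """Hex HMAC-SHA256 of the API key, keyed by the server-side pepper."""
    return hmac.new(pepper, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_key(api_key: str, stored_digest: str, pepper: bytes = PEPPER) -> bool:
    """Constant-time comparison of the recomputed digest against the stored one."""
    return hmac.compare_digest(key_digest(api_key, pepper), stored_digest)


# Issue a 256-bit random key and store only its digest.
api_key = secrets.token_hex(32)
stored = key_digest(api_key)
```

The general reasoning is that a slow hash such as bcrypt exists to protect low-entropy passwords against guessing. A 256-bit random key cannot be guessed that way, so a fast keyed hash can protect the stored value at a fraction of the per-request cost.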
Benchmark | Engram on LongMemEval-S: 91.6%
458/500 on the public long-term-memory benchmark. Full methodology: server-side hybrid retrieval, a canonical user-profile pass (+4 points), our v44 composer prompt (published MIT). Plus the things that didn't work — critic-and-retry, better extraction, date pre-passes — and where the remaining 42 failures actually live.
Guide | What is AI agent memory?
A practical, vendor-neutral guide to the category — what agent memory is, why stateless LLMs need it, which retrieval approaches exist, how to evaluate them, and how Engram fits in.