Notes on agent memory.

Guides, references, and field notes on building memory for AI agents.

Pinned · Engineering

Memory agents, audit log, and rollback

Two things shipped this month: a memory-agents framework for scheduled background workers (Watchdog, Logger, Janitor, Consolidator, Bucket Profiler), and full reversibility on every memory mutation with a 90-day rollback window and point-in-time reads.

May 20, 2026 · 14 min read

Opinion

Why we open-sourced the composer prompt

We published the v44 composer prompt from our 91.6% LongMemEval-S run under MIT. The reasoning — what we released, what we kept closed, and why the asymmetry is the point.

April 13, 2026 · 12 min read

Engineering

Why text_hash beats embedding-based dedup for agent memory

Dedup is the under-appreciated job of a memory product. Why we lead with a deterministic text_hash and a partial unique index, and use embedding similarity only as the secondary lane.

January 5, 2026 · 15 min read

Engineering

Iterating the extraction prompt: 28 versions and what each one fixed

Our triple-extraction prompt went through 28+ versions over the last year. What broke, what we tried, what stuck — plus the full current prompt, MIT-licensed.

April 16, 2026 · 17 min read

Security

"We never train on customer data" — what that actually requires

"We never train on customer data" is on every AI vendor's marketing page. The architecture, contracts, and audit posture that make it true are not — here's what to ask for.

March 2, 2026 · 18 min read

Benchmark

Best-of-N on agent-memory queries: the regression check most people skip

Best-of-N looks like a free accuracy boost until you measure the win side. On 191 sampled wins from LongMemEval-S, the regression rate cancelled the gain at 3x the cost.

April 20, 2026 · 13 min read

Tutorial

Engram + Vercel AI SDK: memory-aware chat in a Next.js app

Wire Engram's HTTP API into a Next.js + Vercel AI SDK chat app. Tool definitions, route handler, the system prompt that makes the agent actually call them, and production caveats.

May 11, 2026 · 16 min read

Engineering

When pgvector slowed down past 500 buckets per tenant

Mid-benchmark, our pgvector queries got 3x slower past ~500 buckets per tenant. The fix wasn't index tuning; it was a per-bucket fan-out loop spinning over 350 empty buckets.

January 12, 2026 · 11 min read

Opinion

Patterns from agent papers that didn't work for us

Critic-and-retry, better extractors, date pre-passes, and prompt iteration past v44. Four patterns from agent papers we tried on LongMemEval, measured against baseline, and either dropped or shelved.

April 23, 2026 · 16 min read

Pricing

Pricing for memory: why we rejected MAU, per-project meters, and a Scale tier

Every pricing model we considered for Engram, why each looked attractive, and the specific failure mode that made us drop it, including the $499 Scale tier that took us three months to admit was friction, not revenue.

March 9, 2026 · 16 min read

Engineering

Zero-downtime backfill migrations: the HMAC rollout in detail

How we migrated every API key from bcrypt to HMAC with zero downtime and zero revocations — opportunistic backfill on the verify path, a partial unique index, and a two-phase deploy.

February 2, 2026 · 16 min read

Engineering

Cookie scoping for cross-subdomain auth: the gotcha that bites everyone

A cookie set on api.lumetra.io should be visible on portal.lumetra.io but not on mcp.lumetra.io, and dev on localhost behaves nothing like prod. The rules, the OAuth dance, and the ten-line helper we landed on.

January 19, 2026 · 14 min read

Tutorial

Add Engram memory to ChatGPT as a custom connector

ChatGPT accepts custom MCP connectors on supported plans. A ten-minute walkthrough: point it at Engram, complete OAuth, and add the system prompt that makes the model actually use memory.

May 7, 2026 · 11 min read

Benchmark

Reproducing the 91.6%: a step-by-step from the LongMemEval-S run

A direct follow-up to our 91.6% on LongMemEval-S: the exact stack, v44 composer prompt, profile schema, judge config, and retrieval knobs you need to verify the number end-to-end.

April 8, 2026 · 16 min read

Engineering

Designing the memories table for a system you can't easily migrate

An annotated walkthrough of Engram's memories table — every column, why it exists, what we'd have added on day one, and what each migration actually cost in production.

December 29, 2025 · 22 min read

Engineering

Building a 22-second deploy smoke that catches real bugs

A deploy smoke that always passes isn't a smoke — it's a status page. Ours runs 93 checks across REST, OAuth, MCP, and admin in 22 seconds, and caught 6 real bugs while we built it.

February 9, 2026 · 15 min read

Opinion

Hosted inference vs BYOK: the unit economics of agent memory

A memory product fans out 3-5 LLM calls per ingest and 2-3 per query. The math behind why hosted inference can't price stably for agent memory, and why Engram is BYOK.

March 23, 2026 · 15 min read

Tutorial

Add Engram memory to Windsurf in three minutes

One config edit, a restart, and the system prompt that makes Windsurf actually call query_memory before answering — durable memory across sessions in three minutes.

May 4, 2026 · 9 min read

Engineering

The 200ms auth floor: replacing bcrypt with HMAC for API keys

Every authenticated request was paying ~200ms for a cost-12 bcrypt verify on a 256-bit random key. We measured it, swapped to HMAC-SHA256 with a server-side pepper, and shipped a zero-downtime migration.

January 26, 2026 · 12 min read

Benchmark

Engram on LongMemEval-S: 91.6%

458/500 on the public long-term-memory benchmark. Full methodology: server-side hybrid retrieval, a canonical user-profile pass (+4 points), our v44 composer prompt (published MIT). Plus the things that didn't work — critic-and-retry, better extraction, date pre-passes — and where the remaining 42 failures actually live.

April 6, 2026 · 15 min read

Guide

What is AI agent memory?

A practical, vendor-neutral guide to the category — what agent memory is, why stateless LLMs need it, which retrieval approaches exist, how to evaluate them, and how Engram fits in.

December 1, 2025 · 12 min read