Engineering
Designing the memories table for a system you can't easily migrate
The memories table is the hardest table in the system to evolve. The retrieval path is the most uncertain piece of the architecture, so the table that backs it keeps growing columns longer than any other. Postpone a column and you can't backfill it cleanly. Add it speculatively and you carry the dead weight for years. This post is an annotated walkthrough of where we landed.
Engram's memories table is the spine of the system. Every fact a tenant has stored (every preference, every "the meeting moved to Thursday," every entity referenced in a knowledge-graph edge) has a row in this table. Embeddings hang off it, BM25 hangs off it, the graph references it by foreign key, and the recency-boost scorer reads its timestamps on every retrieval. Get it wrong on day one and the cost of fixing it scales with adoption.
Two years of iteration. Eleven columns added since the original. Zero removed. Every column we have is one of three things: load-bearing for retrieval, load-bearing for billing or governance, or a hint we wish we hadn't promised an API for. We'll be specific about which is which.
This post walks the table column by column. For each, what it does, when it landed, what we wish we'd done differently, what it cost us to add. At the end is the actual Postgres CREATE TABLE statement we ship today, annotated inline.
Why memory tables are hard to evolve
A normal application table — users, orders, sessions — has a clear ownership model. The product team carries a mental model of what a "user" is, you write down the columns that model demands, and those columns stay roughly stable until the product changes shape. Lookups happen by primary key or by a small number of indexed columns. Nothing about the schema is surprising six months in.
A memories table doesn't work that way, because retrieval is the product. The columns aren't there to describe an entity to a human reader; they're there because some retrieval path inside the system needs them, and the retrieval path is itself an open research problem you're still iterating on. The first version of the table has content and an embedding and that's it. Then BM25 turns out to help on keyword precision, and now you need lemmatized text alongside the raw. Then you discover scope-sensitive questions ("which thermostat do I currently have" vs. "which thermostats have I ever owned") and you need a classification column written at ingest. A customer asks for multi-agent isolation and you need a new scoping axis on the row. Every time the retrieval path learns something new, the table grows a column.
The asymmetry between adding and removing columns is brutal. Adding one you'll need later is cheap at small scale and effectively impossible at large scale. Removing one that's wired into retrieval, billing, the API surface, and three internal dashboards is essentially never worth the engineering cost. So you end up conservative about adding columns, more conservative about exposing them through the API, and committed to treating anything you ship as permanent — because in practice it is.
One more thing pressing on the design. We run Postgres in production and SQLite for local dev and small self-hosted deployments, which means every new column has to land in both schemas plus the runtime migration list, and a smoke test has to confirm the two agree. The clean version of that work takes about fifteen minutes per column. The messy version takes a half day. We'll come back to that one.
Identity columns: id, tenant_id, bucket_id
id is a UUID via uuid_generate_v4(). We started on bigserial and switched around month three when self-hosted deployments needed to merge data from multiple environments without primary-key collisions. tenant_id is a foreign key to tenants(id) with ON DELETE CASCADE; it's the row-level partitioning key, every index that matters leads with it, and the cascade is load-bearing. A deleted internal test tenant once left 400k orphan embeddings in our pgvector index because the cascade hadn't been wired up yet.
bucket_id is the one worth dwelling on. A bucket is the user-facing namespace inside a tenant: a project, an agent, a separate user's memory space. The naive design is to derive bucket from a join (tenant has many buckets, bucket has many memories), and we did exactly that for the first month. We moved bucket_id onto the memory row because every retrieval query has to filter by bucket and the planner kept choosing bad join orders under load. Denormalizing dropped p95 retrieval from 340ms to 180ms with no other changes.
The cost: bucket renames are still fine (the FK is to bucket_id, not name), but if you ever want to "merge two buckets" you're rewriting a lot of rows. We've shipped a merge endpoint exactly once and it was painful. If you're starting from scratch, put your scoping key on the row even if normalization theory says you shouldn't. The access pattern wins.
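For a sense of what that access pattern looks like, this is roughly the shape of the hot-path candidate query the denormalization serves; a sketch, not our exact SQL (pgvector's <=> operator is cosine distance):

# Hypothetical query shape: tenant and bucket filtered on the row itself, no join.
CANDIDATE_SQL = """
    SELECT id, content, token_count
    FROM memories
    WHERE tenant_id = %(tenant_id)s      -- every index that matters leads with this
      AND bucket_id = %(bucket_id)s      -- denormalized onto the row, so no join through buckets
    ORDER BY embedding <=> %(query_embedding)s::vector
    LIMIT 200
"""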
Content columns: content, token_count, metadata
content is the raw text of the memory, TEXT NOT NULL. The "why store raw text when you chunk it elsewhere" question comes up every time someone new reviews the schema. We don't chunk memories at storage time. We chunk source documents at ingest, then a small LLM call distills each chunk into one or a few atomic memory statements, and the statements are what land in this table. Each row is already a self-contained fact. The chunks themselves are stored separately for provenance, but retrieval doesn't read them.
token_count is an integer set at ingest, used for billing and budget-aware retrieval. Retrieval limits results by token budget, not row count, so the composer always sees roughly the same amount of context regardless of how verbose the underlying memories are. One tokenization per row for life. Almost free.
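To make budget-aware concrete, here is a minimal sketch of the selection step; the budget value and function name are illustrative, not our implementation:

def take_within_budget(ranked_memories, token_budget=1200):
    # Walk the ranked candidates and stop once the token budget is spent, so the
    # composer sees roughly the same amount of context regardless of row verbosity.
    picked, spent = [], 0
    for memory in ranked_memories:
        cost = memory["token_count"] or 0
        if picked and spent + cost > token_budget:
            break
        picked.append(memory)  # always keep at least the top result
        spent += cost
    return picked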
metadata is a JSONB column with default '{}'. Our escape hatch. Today it carries a handful of debug fields (the model name that produced the classification, the extractor version, occasional A/B flags). We're nervous about it. There's no index, so we have to remember not to query it on the hot path, and we've twice promoted a metadata field to a real column when we noticed a customer querying it through a dashboard and getting sequential scans. Treat the JSONB column as a parking lot, not a destination.
Embedding: vector(384)
The embedding column is vector(384) under Postgres (pgvector) and a JSON-encoded TEXT column under SQLite (which has no native vector type; we decode to a numpy array at query time and do exact search). The model is all-MiniLM-L6-v2.
The dimension choice is more interesting than the model choice. We tested 768- and 1024-dimensional models and got marginal recall improvements (around 1.5–2 points on our internal retrieval set) at roughly 2–3x the index size and noticeably slower HNSW build times. For a system where the embedding is one of three retrieval signals (vector, BM25, graph, fused via RRF), the marginal embedding quality matters less than it would in a vector-only system. Cheaper, faster, smaller embeddings let us run vector search across larger candidate sets at the same latency budget, which buys back more recall than the per-vector quality would.
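For readers who haven't met RRF: each signal contributes a reciprocal-rank score per candidate and the sums decide the fused order. A minimal sketch; k=60 is the conventional constant, not necessarily the one we run:

def rrf_fuse(ranked_lists, k=60):
    # ranked_lists is e.g. [vector_ids, bm25_ids, graph_ids], each best-first.
    scores = {}
    for ranked in ranked_lists:
        for rank, memory_id in enumerate(ranked, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)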
Migration cost: high. Switching embedding models means re-embedding every memory in the system. We've come close twice and pulled back both times. Pick one you'll be happy with at 10x scale.
Classification: classified_topic, classified_scope
Two columns set at ingest by a small LLM call. classified_topic is a short topic slug (e.g. living-room-sound-system) and classified_scope is either specific or general.
Scope is the one that earns its keep. A "specific" memory is about a concrete entity with a lifetime ("the Sonos Beam in the living room"); a "general" memory is a preference or tendency ("I prefer warm-toned audio gear"). At retrieval time, scope tells the composer how to interpret a fact. If a question asks "what speaker do I currently have" and we retrieve both a specific memory ("bought a Beam in March") and a general one ("I like warm audio"), the composer should weight the specific much higher and use the general only as background. Without the scope label, the LLM was noticeably worse at preference-vs-fact questions.
Topic has been less load-bearing. It started as a "let's group related memories visually in the admin UI" feature and crept into retrieval as a soft filter. Today it's used by one rerank stage and one debugging dashboard. If we were starting over, we'd probably ship scope and not topic.
The cost is at ingest, not storage: every new memory pays for one small LLM call. We've batched and cached aggressively, but it's still real money on the long tail of low-traffic tenants where the per-memory inference cost dominates the per-tenant revenue. Making classification optional behind a tenant flag is on the table.
Hints: hint_root, hint_tags
User-provided context that doesn't fit the structured columns. hint_root is a filesystem-path-shaped string, added in month five for a customer who wanted to associate memories with code paths in their repo. hint_tags is a TEXT array for arbitrary user labels. Tags get used. Roots, less so; exactly one customer's workload depends on hint_root today, and they'd be fine on tags, but cutting it means an API deprecation cycle and a backfill, and we haven't made the time. Hint fields are easy to add and forever to remove. If you're not certain a field will be queried on the hot path, push it into the metadata JSONB.
BM25 prep: text_lemmatized
text_lemmatized is a Python-lemmatized copy of the content, used as input to a Postgres tsvector for BM25 search. A generated column, lemmatized_tsv, sits on top of it (tsvector GENERATED ALWAYS AS to_tsvector('simple', coalesce(text_lemmatized, ''))) with a GIN index. BM25 hits the tsvector.
Two questions here. First, why lemmatize at ingest instead of letting Postgres do it at query time? Postgres has built-in stemming, but it's English-default and aggressive: it stems universe and university together, which is wrong for memory retrieval where those are different topics. We use Python (spaCy), which is slower but produces results we control. Pre-computing at ingest means we lemmatize each memory exactly once, then BM25 queries hit a precomputed tsvector. Lemmatize-at-query would force per-query tokenization across the whole table.
Second, why 'simple' config on the tsvector? Because the input is already lemmatized, so Postgres-side stemming would over-stem. The simple config does no stemming, just tokenization. Two stemmers stacked is one too many.
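The ingest-time step, roughly; this assumes spaCy's small English pipeline, which is an illustrative choice rather than a statement of what we deploy:

import spacy

# Loaded once per worker; parser and NER are disabled because only lemmas are needed.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def lemmatize_for_bm25(content: str) -> str:
    # One pass at ingest. The result lands in text_lemmatized, and the 'simple'
    # tsvector on top of it only tokenizes -- no second stemmer.
    doc = nlp(content)
    return " ".join(tok.lemma_.lower() for tok in doc if not tok.is_space)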
This is the column we most wish we'd added on day one. We added it in v3 of the schema after we'd already accumulated 8 million rows. Backfilling meant pulling each row, running it through spaCy in batches, and writing it back. We did it online over a weekend; it noticeably degraded ingest latency the whole time (we throttled the backfill to avoid starving live writes) and forced one rollback when we discovered our lemmatizer was choking on emoji and writing nulls. If we'd had this column from day one, the only cost would have been the additional storage.
Dedup: text_hash
text_hash is a 32-character MD5 hash of normalized text. It backs a partial unique index:
CREATE UNIQUE INDEX idx_memories_hash_unique
ON memories(tenant_id, bucket_id, text_hash)
WHERE text_hash IS NOT NULL;
This is the cheapest dedup we've found. At ingest time, we hash the normalized text and try to insert. If a duplicate exists, Postgres raises a unique-violation and we catch it and return the existing row's id instead. No pre-query, no race condition, no double-write. The partial index (only rows where text_hash is non-null) keeps the index small and means we can backfill the column gradually without breaking existing rows that don't have a hash yet.
The normalization step matters. Two memories that say "I prefer dark roast coffee" and "I prefer dark-roast coffee" should dedup; raw text hashing won't catch that. We lowercase, collapse whitespace, strip punctuation, and hash the result. We do not lemmatize for hashing; lemmatization is lossy in a way that conflates real distinctions (singular vs plural matters for some facts).
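Putting the normalization and the unique-violation catch together, roughly; psycopg2-flavored, and the helper names are ours for illustration:

import hashlib
import re

from psycopg2 import errors as pg_errors

def normalize_for_hash(text: str) -> str:
    # Lowercase, strip punctuation, collapse whitespace. No lemmatization here.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def insert_or_return_existing(conn, tenant_id, bucket_id, content):
    text_hash = hashlib.md5(normalize_for_hash(content).encode()).hexdigest()
    try:
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO memories (tenant_id, bucket_id, content, text_hash) "
                "VALUES (%s, %s, %s, %s) RETURNING id",
                (tenant_id, bucket_id, content, text_hash),
            )
            return cur.fetchone()[0]
    except pg_errors.UniqueViolation:
        # The partial unique index fired: a normalized duplicate already exists.
        with conn, conn.cursor() as cur:
            cur.execute(
                "SELECT id FROM memories "
                "WHERE tenant_id = %s AND bucket_id = %s AND text_hash = %s",
                (tenant_id, bucket_id, text_hash),
            )
            return cur.fetchone()[0]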
Adding this index was the single biggest reduction in operational pain we've ever made to this table. Before it, dedup was a query-then-insert at the application layer, which raced under load and routinely produced near-duplicate memories that the retrieval path then had to deduplicate at query time. After it, ingest is one round trip, deduplication is a database invariant, and retrieval gets clean rows. We added it in month nine; we should have added it in month one.
Attribution: attributed_to, agent_id, run_id, source_actor_id
The multi-agent scoping story. attributed_to is 'user' or 'assistant': who said the thing the memory captures. agent_id, run_id, and source_actor_id form a scoping triple on top of tenant_id + bucket_id.
The need surfaced when multi-agent customers started shipping. Inside one tenant, a planning agent and an execution agent might both write into the same bucket. Without attribution, the planning agent's hypothetical ("if we choose option A, the cost is $X") looks identical in retrieval to a user statement ("we chose option A, the cost is $X"), and the composer can't disambiguate.
attributed_to landed first, in month seven. One byte of meaningful information per row, and it solved most of the multi-agent confusion. agent_id and run_id came around month eighteen when customers wanted to scope retrieval to "memories from this agent only" or "memories from this run only" without rebuilding the bucket model. Nullable columns with partial indexes, same access-pattern reasoning as bucket_id: this is going to be in the WHERE clause of every query for these customers, and a join would be wrong. source_actor_id is a free-form string that pairs with our audit log, identifying the specific tool, service account, or human user inside a tenant that wrote the memory. Used for governance and a small trust boost in retrieval.
attributed_to is one of the three columns we wish we'd had from day one. The backfill had to guess attribution from the structure of each ingest payload, and a non-trivial fraction of old rows landed on null because we couldn't reconstruct who said what.
Cross-memory edges: linked_memory_ids
linked_memory_ids is a JSONB array of UUIDs with a GIN index for membership queries. The motivating case: a memory like "we decided to move the launch to Q3" is meaningfully related to an earlier "we considered moving the launch to Q3 or Q4." The triples-based knowledge graph captures the entities (launch, Q3, Q4) but not the discourse relationship (this decision supersedes that consideration). The linked array records explicit supersedes / refines / cites edges. We're mildly skeptical of it; exactly one feature reads it today, and the post-ingest pass that populates it is expensive enough that we don't run it on every memory. If we were starting over, this would have stayed in metadata until a second feature needed it.
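The membership query the GIN index serves is JSONB containment; a sketch of the lookup, with the cursor handling assumed:

import json

def memories_linking_to(cur, tenant_id, memory_id):
    # @> tests whether the linked_memory_ids array contains the given UUID;
    # that containment check is what the GIN index accelerates.
    cur.execute(
        "SELECT id FROM memories "
        "WHERE tenant_id = %s AND linked_memory_ids @> %s::jsonb",
        (tenant_id, json.dumps([str(memory_id)])),
    )
    return [row[0] for row in cur.fetchall()]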
Lifecycle: created_at, last_accessed_at
created_at on insert; last_accessed_at bumped every time retrieval surfaces the memory. Those two timestamps are the entire lifecycle story. Memories never auto-expire and we never delete them without an explicit instruction from the agent or the user — the column we used to carry for that, a decayed boolean and the background job that flipped it, got removed after we watched too many customers lose useful long-tail facts they hadn't touched in a few months but still wanted answers from.
The timestamps still earn their keep on the read path. The hybrid retrieval scorer applies a recency boost (exponential, ~365-day half-life) that lets newer facts win when older ones are stale without removing the older ones from the index at all. Supersession marks an older statement as historical when a newer one contradicts it; the older row stays queryable for audit and for the case where a customer wants to recover the prior state.
We worried about last_accessed_at being a write on every retrieval. In practice it's a single batched UPDATE … WHERE id = ANY(?) on the 5–20 rows actually returned, small enough not to show up in our latency budget.
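Both halves of that read-path behavior, sketched; the half-life matches the one above, while the helper names and exact weighting are illustrative:

import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 365.0

def recency_boost(created_at, now=None):
    # Exponential decay with a ~365-day half-life: a year-old memory scores ~0.5.
    now = now or datetime.now(timezone.utc)
    age_days = max((now - created_at).total_seconds() / 86400.0, 0.0)
    return math.exp(-math.log(2.0) * age_days / HALF_LIFE_DAYS)

def touch_returned_rows(cur, returned_ids):
    # One batched write for the 5-20 rows a retrieval actually surfaces.
    cur.execute(
        "UPDATE memories SET last_accessed_at = NOW() WHERE id = ANY(%s::uuid[])",
        ([str(i) for i in returned_ids],),
    )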
Columns we wish we'd added on day one
Three columns we'd add from the start if we were building this again:
- text_lemmatized. Added in v3, required backfilling 8 million rows over a weekend, briefly degraded ingest latency, caused one rollback when our lemmatizer choked on emoji. None of that pain would have existed if the column had been there from the first commit.
- text_hash + the partial unique index. Application-level dedup is a race condition farm. Database-level dedup via a unique index is a one-line invariant. The index let us remove three retry loops, two reconciliation jobs, and one heuristic deduplicator from retrieval, all by leaning on this one constraint.
- attributed_to. Without it, we couldn't tell whose facts were whose in multi-agent contexts. The backfill produced a non-trivial fraction of nulls we've never been able to recover. Anyone building memory for an agent context in 2026 should add this column on day one.
Columns we added under pressure and might cut
classified_topic probably doesn't pay for itself. Scope does; topic crept in as a UI feature and now exists for one rerank stage and one dashboard. Both classification columns cost an LLM call at ingest, which on long-tail tenants eats the margin. Best case for both: optional behind a tenant flag, default-off for cost-sensitive deployments.
hint_root has one customer using it and would be fine as a tag. The reason it's still here is the deprecation cycle, not the design.
Zero columns removed in two years. Not because we picked perfectly. Because removal is hard enough that even columns we don't love survive.
The Postgres / SQLite parity tax
A memories table that works under both Postgres and SQLite is harder than one that works under either alone. We support SQLite for local development, for self-hosted small deployments, and for the test suite. The differences add up:
- Types. vector(384) doesn't exist in SQLite, so the SQLite schema uses TEXT and we JSON-encode the embedding. JSONB doesn't exist either, so it becomes TEXT with JSON-encoded payloads. TIMESTAMPTZ is TEXT (ISO 8601 strings). TEXT[] for tags is also JSON-encoded.
- Vector search. No HNSW, no IVFFlat. The SQLite path decodes embeddings into numpy arrays and does an exact cosine-similarity scan (see the sketch after this list). This is fine for <100k rows and a non-starter past that, which we've made peace with: SQLite is dev and small self-host, Postgres is anything bigger.
- Generated columns. Postgres' GENERATED ALWAYS AS … STORED tsvector trick doesn't have a clean SQLite analog; instead the SQLite path computes the BM25 index input at write time.
- Partial unique indexes. SQLite supports these but with different syntax around WHERE clauses. We translate at runtime.
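The SQLite fallback from the second bullet, roughly; table and column names match the schema, everything else is an illustrative sketch:

import json
import numpy as np

def sqlite_vector_search(conn, tenant_id, bucket_id, query_vec, limit=20):
    # Exact cosine-similarity scan over JSON-encoded embeddings. Fine below
    # ~100k rows; the Postgres path uses pgvector HNSW instead.
    rows = conn.execute(
        "SELECT id, embedding FROM memories "
        "WHERE tenant_id = ? AND bucket_id = ? AND embedding IS NOT NULL",
        (tenant_id, bucket_id),
    ).fetchall()
    q = np.asarray(query_vec, dtype=np.float32)
    q /= np.linalg.norm(q)
    scored = []
    for memory_id, embedding_json in rows:
        v = np.asarray(json.loads(embedding_json), dtype=np.float32)
        scored.append((float(np.dot(q, v / np.linalg.norm(v))), memory_id))
    scored.sort(reverse=True)
    return [memory_id for _, memory_id in scored[:limit]]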
We use a translating cursor for local dev: a thin wrapper that intercepts Postgres-flavored SQL and rewrites it for SQLite. New columns need to land in both schemas plus the runtime migration list. The smoke test (a Python script that opens both backends, creates the schema, and compares the column lists) caught three drift bugs the day we tried to skip running it on a "small" migration. The "small" migration was a single new column. The drift bugs were all in the index definitions, not the column itself. Write the smoke test before you write the second schema, not after.
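The smoke test is nothing clever. A sketch of the column-list half of the comparison (the real script also has to cover index definitions, which is where our drift actually was); connection details and the allowed-drift set are assumptions:

import sqlite3
import psycopg2

def postgres_columns(dsn):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'memories'"
        )
        return {row[0] for row in cur.fetchall()}

def sqlite_columns(path):
    conn = sqlite3.connect(path)
    try:
        return {row[1] for row in conn.execute("PRAGMA table_info(memories)")}
    finally:
        conn.close()

def assert_schemas_agree(pg_dsn, sqlite_path):
    # Backend-only columns are allowed to differ; everything else must match.
    allowed_drift = {"lemmatized_tsv"}  # Postgres-only generated column
    drift = (postgres_columns(pg_dsn) ^ sqlite_columns(sqlite_path)) - allowed_drift
    assert not drift, f"schema drift between backends: {sorted(drift)}"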
Index design
The partial unique index on (tenant_id, bucket_id, text_hash) WHERE text_hash IS NOT NULL is the one with the actual story. The WHERE clause matters: it keeps the index small (only hashed rows) and let us backfill text_hash across 8 million existing rows without taking writes offline or growing an index over un-hashable nulls. Before this index existed, dedup was a query-then-insert at the application layer, which raced under load and produced near-duplicates that retrieval had to clean up at query time. Turning dedup into a database invariant via a unique-violation catch let us delete three retry loops, two reconciliation jobs, and one heuristic deduplicator. The pre-existing duplicates (about 4% of all memories at the time) had to be cleaned up before the unique index would build, which itself took a Saturday afternoon.
The GIN index on lemmatized_tsv makes Postgres BM25 usable. Without it, full-text queries fall back to sequential scan and die past about 50k rows per tenant. The pgvector HNSW on embedding keeps p95 retrieval under 200ms even on our largest tenant; tuning m and ef_construction took a week of benchmarking against synthetic load shaped like our real query distribution. The defaults were too aggressive on memory and too loose on recall. We also added a background reindex job because HNSW degrades over time as the inserted-vs-built distribution shifts.
The other indexes (tenant, bucket, last_accessed, attributed, agent, run, linked) are smaller and supporting. We add them when we see a slow query in the logs, not speculatively. We've never regretted adding one. We've occasionally regretted not removing one when the workload changed; orphan indexes still cost write throughput — the decayed partial index we used to keep was one of those, and went away with the column.
What we'd put off
When you're tempted to add a column, ask what specific query is going to filter or sort on it. If you can't name one in the hot path, the field belongs in metadata. The metadata-to-column migration is easy. Removing a column you wish you'd never added is not.
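What "easy" looks like, roughly: one additive DDL statement plus a backfill from the JSONB field. The field name is borrowed from the debug fields mentioned earlier purely for illustration; on a table this size you would batch the UPDATE:

def promote_metadata_field(cur):
    # Additive, so no downtime: add the column, then copy the JSONB value across.
    cur.execute("ALTER TABLE memories ADD COLUMN IF NOT EXISTS extractor_version TEXT")
    cur.execute(
        "UPDATE memories "
        "SET extractor_version = metadata->>'extractor_version' "
        "WHERE metadata->>'extractor_version' IS NOT NULL "
        "  AND extractor_version IS NULL"
    )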
The actual CREATE TABLE
Below is the Postgres schema we ship, with inline comments. The SQLite version is structurally identical with type translations; the runtime migration list covers any column that arrived after the original create.
CREATE TABLE IF NOT EXISTS memories (
-- Identity. UUID for cross-environment portability; tenant + bucket are
-- denormalized onto the row because every query filters on both, and the
-- planner chose poor join orders when bucket lived only on the parent.
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
bucket_id UUID NOT NULL REFERENCES buckets(id) ON DELETE CASCADE,
-- Content. Stored raw because retrieval results are returned verbatim
-- to the composer; no chunk-index indirection. token_count is set at
-- ingest for billing and budget-aware retrieval.
content TEXT NOT NULL,
embedding vector(384), -- all-MiniLM-L6-v2 dimension
metadata JSONB DEFAULT '{}', -- escape hatch; not on the hot path
token_count INTEGER,
-- Classification, set at ingest by a small LLM call.
-- scope: 'specific' | 'general' (load-bearing for retrieval)
-- topic: short slug (lighter; used by rerank + admin UI)
classified_topic TEXT,
classified_scope TEXT,
-- User-provided hints. tags is queried; hint_root is legacy and would
-- be cut if the deprecation cycle were free.
hint_root TEXT,
hint_tags TEXT[],
-- Lifecycle. Memories never auto-expire; deletion is explicit only.
-- The recency-boost scorer reads these timestamps on every retrieval.
created_at TIMESTAMPTZ DEFAULT NOW(),
last_accessed_at TIMESTAMPTZ,
-- BM25 prep. Lemmatized at ingest (Python, not Postgres) so we control
-- the stemming. The generated tsvector below sits on top of this column.
text_lemmatized TEXT,
-- Dedup. MD5 of normalized text. Paired with a partial unique index
-- on (tenant_id, bucket_id, text_hash) for pre-insert dedup as a
-- database invariant rather than an application race.
text_hash VARCHAR(32),
-- Attribution. 'user' | 'assistant'; the cheapest column on this table
-- and one we wish we'd had on day one for multi-agent contexts.
attributed_to VARCHAR(20),
-- Three-axis scoping (alongside tenant_id + bucket_id). Nullable;
-- supported by partial indexes. Lets multi-agent tenants share a
-- bucket without polluting each other's retrieval.
agent_id TEXT,
run_id TEXT,
-- Cross-memory references. JSONB array of memory UUIDs with a GIN
-- index. Captures explicit edges (supersedes / refines / cites) that
-- the triples-based knowledge graph doesn't.
linked_memory_ids JSONB DEFAULT '[]'::jsonb,
-- Free-form actor id, paired with the audit log. Identifies the
-- specific tool / service / human inside a tenant that wrote this row.
source_actor_id TEXT
);
-- Generated tsvector over the lemmatized column. 'simple' config because
-- the input is already lemmatized; Postgres-side stemming would over-stem.
ALTER TABLE memories
ADD COLUMN IF NOT EXISTS lemmatized_tsv tsvector
GENERATED ALWAYS AS (to_tsvector('simple', coalesce(text_lemmatized, ''))) STORED;
-- Hot-path indexes. The partial unique on text_hash is the dedup invariant;
-- the GIN on lemmatized_tsv makes BM25 usable; the HNSW on embedding (created
-- separately, not shown) is the vector path.
CREATE INDEX IF NOT EXISTS idx_memories_tenant ON memories(tenant_id);
CREATE INDEX IF NOT EXISTS idx_memories_bucket ON memories(bucket_id);
CREATE INDEX IF NOT EXISTS idx_memories_tenant_bucket ON memories(tenant_id, bucket_id);
CREATE INDEX IF NOT EXISTS idx_memories_last_accessed ON memories(tenant_id, last_accessed_at);
CREATE INDEX IF NOT EXISTS idx_memories_lemmatized_tsv ON memories USING GIN(lemmatized_tsv);
CREATE UNIQUE INDEX IF NOT EXISTS idx_memories_hash_unique
ON memories(tenant_id, bucket_id, text_hash) WHERE text_hash IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_memories_attributed
ON memories(tenant_id, bucket_id, attributed_to) WHERE attributed_to IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_memories_agent
ON memories(tenant_id, agent_id) WHERE agent_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_memories_run
ON memories(tenant_id, run_id) WHERE run_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_memories_linked
ON memories USING GIN(linked_memory_ids);
That's the schema. Two years of production traffic and a handful of painful migrations to get here. Steal whatever's useful.
Further reading
Closely related
- Why text_hash beats embedding-based dedup for agent memory. Hash-then-embedding dedup as a database invariant, plus the canonicalization choices that made it work.
- Zero-downtime backfill migrations: the HMAC rollout in detail. Opportunistic backfill driven by the verify path, plus the partial unique index that made the rollout possible.
- When pgvector slowed down past 500 buckets per tenant. A 3× p95 regression that looked like pgvector and was actually Python orchestration. Profile before you tune.
Engram
- Engram on LongMemEval-S: 91.6%. Full benchmark methodology and what didn't work.
- Engram docs. HTTP API, MCP setup for each client, SDK examples.
- Start with Engram. Free tier, BYOK, MCP-native.