Engineering
Memory agents, audit log, and rollback
Two things shipped today, and they go together: a framework for background workers that maintain memory state on the tenant’s behalf, and full reversibility on every memory mutation. Reversibility means an audit log of every create, update, delete, and rollback, a 90-day recovery window, and point-in-time reads. Agents change things. The audit log keeps them honest.
The memory agents framework
Memory agents are scheduled background workers that maintain bucket state. They don’t sit on the request path; nothing in /v1/query waits for them. They wake up on a cron, do a bounded unit of work with the tenant’s BYOK provider, and write findings into a meta-bucket.
Every agent has the same shape. A template defines the agent’s identity: name, default schedule, default model, the operator function it runs, the parameters that operator accepts. An installation is a tenant’s copy of that template, optionally with overrides (different model, different schedule, different budget). A tick is one execution.
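A minimal sketch of the template/installation relationship described above, assuming an override simply replaces the matching template default. The class names, field names, model names, and the budget unit are hypothetical; only the concepts come from the framework description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Template:
    """Identity and defaults shared by every installation of one agent."""
    name: str
    default_schedule: str
    default_model: str
    default_daily_budget: int  # hypothetical: unit and default are assumptions


@dataclass(frozen=True)
class Installation:
    """A tenant's copy of a template. None means 'use the template default'."""
    tenant_id: str
    template: Template
    model: Optional[str] = None
    schedule: Optional[str] = None
    daily_budget: Optional[int] = None

    def effective(self) -> dict:
        """Resolve each setting: the override if present, else the default."""
        t = self.template
        return {
            "model": self.model if self.model is not None else t.default_model,
            "schedule": self.schedule if self.schedule is not None else t.default_schedule,
            "daily_budget": (
                self.daily_budget if self.daily_budget is not None
                else t.default_daily_budget
            ),
        }


watchdog = Template("watchdog", "@every:1h", "model-a", 1000)
install = Installation(tenant_id="tenant-1", template=watchdog, model="model-b")
config = install.effective()  # model is overridden, schedule and budget are inherited
```

Keeping overrides as optional fields means a change to a template default reaches every installation that has not explicitly overridden it.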
The framework owns the scheduling, the BYOK resolution, the budget enforcement, the audit-log write, the meta-bucket convention, and the dashboard surface. An operator author only deals with the actual work: read inputs, emit findings. The framework handles everything else.
Five operators are in production today:
- Watchdog audits buckets for contradictions with newer entries, orphaned graph nodes, and suspicious near-duplicate clusters, and writes findings into _watchdog_findings.
- Logger records retrieval events into _logger_retrievals so cold and noisy buckets become visible.
- Janitor surfaces dedupe candidates and low-value memories for review.
- Consolidator merges fragmentary memories about the same entity where it makes sense.
- Bucket Profiler generates the canonical entity summary the synthesizer reads on every query (the same profile pass we’ve written about before). This one used to live as a separate code path. Porting it in was the last move that made the framework cover all the background work we do.
Installation is one click from /agents in the dashboard, or one POST against the API. Each installation gets its own daily budget cap, its own per-tick output token cap, and a “Run now” button next to the schedule for one-off triggers. When an installation hits its daily cap, the next tick is skipped and the skip is logged with a reason. Same plumbing that protects the synthesis path protects the agents.
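The skip-on-cap behavior can be sketched as a check that runs before each tick. This is an illustration, not the framework's code: the budget unit, the reason string, and the convention that an operator reports its own spend are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class InstallationBudget:
    daily_cap: int            # hypothetical unit, e.g. tokens per day
    spent_today: int = 0
    log: List[str] = field(default_factory=list)


def run_tick(budget: InstallationBudget, operator: Callable[[], int]) -> bool:
    """Run one tick unless the daily cap has already been reached.

    Returns True if the operator ran, False if the tick was skipped.
    """
    if budget.spent_today >= budget.daily_cap:
        # The skip is recorded with a reason instead of failing silently.
        budget.log.append("skipped: daily_budget_exhausted")
        return False
    budget.spent_today += operator()  # the operator returns what it spent
    budget.log.append("ran")
    return True


budget = InstallationBudget(daily_cap=100)
results = [run_tick(budget, lambda: 60) for _ in range(3)]
```

In this sketch the tick that crosses the cap still completes, and it is the following tick that gets skipped, which matches the behavior described above.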
The audit log and the 90-day rollback window
Every memory mutation now writes an audit event. That covers user-issued writes from POST /v1/buckets/<id>/memories, deletes from the same endpoint, edits issued from the dashboard, and any write an agent performs on a tick. Each event carries the actor (user, agent, or system), the operation, the bucket and memory ids, the timestamp, and the parent tick id when one applies.
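As an illustration of the event fields listed above, a record type might look like the following. The field names and the validation are assumptions; the actual payload returned by the API may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ACTORS = {"user", "agent", "system"}


@dataclass(frozen=True)
class AuditEvent:
    actor: str                     # "user", "agent", or "system"
    operation: str                 # e.g. "create", "update", "delete", "rollback"
    bucket_id: str
    memory_id: str
    timestamp: datetime
    tick_id: Optional[str] = None  # set only when an agent tick produced the event

    def __post_init__(self) -> None:
        if self.actor not in ACTORS:
            raise ValueError(f"unknown actor: {self.actor!r}")


event = AuditEvent(
    actor="agent",
    operation="delete",
    bucket_id="bucket-1",
    memory_id="mem-42",
    timestamp=datetime(2026, 5, 21, 17, 4, 12, tzinfo=timezone.utc),
    tick_id="tick-7",
)
```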
Underneath the events, we version the data. Before a memory is updated or deleted, its prior state gets archived into a versions table keyed by memory id and a valid-from / valid-to window. Rollback walks the relevant version row back into place. The default retention is 90 days; older versions are pruned by the same Janitor that handles meta-bucket retention.
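The archive-then-mutate pattern can be sketched with an in-memory SQLite database. The schema, table names, and integer timestamps are hypothetical, and restoring by routing through the normal update path is an assumption about how rollback might work, not a description of Engram's storage layer.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE memories (id TEXT PRIMARY KEY, content TEXT, valid_from INTEGER);
CREATE TABLE memory_versions (
    memory_id TEXT, content TEXT, valid_from INTEGER, valid_to INTEGER
);
""")


def update_memory(memory_id: str, new_content: str, now: int) -> None:
    """Archive the current row, then overwrite it, in a single transaction."""
    with db:  # commits on success, rolls back both statements on error
        db.execute(
            "INSERT INTO memory_versions (memory_id, content, valid_from, valid_to) "
            "SELECT id, content, valid_from, ? FROM memories WHERE id = ?",
            (now, memory_id),
        )
        db.execute(
            "UPDATE memories SET content = ?, valid_from = ? WHERE id = ?",
            (new_content, now, memory_id),
        )


def rollback_last_change(memory_id: str, now: int) -> None:
    """Put the most recently archived version back in place."""
    row = db.execute(
        "SELECT content FROM memory_versions WHERE memory_id = ? "
        "ORDER BY valid_to DESC LIMIT 1",
        (memory_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no archived version for {memory_id}")
    # Restoring goes through update_memory, so the rollback is itself archived.
    update_memory(memory_id, row[0], now)


def current(memory_id: str) -> str:
    """Return the live content of a memory."""
    return db.execute(
        "SELECT content FROM memories WHERE id = ?", (memory_id,)
    ).fetchone()[0]


with db:
    db.execute("INSERT INTO memories VALUES ('m1', 'likes tea', 100)")
update_memory("m1", "likes coffee", 200)
rollback_last_change("m1", 300)
```

Because the archive insert and the update share one transaction, a failure in either leaves both tables unchanged.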
Three things land in the customer’s lap from this:
- An audit feed. The dashboard /history view lists every change to a tenant’s memory, filterable by actor, bucket, memory, or tick. The same data is queryable via GET /v1/audit.
- One-click rollback. Every event in the feed has an undo button. POST /v1/audit/<event_id>/rollback reverts that one mutation. Cascading parents like bucket_delete use POST /v1/audit/groups/<parent_id>/rollback to restore the parent and all its children in one call. For agent ticks, the dashboard’s “Undo this tick” button lists every event the tick produced (filtered by tick_id) and rolls them back together; if a Watchdog run scrubs ten memories the customer didn’t want scrubbed, that’s one click in the audit feed.
- Point-in-time reads. GET /v1/memories/<id>/at?as_of=<timestamp> returns what a memory looked like at a specific moment. GET /v1/buckets/<id>/history returns the ordered event log for a bucket. Useful for support, for compliance reviews, and for the obvious “what did this look like before the agent touched it” question.
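A point-in-time read reduces to finding the version whose validity window contains the requested timestamp. The sketch below assumes half-open windows (valid_from inclusive, valid_to exclusive) and integer timestamps; both are assumptions made for illustration, as are the type and function names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    content: str
    valid_from: int           # integer timestamps keep the example simple
    valid_to: Optional[int]   # None marks the live, still-current version


def memory_as_of(versions: List[Version], as_of: int) -> Optional[str]:
    """Return the content that was live at `as_of`, or None if none existed."""
    for v in versions:
        still_open = v.valid_to is None or as_of < v.valid_to
        if v.valid_from <= as_of and still_open:
            return v.content
    return None


history = [
    Version("likes tea", valid_from=100, valid_to=200),
    Version("likes coffee", valid_from=200, valid_to=None),
]
before_edit = memory_as_of(history, 150)
at_edit = memory_as_of(history, 200)
before_creation = memory_as_of(history, 50)
```

With half-open windows, a timestamp that falls exactly on an edit resolves to the newer version, so no instant matches two versions.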
The cost we paid for this was a per-write archive on update and delete. Inserts are unchanged. The hot path didn’t get slower in any measurable way, and because the archive write happens inside the same transaction as the mutation, the archived version and the change commit or fail together.
How agents and the audit log work together
The combination is the part worth dwelling on. An agent tick produces some set of findings or mutations. Each one is audited individually and stamped with the tick’s parent id. From the customer’s side, that means three concrete affordances.
First, you can review what an agent did before deciding to keep it. A Janitor tick proposes deletions; the dashboard shows each one in the audit feed with a one-click undo before retention prunes the version rows. Second, you can roll back a whole tick at once if it misbehaved. A Watchdog tick that overflagged contradictions on a malformed bucket gets undone with a single call against the tick’s parent id. Third, you can trust new agents more aggressively because the cost of a bad run is bounded by the rollback window, not by the agent’s aggressiveness.
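Rolling back a whole tick can be sketched as filtering events by tick id and undoing them newest first. As a simplification, each event here carries the prior state directly, whereas the text above describes a separate versions table; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    event_id: int
    memory_id: str
    before: Optional[str]    # state prior to the mutation; None = did not exist
    tick_id: Optional[str]   # None for changes made outside an agent tick


def rollback_tick(store: Dict[str, str], events: List[Event], tick_id: str) -> int:
    """Undo every event stamped with `tick_id`, newest first.

    Returns the number of events that were undone.
    """
    targets = [e for e in events if e.tick_id == tick_id]
    for e in sorted(targets, key=lambda e: e.event_id, reverse=True):
        if e.before is None:
            store.pop(e.memory_id, None)      # undo a create
        else:
            store[e.memory_id] = e.before     # undo an update or delete
    return len(targets)


# A tick deleted m1 and m2; a user separately edited m3.
store = {"m3": "new text"}
events = [
    Event(1, "m1", "alpha", "tick-7"),
    Event(2, "m2", "beta", "tick-7"),
    Event(3, "m3", "old text", None),
]
undone = rollback_tick(store, events, "tick-7")
```

The user's edit to m3 is untouched because it carries no tick id, which is the property that makes a tick-wide undo safe to offer as one action.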
The framing we keep using internally: agents move fast, the audit log moves slow. Both have their job. Agents do the work; the audit log makes the work reversible.
Worked example: the Bucket Profiler
The Profiler is the most recent operator to join the framework, and the cleanest example of what the new surface looks like. Three endpoints cover the full lifecycle.
Install and trigger a first run:
curl -X POST "https://api.lumetra.io/v1/buckets/<bucket_id>/profile/ensure-agent?run_now=true" \
  -H "Authorization: Bearer $ENGRAM_API_KEY"
Force a refresh later:
curl -X POST "https://api.lumetra.io/v1/buckets/<bucket_id>/profile/regenerate" \
  -H "Authorization: Bearer $ENGRAM_API_KEY"
The regenerate call returns 412 with code BUCKET_PROFILER_NOT_INSTALLED if the agent isn’t installed yet; that’s the signal to call ensure-agent first.
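On the client side, the 412 can be handled by falling back to ensure-agent. The sketch below injects the HTTP call as a function so it runs without a network. It assumes the error code appears under a top-level "code" key in the response body, which the text above does not confirm.

```python
from typing import Callable, List, Tuple

# post(path) -> (status_code, json_body). Injected so the sketch needs no network.
Post = Callable[[str], Tuple[int, dict]]


def regenerate_profile(post: Post, bucket_id: str) -> Tuple[int, dict]:
    """Force a profile refresh, installing the agent first if it is missing."""
    base = f"/v1/buckets/{bucket_id}/profile"
    status, body = post(f"{base}/regenerate")
    if status == 412 and body.get("code") == "BUCKET_PROFILER_NOT_INSTALLED":
        # run_now=true installs the agent and triggers the first run in one call.
        return post(f"{base}/ensure-agent?run_now=true")
    return status, body


# A fake transport standing in for a bucket with no Profiler installed.
calls: List[str] = []


def fake_post(path: str) -> Tuple[int, dict]:
    calls.append(path)
    if path.endswith("/regenerate"):
        return 412, {"code": "BUCKET_PROFILER_NOT_INSTALLED"}
    return 200, {"status": "pending"}


status, body = regenerate_profile(fake_post, "b1")
```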
Read the latest profile:
curl "https://api.lumetra.io/v1/buckets/<bucket_id>/profile" \
  -H "Authorization: Bearer $ENGRAM_API_KEY"
Response shape:
{
  "bucket_id": "...",
  "status": "ready", // ready | pending | not_installed
  "updated_at": "2026-05-21T17:04:12Z",
  "n_memories_at_gen": 487,
  "profile": "{\"entities\": [...], ...}",
  "source": "agent"
}
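In the response shape above, profile is a JSON document encoded as a string, so a client has to decode twice. A minimal sketch follows, assuming the profile should only be read when status is ready. The sample payload is made up for illustration and the function name is hypothetical.

```python
import json
from typing import Optional


def extract_profile(response_text: str) -> Optional[dict]:
    """Decode the response, then decode the nested `profile` string."""
    body = json.loads(response_text)
    if body.get("status") != "ready":
        return None  # pending or not_installed: nothing to read yet
    return json.loads(body["profile"])


# Made-up sample payload following the documented shape.
sample = json.dumps({
    "bucket_id": "b1",
    "status": "ready",
    "profile": json.dumps({"entities": ["Ada"]}),
    "source": "agent",
})
profile = extract_profile(sample)
pending = extract_profile(json.dumps({"bucket_id": "b1", "status": "pending"}))
```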
Behind that surface: each tick generates a profile, writes it as a memory in _bucket_profiles, and produces an audit event tagged with the tick id. If you don’t like a particular profile run (say the model went off-script), you roll back that specific event from /history. The version table holds the previous profile so the synthesizer reads the older one until the next scheduled tick produces a new one.
The dashboard /agents page is where to tune the per-installation knobs: model override, daily budget, per-tick output token cap (4096 by default), schedule (@every:1h is the default, manual means “only when I ask”), and the prompt itself. Most tenants change the model and the budget and leave the rest alone.
Why now
The framework matured to the point where the shape was obvious. When we first shipped Watchdog, the “framework” was a single scheduler loop and not much else. Each operator after that pushed on a different edge: budgets, multi-bucket reads, mutation proposals, meta-bucket conventions. By the time we ported the Profiler in, the framework was doing real work, and the cost of adding a new operator dropped to roughly “write the operator function.”
Reversibility followed a similar arc. We’ve had an audit trail of some kind since Engram’s early days, but before this release it was purely descriptive: it told you what happened, not how to undo it. As agents started writing more, “descriptive” stopped being enough. Either the customer trusts the agents fully and we accept that bad runs leak through, or we build the version table and the rollback path and accept the per-write archive cost. We picked the second one.
The two together get us a single mental model for everything that mutates memory: it’s an event in the audit log, and it’s reversible inside a 90-day window. Customers don’t need to think about which kind of change they’re looking at. The dashboard surfaces them all the same way.
What’s next
On the agents side, there are a few directions. The per-entity and per-topic profile operators already exist in the codebase; what’s missing is a shipped template that installs narrowly scoped profilers on very large buckets, where one monolithic profile loses resolution on individual entities. Another is auto-install of the Bucket Profiler on bucket creation, once we have a clearer read on the cost/value curve at the low end. A third is new operators for patterns we keep seeing in customer buckets (cross-bucket dedupe, scheduled summarization for export to other systems).
On the reversibility side, the obvious next thing is bucket-level point-in-time queries: not just “what did this one memory look like at time T” but “run this query against the bucket as it existed at time T.” The audit log and version tables already contain enough information to answer it; the work is in the retrieval pipeline doing the right thing with the historical view.
If you’re already running Engram, ensure-agent?run_now=true on any bucket installs the Profiler, the /agents page lets you tune any of the operators, and /history is where the audit feed lives. The 90-day window starts the moment the first auditable mutation lands.
Further reading
Closely related
- Designing the memories table for a system you can't easily migrate. Two years of column-by-column iteration on the spine of the system, with the painful migrations called out.
- Zero-downtime backfill migrations: the HMAC rollout in detail. Opportunistic backfill driven by the verify path, plus the partial unique index that made the rollout possible.
- Engram on LongMemEval-S: 91.6%. 458/500 on the public benchmark. Hybrid retrieval, canonical profile pass, v44 composer prompt (MIT-licensed).