AI Memory. Solved.

Durable, explainable memory for AI agents over MCP.
Bring your own model. Works with Claude Code, Cursor, ChatGPT, and more.

Start Free With Engram See Quickstart

Benchmark LongMemEval-S

91.6% 458 / 500 · public benchmark · v44 prompt (MIT)

Read the writeup →

agent.ts MCP · engram.query_memory

1const result = await engram.query_memory({
2  question: "who is leading the payments migration?",
3  bucket:   "acme-eng",
4});
5
6// → synthesized answer + the graph facts that backed it
7{
8  answer: "Alice owns the payments migration; Sam and Priya pair on reconciliation.",
9  memories_found: 2,
10  graph_facts: [
11    { subject: "Alice", predicate: "owns", object: "payments migration" },
12    { subject: "Sam", predicate: "pairs_with", object: "Priya" },
13  ],
14}

Reliable memory

Agents remember conversations, facts, and decisions — and can show why results were chosen.

Easy setup

Connect via MCP or HTTP in minutes. Works with popular IDEs and agent runtimes.

Built-in controls

Manage retention and cleanup with simple tools — no extra plumbing required.

Get started in seconds

Connect Engram to your existing AI agents over MCP

Three steps to memory in your agent

Sign up. Free, no card. You'll land on a Getting Started page that walks the next two steps.
Add your LLM key. Engram is BYOK. Paste an OpenAI / Anthropic / Groq / Together / Fireworks key and we'll route every extraction and query call through your provider. You pay your provider directly. We never see your inference.
Paste the snippet below into your agent and restart it. Use Authorization: Bearer <api-key>with the API key from your portal.

View all integrations (44) →

Two ways to install; pick one.

Option A: Plugin marketplace (recommended)

One-line install. Adds the Engram MCP server plus /engram-remember and /engram-recall slash commands. No JSON file editing.

Add the Lumetra marketplace:

Claude Code

/plugin marketplace add lumetra-io/engram-claude-plugin

Install the plugin:

Claude Code

/plugin install engram@lumetra

Paste your API key when prompted. The plugin stores it in Claude Code's secret store and wires the MCP server automatically.

Option B: Manual CLI install

Skips the slash commands but wires the MCP server directly. Includes the required Authorization header.

Add via CLI:

Terminal

claude mcp add-json engram '{"type":"sse","url":"https://mcp.lumetra.io/mcp/sse","headers":{"Authorization":"Bearer <api-key>"}}'

Restart Claude Code and verify the server is connected.

Pricing

Memory pricing without inference markup.

Engram is bring-your-own-model. Free to start. No per-token charges, ever.

Free

$0forever

10K memories
50K retrievals / mo

Indie

$29/ mo

100K memories
500K retrievals / mo

Engram — FAQs

What is Engram, and how is it different from a vector database?

A vector database stores embeddings and returns the chunks closest in cosine space. Engram is the memory product on top of that.

It runs three retrieval engines in parallel (BM25 for exact lexical recall, vectors for paraphrase, a knowledge graph for relational queries), fuses them with reciprocal rank fusion, reranks with a cross-encoder, and returns an explanation trace alongside the answer.

A vector DB is one of the engines; Engram is the system you point your agent at.

What does “explainable memory” actually mean here?

Every recall comes back with a trace: which engines fired, what they matched on, the graph facts that contributed, the score each retrieved memory got, and the canonical profile the composer saw.

When the agent answers wrong, the trace tells you whether retrieval missed the right memory or the composer misread what it was given. "cosine: 0.87" is not a debug log entry. "BM25 matched on 'invoice'; graph linked it to the customer entity in the question" is.

How is Engram different from Mem0 or Zep?

Three places worth checking.

On pricing, a managed-inference vendor bills you for tokens your LLM provider already charged for; Engram is bring-your-own-model end-to-end and never sits in your inference invoice.

On surface area, Engram is MCP-native (config-file install on any compliant client) and exposes a REST API for everything else, so adding it to a new client is a few lines of config rather than an integration project.

On transparency, the composer prompt that produced our 91.6% on LongMemEval-S is published under MIT, the benchmark methodology is documented end to end, and you can reproduce the score on your own retrieval stack.

Compare like for like (the architecture, the prompt, the judge) and the differences become specific instead of vibes.

What does “bring your own model” actually mean for billing and data?

Three independent properties, and most "BYOK" claims only deliver one or two.

Credentials: your model key is encrypted at rest with AES-256-GCM under a master key the application database has no read access to.

Billing: your LLM provider invoices your account directly. We never sit in the dollar path or take a margin.

Inference path: our servers do orchestrate the calls, but we never log prompt bodies and never use customer content for training.

The full breakdown is in the BYOK post; the short version is that your inference bill stays on your provider's dashboard, where you already track it.

What latency should I expect?

Per operation, not per product.

A read-only authenticated call (e.g., listing buckets) lands under 50ms p95 once auth is HMAC-not-bcrypt. An MCP tool call with no LLM (list_memories, list_buckets) is under 200ms p95. A memory write with extraction is 1.5–2s p95, bounded by your extractor model, not by anything we control. A query with synthesis is 3–5s p95, bounded by your composer.

The retrieval itself (hybrid fusion across three engines, rerank) is single-digit percent of any query's wall time.

How does Engram handle duplicate or conflicting memories?

Dedup runs as a database invariant at write time: we hash normalized content and lean on a partial unique index, so duplicate INSERTs fail atomically rather than racing.

Paraphrases get caught by a secondary embedding check at 0.95 cosine, and an LLM conflict-resolver decides supersede, merge, or keep-both when content actually contradicts ("moved to Berlin" arriving after "lives in Lisbon").

A profile pass after ingest merges co-referent entities at the conversation level. "My college roommate's wedding" and "Emily's wedding in the city" collapse into one entry with both aliases recorded.

How is data isolated across users, projects, or agents?

Every memory carries four scoping axes: tenant_id, bucket_id, and optional agent_id and run_id.

Buckets are the user-facing namespace; make as many as you need, because they are never a pricing dimension. Multi-agent tenants can share a bucket without polluting each other's retrieval, because the agent and run filters live behind partial indexes that don't cost anything when nullable.

Cross-tenant isolation is enforced in every retrieval query as the leading filter, with foreign-key cascades on tenant deletion.

What happens when a user asks to be forgotten?

A single delete cascades through every derived index in one transaction:

BM25 inverted index
Vector store
Knowledge graph (including orphaned-node dereferencing)
Profile cache
Retrieval logs

The memory does not come back as a ghost recall a week later.

This is the durability promise most memory systems quietly fail (delete the row, miss one derived index, watch the memory resurface). We treat full cascade as table stakes, not a feature.

Do I have to use MCP, or can I integrate over a normal HTTP API?

Both.

The MCP layer at mcp.lumetra.io is the recommended path for clients that already speak MCP (Claude Code, Cursor, Windsurf, ChatGPT Connectors, OpenCode, OpenClaw), and it's the path with the fewest moving parts: a config-file install.

For anything else (your own Next.js or Python app, a CRON job, a custom orchestrator), there's a REST API at api.lumetra.io with the usual endpoints: /v1/buckets, /v1/buckets/{id}/memories, /v1/query, /v1/buckets/{id}/profile/regenerate. The MCP server is a thin wrapper around that REST API, not a separate product.

Can I self-host Engram?

Hosted is the default and what most customers run.

Enterprise customers can request VPC or on-prem deployment: same server, same encrypted-at-rest credential storage, same audit-log surface that runs in our cloud.

A self-serve self-host tier is on the roadmap. If you need on-prem before then, get in touch.

Sources

Primary references behind the benchmarks, protocol, and competitor claims you'll see across this site. Verified May 13, 2026.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory — Wu et al., 2024 (arXiv 2410.10813). The public benchmark we report 91.6% on.
Model Context Protocol specification — Open standard Engram implements; clients include Claude Code, Cursor, Windsurf, ChatGPT.
Mem0 (mem0ai/mem0) — Competitor referenced in pricing and FAQ comparisons.