Memory pricing without inference markup.

Engram is bring-your-own-model. Inference goes through your existing LLM contract — never ours. We meter the memory layer with generous limits on stored memories and retrievals; no per-token charges, ever.

Free

Hobby projects and evaluation. No credit card.

$0 forever
10K stored memories
50K retrievals / mo

Good for: prototyping an agent, testing the API, LongMemEval reproductions.

Throughput: 8 concurrent requests

Support: Community (GitHub + forum)

Memories on Free accounts are cleared after 180 days of API inactivity (warning email sent at 150 days).

Start free
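The Free-tier retention rule is plain date arithmetic. The short Python sketch below computes when the warning email and the clear-out would land for a given last API call. The function name is hypothetical, and the sketch assumes inactivity is counted in whole calendar days from the last call, which this page does not specify.

```python
from datetime import date, timedelta

# Free-tier retention thresholds quoted above (days of API inactivity).
WARNING_AFTER_DAYS = 150
CLEAR_AFTER_DAYS = 180


def retention_schedule(last_api_call: date) -> tuple[date, date]:
    """Return (warning_email_date, clear_date) for a Free account.

    Assumption: inactivity is counted in whole calendar days from the date
    of the last API call, and any newer call would move that date forward.
    """
    warning = last_api_call + timedelta(days=WARNING_AFTER_DAYS)
    clear = last_api_call + timedelta(days=CLEAR_AFTER_DAYS)
    return warning, clear


if __name__ == "__main__":
    warn, clear = retention_schedule(date(2026, 1, 1))
    print(f"warning email: {warn}, memories cleared: {clear}")
    # warning email: 2026-05-31, memories cleared: 2026-06-30
```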

Indie

Solo builders shipping a real app.

$29 per month
100K stored memories
500K retrievals / mo

Good for: side-project agents, personal AI tools, one production tenant.

Throughput: 16 concurrent requests

Support: Email

Start Indie

Enterprise

Compliance, on-prem deployment, and managed inference.

Custom annual
Unlimited stored memories
Unlimited retrievals

Good for: regulated industries, cross-team agent rollouts, vendor consolidation.

Throughput: 128 concurrent requests

Support: Dedicated support engineer

  • Managed inference option (we run the LLM)
  • On-prem or VPC deployment
  • Custom retention + DPA
  • SLA with credits
Talk to sales
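Each card caps concurrent requests: 8 on Free, 16 on Indie, 128 on Enterprise, and the comparison table lists 32 per tenant on Team. A client that fans out many memory queries can stay under its cap with a semaphore. This is a minimal Python asyncio sketch, not an official SDK: `fake_query` is a hypothetical stand-in for a real call such as POST /v1/query, and whether the cap applies per API key or per tenant on every tier is an assumption to check.

```python
import asyncio

# Concurrent-request limits from the plan cards; the Team figure
# (32 per tenant) comes from the comparison table.
PLAN_CONCURRENCY = {"free": 8, "indie": 16, "team": 32, "enterprise": 128}


async def fake_query(i: int, tracker: dict) -> int:
    """Hypothetical stand-in for one Engram query (e.g. POST /v1/query)."""
    tracker["in_flight"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
    await asyncio.sleep(0.01)  # simulate network latency
    tracker["in_flight"] -= 1
    return i


async def run_queries(n: int, plan: str) -> tuple[list[int], int]:
    """Issue n queries without exceeding the plan's concurrency cap.

    Returns the results (in submission order) and the peak number of
    queries that were in flight at the same time.
    """
    limit = asyncio.Semaphore(PLAN_CONCURRENCY[plan])
    tracker = {"in_flight": 0, "peak": 0}

    async def bounded(i: int) -> int:
        async with limit:  # waits here while the plan's cap is fully in use
            return await fake_query(i, tracker)

    results = await asyncio.gather(*(bounded(i) for i in range(n)))
    return list(results), tracker["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_queries(100, "free"))
    print(f"{len(results)} queries completed, peak concurrency {peak}")
```

Queuing on the client side means a burst of work simply waits its turn, rather than depending on how the server treats requests beyond the cap, which this page does not describe.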

Every plan includes

The retrieval stack, the explainability surface, and the integration tooling are the same on every tier. Paid tiers buy more capacity and a higher support level — not access to features.

Why bring your own model?

Inference is your LLM provider's price, not ours. Bundling it — the way most managed memory platforms do — just stacks a margin on top of what you already pay. We meter what we actually serve: the memory layer itself, with generous limits at every tier. Your inference contract stays with whoever you use today.

How we compare

Engram Team vs. Mem0 Pro at the tier most production teams land on. Both are managed; the meters and what's bundled differ.

Engram Team tier compared to Mem0 Pro tier
|  | Engram Team | Mem0 Pro |
| --- | --- | --- |
| Monthly price | $99 | $249 |
| Stored memories | 1,000,000 |  |
| Add / extraction calls | Unmetered within storage cap | 500,000 |
| Retrievals / queries | 5,000,000 | 50,000 |
| Concurrent requests | 32 per tenant | Not published |
| LLM inference | Bring your own (billed by your provider) | Bundled in the plan price |
| Knowledge graph | First-class, queryable, explainable | Available |
| Explanation trace per recall | Yes | Not surfaced |
| MCP server | Native in every tier (same endpoint, same auth) | Separate product (OpenMemory) |

Source: mem0.ai/pricing · verified May 13, 2026. Competitor pricing changes; tell us if this is stale.

Pricing FAQs

Why is Engram bring-your-own-model?

Memory and inference are different jobs with different cost shapes. Inference scales with token volume; memory scales with stored items and retrievals.

Bundling them (the default at most managed memory platforms) means you pay a margin on tokens your LLM provider already charged you for, hidden inside a per-query rate that's hard to inspect.

Splitting them gives you a direct invoice from your model provider for inference, where your existing volume discounts and FinOps tooling already live, and a flat-shape invoice from us for the memory layer, where the numbers are predictable and easy to reason about.

How are retrievals counted, and why 5× memories?

One retrieval is one query: typically a single MCP query_memory call or POST /v1/query.

The 5:1 ratio between retrievals and stored memories isn't arbitrary. It's the median read-to-write multiplier we measured in our private beta. A bucket with 100K memories typically sees 500K–1M retrievals over a month, and 5:1 lets a Team plan comfortably cover the median case without forcing read-heavy customers to over-buy storage.

If your traffic genuinely runs 8–10× retrievals per memory (common for autonomous orchestrators), get in touch. The ratio is a default, not a hard ceiling.
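Because every public tier keeps the same 5:1 ratio (10K / 50K on Free, 100K / 500K on Indie, 1M / 5M on Team), choosing a plan comes down to checking both meters against the caps. The Python sketch below returns the cheapest public tier that covers a workload. The caps and prices are taken from this page; the function name is hypothetical, and returning None above Team stands in for the "get in touch" path rather than a hard limit.

```python
from typing import Optional

# (name, monthly price in USD, stored-memory cap, monthly retrieval cap),
# ordered by price; figures from the plan cards and comparison table.
TIERS = [
    ("Free", 0, 10_000, 50_000),
    ("Indie", 29, 100_000, 500_000),
    ("Team", 99, 1_000_000, 5_000_000),
]


def cheapest_tier(stored_memories: int, monthly_retrievals: int) -> Optional[str]:
    """Return the cheapest public tier whose caps cover both meters.

    Returns None when the workload exceeds Team; per the FAQ, that is the
    point to get in touch about Enterprise or a custom ratio.
    """
    for name, _price, memory_cap, retrieval_cap in TIERS:
        if stored_memories <= memory_cap and monthly_retrievals <= retrieval_cap:
            return name
    return None


if __name__ == "__main__":
    # Every public tier keeps the 5:1 retrieval-to-memory ratio.
    assert all(r == 5 * m for _, _, m, r in TIERS)
    # 100K memories at the median 5x read rate fits Indie exactly.
    print(cheapest_tier(100_000, 500_000))  # Indie
    # The same bucket at 8x reads exceeds Indie's retrieval cap.
    print(cheapest_tier(100_000, 800_000))  # Team
```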

What counts as a "stored memory" against my cap?

One atomic fact written into Engram, after dedup.

Two dedup paths run at write time: a hash-equality check on normalized content, so verbatim restatements never double-count, and a paraphrase check that treats content with a cosine similarity of 0.95 or higher to an existing memory as the same fact, even when the bytes differ.

The result is that the counter grows with the breadth of what your agent has learned, not with how chatty it is. An agent that hears "I prefer dark mode" twenty times across a year of conversations adds one memory, not twenty.
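The two write-time checks can be sketched as a hash lookup followed by a similarity scan. This is an illustrative Python model, not Engram's implementation: the normalization rules (lowercase, collapse whitespace, trim edge punctuation), the `MemoryStore` class, and the toy 3-d vectors are all assumptions, and a real system would query a vector index rather than scan a list.

```python
import hashlib
import math
import re

PARAPHRASE_THRESHOLD = 0.95  # cosine-similarity cutoff quoted above


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, collapse whitespace, trim edge punctuation."""
    collapsed = re.sub(r"\s+", " ", text.lower()).strip()
    return collapsed.strip(".!?,;: ")


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class MemoryStore:
    """Hypothetical store whose counter only grows for genuinely new facts."""

    def __init__(self) -> None:
        self.hashes: set[str] = set()
        self.embeddings: list[list[float]] = []

    def write(self, text: str, embedding: list[float]) -> bool:
        """Return True if the fact counts as a new stored memory."""
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in self.hashes:
            return False  # path 1: verbatim restatement after normalization
        if any(cosine(embedding, e) >= PARAPHRASE_THRESHOLD for e in self.embeddings):
            return False  # path 2: different bytes, same fact
        self.hashes.add(digest)
        self.embeddings.append(embedding)
        return True

    @property
    def count(self) -> int:
        return len(self.embeddings)


if __name__ == "__main__":
    store = MemoryStore()
    # Toy 3-d vectors stand in for real embeddings.
    for _ in range(20):
        store.write("I prefer dark mode.", [0.9, 0.1, 0.0])
    store.write("Dark mode is my preference", [0.88, 0.12, 0.01])
    store.write("I live in Lisbon", [0.0, 0.2, 0.95])
    print(store.count)  # 2
```

Running the hash check first means exact restatements never pay for a similarity comparison; only byte sequences the store has not seen reach the more expensive paraphrase check.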

What happens if I hit my memory or retrieval cap?

Writes and queries against an exhausted cap return HTTP 402 with a clear message: "Memory cap reached (used/cap). Upgrade your plan or delete memories to continue."

Your existing memories stay intact; only new writes or further queries are blocked until you upgrade or free space.

We picked the loud-failure mode on purpose, because a silent throttle leaves your agent timing out without a useful reason. Email warnings at 80%, soft-throttling, and rate-limit headers are on the roadmap; the current behavior is fail-loud-with-context.
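A client can honor the fail-loud design by treating HTTP 402 as non-retryable and surfacing the message. The Python sketch below takes a status code and message body from whatever HTTP client you use. `CapReachedError`, `parse_cap_usage`, and `check_response` are hypothetical names, and the parser assumes the "(used/cap)" placeholder is filled with integers such as "(10000/10000)" and that retrieval-cap errors use the same shape.

```python
import re
from typing import Optional

# Assumes the "(used/cap)" placeholder in the 402 message is filled with
# integers, e.g. "(10000/10000)"; thousands separators are tolerated.
USAGE_PATTERN = re.compile(r"\((\d[\d,]*)/(\d[\d,]*)\)")


class CapReachedError(Exception):
    """Hypothetical client-side error raised for an exhausted cap (HTTP 402)."""

    def __init__(self, message: str, usage: Optional[tuple[int, int]]):
        super().__init__(message)
        self.usage = usage  # (used, cap), or None if the message didn't parse


def parse_cap_usage(message: str) -> Optional[tuple[int, int]]:
    """Extract (used, cap) from a cap-reached message, if present."""
    match = USAGE_PATTERN.search(message)
    if match is None:
        return None
    used, cap = (int(g.replace(",", "")) for g in match.groups())
    return used, cap


def check_response(status: int, message: str) -> None:
    """Fail loudly on 402 instead of retrying; other statuses pass through.

    Retrying will not help: existing memories stay intact, but new writes or
    queries stay blocked until the plan is upgraded or memories are deleted.
    """
    if status == 402:
        raise CapReachedError(message, parse_cap_usage(message))


if __name__ == "__main__":
    try:
        check_response(
            402,
            "Memory cap reached (10000/10000). "
            "Upgrade your plan or delete memories to continue.",
        )
    except CapReachedError as err:
        print(f"Not retrying: {err} usage={err.usage}")
```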

Can you run the LLM for me, e.g., if I don't want to manage a key?

On Enterprise, yes. We can run extraction and synthesis with a managed model at cost plus a small margin, and the inference cost lands on the same invoice as the platform fee.

On Free, Indie, and Team, BYOM is required.

Keeping the public tiers BYOM-only is the discipline that lets us print transparent prices. Once a vendor is running inference for customers, the math stops working without a markup, and the markup gets hidden in a per-token rate that's hard to evaluate against published provider pricing.

How does pricing compare to running your own memory stack?

The honest answer follows the usual managed-vs-self-host curve: Engram is cheaper than self-hosting at small scale, comparable in the middle, and more expensive at very large scale. The break-even depends on your traffic shape and how much engineering capacity you have to throw at it.

What you're paying us for is hybrid retrieval (BM25 + vector + graph, fused and reranked) that already works, cascade-delete and write-time dedup that already work, a canonical profile pass that already works, and roughly two years of edge cases in the ingest and composer prompts that already work.

If your team is large enough that building all of that is cheaper than buying it, we'll point you at the public artifacts: the v44 composer prompt is MIT, the LongMemEval methodology is documented, and the architecture posts spell out the rest.

Which plan unlocks self-hosting or VPC deployment, and how is it priced?

Self-hosting and VPC live on Enterprise only — not Free / Indie / Team. Enterprise is priced annually with a custom commitment that covers the support engineer, the SLA, the DPA, and any managed-inference credits, separately from the deployment artifact itself.

The lower tiers stay hosted because the unit economics of supporting on-prem under a $99/mo SKU don't work for either side. If you have a procurement requirement that rules out hosted (regulated industry, data-residency, air-gap), talk to us — that is exactly the path Enterprise is shaped for.

For the product-side answer on what self-hosting includes (binary parity, audit log, credential storage), see the "Can I self-host Engram?" entry on the homepage FAQ.

Do I need a credit card to try Engram?

No. The Free tier requires only an account and gives you 10,000 stored memories and 50,000 retrievals per month, enough to run a real workload against the production retrieval stack for several weeks.

The architecture on Free is identical to what paying tiers get; the only difference is the monthly cap.