Free
Hobby projects and evaluation. No credit card.
- 10K stored memories
- 50K retrievals / mo
Memories on Free accounts are cleared after 180 days of API inactivity (warning email sent at 150 days).
Engram is bring-your-own-model. Inference goes through your existing LLM contract, never ours. We meter the memory layer with generous limits on stored memories and retrievals; no per-token charges, ever.
- Indie: Solo builders shipping a real app.
- Team: Production teams scaling agent products.
- Enterprise: Compliance, on-prem deployment, and managed inference.
The retrieval stack, the explainability surface, and the integration tooling are the same on every tier. Paid tiers buy more capacity and a higher support level — not access to features.
Inference is your LLM provider's price, not ours. Bundling it — the way most managed memory platforms do — just stacks a margin on top of what you already pay. We meter what we actually serve: the memory layer itself, with generous limits at every tier. Your inference contract stays with whoever you use today.
Engram Team vs. Mem0 Pro at the tier most production teams land on. Both are managed; the meters and what's bundled differ.
| | Engram Team | Mem0 Pro |
|---|---|---|
| Monthly price | $99 | $249 |
| Stored memories | 1,000,000 | — |
| Add / extraction calls | Unmetered within storage cap | 500,000 |
| Retrievals / queries | 5,000,000 | 50,000 |
| Concurrent requests | 32 per tenant | Not published |
| LLM inference | Bring your own (billed by your provider) | Bundled in the plan price |
| Knowledge graph | First-class, queryable, explainable | Available |
| Explanation trace per recall | Yes | Not surfaced |
| MCP server | Native in every tier (same endpoint, same auth) | Separate product (OpenMemory) |
Source: mem0.ai/pricing · verified May 13, 2026. Competitor pricing changes; tell us if this is stale.
Memory and inference are different jobs with different cost shapes. Inference scales with token volume; memory scales with stored items and retrievals.
Bundling them (the default at most managed memory platforms) means you pay a margin on tokens your LLM provider already charged you for, hidden inside a per-query rate that's hard to inspect.
Splitting them gives you a direct invoice from your model provider for inference, where your existing volume discounts and FinOps tooling already live, and a flat-shape invoice from us for the memory layer, where the numbers are predictable and easy to reason about.
One retrieval is one query: typically a single MCP query_memory call or POST /v1/query.
The 5:1 ratio between retrievals and stored memories isn't arbitrary. It's the median read-to-write multiplier we measured in our private beta. A bucket with 100K memories typically sees 500K–1M retrievals over a month, and 5:1 lets a Team plan comfortably cover the median case without forcing read-heavy customers to over-buy storage.
If your traffic genuinely runs 8–10× retrievals per memory (common for autonomous orchestrators), get in touch. The ratio is a default, not a hard ceiling.
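To make the ratio concrete, here is a minimal sketch of the arithmetic. The caps are the ones published on this page; the `Plan` type and `retrieval_utilization` function are illustrative names made up for this example, not part of any Engram SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    stored_memories: int       # storage cap
    retrievals_per_month: int  # monthly retrieval cap


# Caps as published on this page; the Plan type itself is hypothetical.
FREE = Plan("Free", 10_000, 50_000)
TEAM = Plan("Team", 1_000_000, 5_000_000)


def retrieval_utilization(plan: Plan, stored: int, retrievals_per_memory: float) -> float:
    """Return the share of the monthly retrieval cap that the expected traffic uses.

    A value above 1.0 means the traffic shape outruns the plan's default ratio.
    """
    if stored > plan.stored_memories:
        raise ValueError("stored memories exceed the plan's storage cap")
    expected_retrievals = stored * retrievals_per_memory
    return expected_retrievals / plan.retrievals_per_month
```

Under these assumptions, a full Team bucket at the 5:1 median lands exactly on the retrieval cap, while the same bucket at 8 retrievals per memory would need 1.6 times the default allowance, which is the case where getting in touch makes sense.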
A stored memory is one atomic fact written into Engram, after dedup.
Two dedup paths run at write time: a hash-equality check on normalized content (so verbatim restatements never double-count) and a paraphrase check at 0.95 cosine for content that's different bytes but the same fact.
The result is that the counter grows with the breadth of what your agent has learned, not with how chatty it is. An agent that hears "I prefer dark mode" twenty times across a year of conversations adds one memory, not twenty.
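The two write-time checks can be sketched roughly as follows. This is an illustration of the idea, not Engram's implementation: the normalization rules, the use of SHA-256, the `>=` comparison at the threshold, and the `MemoryStore` class are all assumptions, and the embeddings are toy vectors supplied by the caller.

```python
import hashlib
import math
import re

PARAPHRASE_THRESHOLD = 0.95  # cosine threshold stated above


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, drop trailing punctuation (assumed rules)."""
    return re.sub(r"\s+", " ", text.strip().lower()).rstrip(".!?")


def content_hash(text: str) -> str:
    """Hash of the normalized content; SHA-256 is an assumption."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Hypothetical in-memory store used only to illustrate the two dedup paths."""

    def __init__(self):
        self._by_hash = {}     # content hash -> stored text
        self._embeddings = []  # embeddings of stored memories

    def write(self, text: str, embedding) -> bool:
        """Return True if a new memory was stored, False if it was deduplicated."""
        key = content_hash(text)
        # Path 1: verbatim restatement after normalization.
        if key in self._by_hash:
            return False
        # Path 2: different bytes, same fact (paraphrase check).
        for existing in self._embeddings:
            if cosine(existing, embedding) >= PARAPHRASE_THRESHOLD:
                return False
        self._by_hash[key] = text
        self._embeddings.append(list(embedding))
        return True

    def __len__(self) -> int:
        return len(self._by_hash)
```

In this sketch, writing "I prefer dark mode" twenty times grows the counter by one: the first write is stored, exact repeats are caught by the hash path, and rewordings with a near-identical embedding are caught by the cosine path.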
Writes and queries against an exhausted cap return HTTP 402 with a clear message: "Memory cap reached (used/cap). Upgrade your plan or delete memories to continue."
Your existing memories stay intact; only new writes or further queries are blocked until you upgrade or free space.
We picked the loud-failure mode on purpose, because a silent throttle leaves your agent timing out without a useful reason. Email warnings at 80%, soft-throttling, and rate-limit headers are on the roadmap; the current behavior is fail-loud-with-context.
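On the client side, fail-loud behavior is straightforward to handle explicitly. The sketch below assumes the `(used/cap)` placeholder in the message is rendered as two integers, which this page does not confirm; `MemoryCapReached` and `check_response` are hypothetical helper names, not part of an Engram SDK.

```python
import re


class MemoryCapReached(Exception):
    """Raised when the API answers HTTP 402 for an exhausted cap."""

    def __init__(self, message, used=None, cap=None):
        super().__init__(message)
        self.used = used
        self.cap = cap


# Assumed rendering of the "(used/cap)" placeholder, e.g. "(10000/10000)".
_USAGE = re.compile(r"\((\d[\d,]*)\s*/\s*(\d[\d,]*)\)")


def check_response(status: int, body: str) -> None:
    """Raise MemoryCapReached on HTTP 402; return None for any other status."""
    if status != 402:
        return
    used = cap = None
    match = _USAGE.search(body)
    if match:
        used, cap = (int(group.replace(",", "")) for group in match.groups())
    raise MemoryCapReached(body, used, cap)
```

Calling `check_response` after every write or query lets an agent surface the cap message to an operator instead of retrying blindly, since retries cannot succeed until the plan is upgraded or memories are deleted.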
Managed inference is available on Enterprise. We can run extraction and synthesis with a managed model at cost plus a small margin, and the inference cost lands on the same invoice as the platform fee.
On Free, Indie, and Team, BYOM is required.
Keeping the public tiers BYOM-only is the discipline that lets us print transparent prices. Once a vendor is running inference for customers, the math stops working without a markup, and the markup gets hidden in a per-token rate that's hard to evaluate against published provider pricing.
Compared with building and running the memory layer yourself, the honest answer follows the usual managed-vs-self-host curve: cheaper at small scale, comparable in the middle, and worse than self-hosting at very large scale. The break-even depends on your traffic shape and how much engineering capacity you have to throw at it.
What you're paying us for is hybrid retrieval (BM25 + vector + graph, fused and reranked) that already works, cascade-delete and write-time dedup that already work, a canonical profile pass that already works, and roughly two years of edge cases in the ingest and composer prompts that already work.
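For a sense of what "fused" involves, the sketch below uses reciprocal rank fusion (RRF), a common way to merge ranked lists from different retrievers. This page does not say which fusion method Engram uses, so treat RRF, the constant `k=60`, and the memory ids as assumptions chosen for illustration; the reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id so ties come out deterministically.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three retrievers for the same query.
bm25_hits = ["m1", "m2", "m3"]
vector_hits = ["m2", "m1", "m4"]
graph_hits = ["m2", "m5"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
```

An id that several retrievers agree on (here `m2`) rises to the top even if no single retriever ranked it first everywhere, which is the main reason to fuse rather than pick one retriever.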
If your team is large enough that building all of that is cheaper than buying it, we'll point you at the public artifacts: the v44 composer prompt is MIT, the LongMemEval methodology is documented, and the architecture posts spell out the rest.
Self-hosting and VPC live on Enterprise only — not Free / Indie / Team. Enterprise is priced annually with a custom commitment that covers the support engineer, the SLA, the DPA, and any managed-inference credits, separately from the deployment artifact itself.
The lower tiers stay hosted because the unit economics of supporting on-prem under a $99/mo SKU don't work for either side. If you have a procurement requirement that rules out hosted (regulated industry, data-residency, air-gap), talk to us — that is exactly the path Enterprise is shaped for.
For the product-side answer on what self-hosting includes (binary parity, audit log, credential storage), see the "Can I self-host Engram?" entry on the homepage FAQ.
No credit card is needed. The Free tier requires only an account and gives you 10,000 stored memories and 50,000 retrievals per month, enough to run a real workload against the production retrieval stack for several weeks.
The architecture on Free is identical to what paying tiers get; the only difference is the monthly cap.