Memory pricing without inference markup.

Engram is bring-your-own-model. Inference goes through your existing LLM contract — never ours. We meter the memory layer with generous limits on stored memories and retrievals; no per-token charges, ever.

Free

Hobby projects and evaluation. No credit card.

$0 forever

Stored memories: 10K
Retrievals: 50K / mo

Good for: prototyping an agent, testing the API, LongMemEval reproductions.

Throughput: 8 concurrent requests

Support: Community (GitHub + forum)

Memories on Free accounts are cleared after 180 days of API inactivity (warning email sent at 150 days).

Start free

Indie

Solo builders shipping a real app.

$29 per month

Stored memories: 100K
Retrievals: 500K / mo

Good for: side-project agents, personal AI tools, one production tenant.

Throughput: 16 concurrent requests

Support: Email

Start Indie

Team

Production teams scaling agent products.

$99 per month

Stored memories: 1M
Retrievals: 5M / mo

Good for: multi-tenant SaaS, production agent fleets, multi-user products.

Throughput: 32 concurrent requests

Support: Priority email

Start Team

Enterprise

Compliance, on-prem deployment, and managed inference.

Custom annual

Stored memories: Unlimited
Retrievals: Unlimited

Good for: regulated industries, cross-team agent rollouts, vendor consolidation.

Throughput: 128 concurrent requests

Support: Dedicated support engineer

Managed inference option (we run the LLM)
On-prem or VPC deployment
Custom retention + DPA
SLA with credits

Talk to sales

Every plan includes

The retrieval stack, the explainability surface, and the integration tooling are the same on every tier. Paid tiers buy more capacity and a higher support level — not access to features.

Bring your own model — no inference markup, ever
Hybrid retrieval — BM25 + vector + knowledge graph, fused with RRF and reranked
Per-query explanations — every answer cites the memories it used
Cascade delete and write-time dedup (hash + 0.95 cosine paraphrase)
MCP endpoint with API-key and OAuth 2.1 auth (Claude Code, Cursor, Windsurf, ChatGPT)
Per-component model, prompt, and provider overrides
Multiple API keys per account
Usage analytics dashboard
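
The hybrid retrieval item above names reciprocal rank fusion (RRF). As a rough illustration of the general technique, here is a minimal sketch that merges ranked lists from three retrievers. The constant k = 60 is a conventional default assumed here, the memory IDs and the rrf_fuse helper are hypothetical, and the reranking step is omitted. This is not Engram's internal implementation.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs with reciprocal rank fusion.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional default.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists from the three retrievers.
bm25_hits = ["m1", "m2", "m3"]
vector_hits = ["m2", "m1", "m4"]
graph_hits = ["m2", "m5"]

fused = rrf_fuse([bm25_hits, vector_hits, graph_hits])
print(fused)
```

An item that ranks well in several lists (m2 here) rises to the top even if no single retriever scored it overwhelmingly, which is the usual reason for fusing on ranks rather than on raw scores.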

Why bring your own model?

Inference is billed by your LLM provider, not by us. Bundling it, the way most managed memory platforms do, just stacks a margin on top of what you already pay. We meter what we actually serve: the memory layer itself, with generous limits at every tier. Your inference contract stays with whoever you use today.

How we compare

Engram Team vs. Mem0 Pro, the tiers most production teams land on. Both are managed; what differs is what each plan meters and what it bundles.

Engram Team tier compared to Mem0 Pro tier
	Engram Team	Mem0 Pro
Monthly price	$99	$249
Stored memories	1,000,000	—
Add / extraction calls	Unmetered within storage cap	500,000
Retrievals / queries	5,000,000	50,000
Concurrent requests	32 per tenant	Not published
LLM inference	Bring your own (billed by your provider)	Bundled in the plan price
Knowledge graph	First-class, queryable, explainable	Available
Explanation trace per recall	Yes	Not surfaced
MCP server	Native in every tier (same endpoint, same auth)	Separate product (OpenMemory)

Source: mem0.ai/pricing · verified May 13, 2026. Competitor pricing changes; tell us if this is stale.

Pricing FAQs

Why is Engram bring-your-own-model?

Memory and inference are different jobs with different cost shapes. Inference scales with token volume; memory scales with stored items and retrievals.

Bundling them (the default at most managed memory platforms) means you pay a margin on tokens your LLM provider already charged you for, hidden inside a per-query rate that's hard to inspect.

Splitting them gives you a direct invoice from your model provider for inference, where your existing volume discounts and FinOps tooling already live, and a flat-shape invoice from us for the memory layer, where the numbers are predictable and easy to reason about.

How are retrievals counted, and why 5× memories?

One retrieval is one query: typically a single MCP query_memory call or POST /v1/query.

The 5:1 ratio between retrievals and stored memories isn't arbitrary. It's the median read-to-write multiplier we measured in our private beta. A bucket with 100K memories typically sees 500K–1M retrievals over a month, and 5:1 lets a Team plan comfortably cover the median case without forcing read-heavy customers to over-buy storage.

If your traffic genuinely runs 8–10× retrievals per memory (common for autonomous orchestrators), get in touch. The ratio is a default, not a hard ceiling.
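
To make the caps and the 5:1 ratio concrete, here is a small sketch that picks the smallest public tier covering a given workload. The cap figures come from the plan cards on this page; the PLANS table, pick_plan, and retrieval_ratio are hypothetical helpers for illustration, not part of any Engram SDK.

```python
# (plan name, monthly price in USD, stored-memory cap, retrievals per month)
PLANS = [
    ("Free", 0, 10_000, 50_000),
    ("Indie", 29, 100_000, 500_000),
    ("Team", 99, 1_000_000, 5_000_000),
]


def pick_plan(stored_memories, retrievals_per_month):
    """Return the first (cheapest) public tier whose caps cover the workload."""
    for name, _price, memory_cap, retrieval_cap in PLANS:
        if stored_memories <= memory_cap and retrievals_per_month <= retrieval_cap:
            return name
    # Nothing public fits: the page routes this case to sales.
    return "Enterprise"


def retrieval_ratio(stored_memories, retrievals_per_month):
    """Retrievals per stored memory, the figure compared against 5:1."""
    return retrievals_per_month / stored_memories


print(pick_plan(100_000, 500_000))    # at the 5:1 default
print(pick_plan(100_000, 1_000_000))  # 10:1, read-heavy
print(retrieval_ratio(100_000, 1_000_000))
```

Under these published caps, a read-heavy workload can be pushed to a larger tier by retrievals alone. That is the case the paragraph above asks you to raise with the team, since the ratio is described as a default rather than a hard ceiling.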

What counts as a "stored memory" against my cap?

One atomic fact written into Engram, after dedup.

Two dedup paths run at write time: a hash-equality check on normalized content (so verbatim restatements never double-count) and a paraphrase check at 0.95 cosine for content that's different bytes but the same fact.

The result is that the counter grows with the breadth of what your agent has learned, not with how chatty it is. An agent that hears "I prefer dark mode" twenty times across a year of conversations adds one memory, not twenty.
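
The two dedup paths can be sketched as follows. This is a minimal illustration under stated assumptions: normalization is taken to mean lowercasing and collapsing whitespace, the hash is SHA-256, and embeddings are passed in as plain lists of floats from some unspecified embedding model. MemoryStore and its methods are hypothetical names, and the real pipeline may differ on every one of these points.

```python
import hashlib
import math

PARAPHRASE_THRESHOLD = 0.95  # cosine cutoff quoted above


def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def content_hash(text):
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Toy in-memory store that counts only deduplicated facts."""

    def __init__(self):
        self.hashes = set()
        self.embeddings = []

    def write(self, text, embedding):
        """Return True if stored as a new memory, False if deduplicated."""
        digest = content_hash(text)
        if digest in self.hashes:
            return False  # path 1: same normalized content
        if any(cosine(embedding, e) >= PARAPHRASE_THRESHOLD for e in self.embeddings):
            return False  # path 2: different bytes, same fact
        self.hashes.add(digest)
        self.embeddings.append(embedding)
        return True

    def count(self):
        return len(self.embeddings)


store = MemoryStore()
results = [
    store.write("I prefer dark mode", [1.0, 0.0]),
    store.write("i prefer  dark mode", [1.0, 0.0]),           # hash match
    store.write("Dark mode is my preference", [0.99, 0.05]),  # paraphrase
    store.write("My timezone is UTC+2", [0.0, 1.0]),          # new fact
]
print(results, store.count())
```

Only the first and last writes increase the counter, which mirrors the "one memory, not twenty" behavior described above. The linear scan over embeddings is for clarity only; a production system would presumably use a vector index.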

What happens if I hit my memory or retrieval cap?

Writes and queries against an exhausted cap return HTTP 402 with a clear message: "Memory cap reached (used/cap). Upgrade your plan or delete memories to continue."

Your existing memories stay intact; only new writes or further queries are blocked until you upgrade or free space.

We picked the loud-failure mode on purpose, because a silent throttle leaves your agent timing out without a useful reason. Email warnings at 80%, soft-throttling, and rate-limit headers are on the roadmap; the current behavior is fail-loud-with-context.
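
On the client side, the fail-loud behavior might be handled along these lines. This is a sketch only: send_query stands in for whatever HTTP client you use to call POST /v1/query, and CapReachedError, check_response, and query_with_fallback are hypothetical names. The assumption that the 402 message arrives as the plain response body is also ours.

```python
class CapReachedError(Exception):
    """Raised when the API reports an exhausted memory or retrieval cap."""


def check_response(status_code, body_text):
    """Return body_text on success; raise a descriptive error otherwise."""
    if status_code == 402:
        # Not retryable: waiting does not free capacity, so do not back off
        # and retry as you would for a throttle.
        raise CapReachedError(body_text)
    if status_code >= 400:
        raise RuntimeError(f"request failed with HTTP {status_code}: {body_text}")
    return body_text


def query_with_fallback(send_query, question):
    """Run a query; if the cap is exhausted, degrade instead of crashing.

    send_query is a hypothetical callable returning (status_code, body_text).
    """
    try:
        status_code, body_text = send_query(question)
        return check_response(status_code, body_text)
    except CapReachedError as exc:
        # Let the agent continue without memory and surface the reason.
        return f"[memory unavailable] {exc}"


def fake_ok(question):
    return 200, "dark mode"


def fake_capped(question):
    # Assumed rendering of the "(used/cap)" placeholder with example numbers.
    return 402, ("Memory cap reached (10000/10000). "
                 "Upgrade your plan or delete memories to continue.")


print(query_with_fallback(fake_ok, "What theme does the user prefer?"))
print(query_with_fallback(fake_capped, "What theme does the user prefer?"))
```

Treating 402 as a distinct, non-retryable error keeps the cap message visible to whoever operates the agent, rather than burying it inside generic retry logic.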

Can you run the LLM for me, e.g., if I don't want to manage a key?

On Enterprise, yes. We can run extraction and synthesis with a managed model at cost plus a small margin, and the inference cost lands on the same invoice as the platform fee.

On Free, Indie, and Team, BYOM is required.

Keeping the public tiers BYOM-only is the discipline that lets us print transparent prices. Once a vendor is running inference for customers, the math stops working without a markup, and the markup gets hidden in a per-token rate that's hard to evaluate against published provider pricing.

How does pricing compare to running your own memory stack?

The honest answer follows the usual managed-vs-self-host curve: cheaper at small scale, comparable in the middle, and worse than self-hosting at very large scale. The break-even depends on your traffic shape and how much engineering capacity you have to throw at it.

What you're paying us for is hybrid retrieval (BM25 + vector + graph, fused and reranked) that already works, cascade-delete and write-time dedup that already work, a canonical profile pass that already works, and roughly two years of edge cases in the ingest and composer prompts that already work.

If your team is large enough that building all of that is cheaper than buying it, we'll point you at the public artifacts: the v44 composer prompt is MIT, the LongMemEval methodology is documented, and the architecture posts spell out the rest.

Which plan unlocks self-hosting or VPC deployment, and how is it priced?

Self-hosting and VPC live on Enterprise only — not Free / Indie / Team. Enterprise is priced annually with a custom commitment that covers the support engineer, the SLA, the DPA, and any managed-inference credits, separately from the deployment artifact itself.

The lower tiers stay hosted because the unit economics of supporting on-prem under a $99/mo SKU don't work for either side. If you have a procurement requirement that rules out hosted (regulated industry, data-residency, air-gap), talk to us — that is exactly the path Enterprise is shaped for.

For the product-side answer on what self-hosting includes (binary parity, audit log, credential storage), see the "Can I self-host Engram?" entry on the homepage FAQ.

Do I need a credit card to try Engram?

No. The Free tier requires only an account and gives you 10,000 stored memories and 50,000 retrievals per month, enough to run a real workload against the production retrieval stack for several weeks.

The architecture on Free is identical to what paying tiers get; the differences are capacity (stored memories, retrievals, concurrent requests) and support level.