Pricing for memory: why we rejected MAU, per-project meters, and a Scale tier

Pricing decisions are the most consequential and least-documented part of building a product. This is the unsanitized version: every pricing model we considered for Engram, why each looked attractive at first, and the specific failure mode that made us drop it. We landed on a two-meter model (memories stored plus retrievals per month, at a 5:1 retrievals-to-memories ratio) and four plans. The rest of this post is about everything we threw away.

Published March 10, 2026 · By Jacob Davis and Ben Meyerson

Pricing has to match the cost shape

Before any model survives contact with a customer, it has to survive contact with your own infrastructure bill. The right pricing model is the one whose meter moves the same way your costs move. If it moves more slowly than the cost, you lose money on heavy users. If it moves faster, you overcharge light users and they churn. Eventually you have a customer base that looks nothing like the one you priced for.

A memory product has two distinct cost axes, and they don't track each other.

Stored memories are persistent infrastructure. Every memory in Engram lives in three places at once: a Postgres row with the raw text, a pgvector embedding for semantic retrieval, and a set of subject-predicate-object triples in the knowledge graph. None of that goes away between sessions. It sits on disk, gets backed up, gets indexed, and gets slower to query past certain bucket sizes. Storage cost scales with the count of memories, and so does the operational cost of keeping the indexes healthy.
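
To make the fan-out concrete, here's a toy sketch of that write path, with in-memory dicts standing in for Postgres, pgvector, and the graph, and stubs for the embedding and extraction steps. The names and shapes are illustrative, not Engram's internals.

```python
# Toy model of the ingest write path: one memory fans out to three stores.
# In-memory dicts stand in for Postgres, pgvector, and the knowledge graph;
# embed/extract_triples are stubs. Illustrative shape, not Engram internals.

rows: dict[str, str] = {}                       # Postgres: raw text, BM25-indexed
vectors: dict[str, list[float]] = {}            # pgvector: embeddings, ANN-indexed
triples: list[tuple[str, str, str, str]] = []   # graph: (memory_id, s, p, o)

def embed(text: str) -> list[float]:
    return [float(len(text))]  # stub for a real embedding model

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    return [("user", "mentioned", text[:20])]  # stub for LLM extraction

def store_memory(memory_id: str, text: str) -> None:
    rows[memory_id] = text                      # persists, gets backed up
    vectors[memory_id] = embed(text)            # persists, needs index upkeep
    for s, p, o in extract_triples(text):
        triples.append((memory_id, s, p, o))    # persists, gets traversed

store_memory("m1", "Dana prefers async standups over meetings")
```

None of those three writes is ever undone by the passage of time, which is why this axis bills against a standing count rather than a monthly meter.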

Retrievals are per-event compute. Every /v1/query call runs BM25 over the text table, ANN search over the vector index, graph traversal for relational hits, reciprocal-rank fusion across all three, and reranking on the top candidates. The cost of one retrieval is roughly fixed; the cost of a million retrievals is a million times that. It tracks usage, not storage.
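
The fusion step is standard reciprocal-rank fusion: each input ranking contributes 1/(k + rank) per candidate, with k conventionally set to 60. A minimal sketch, with placeholder rankings standing in for the three retrieval paths:

```python
# Reciprocal-rank fusion over the three retrieval paths. Standard RRF:
# each ranking contributes 1 / (k + rank) per document; k = 60 is the
# conventional constant. The input rankings below are placeholders.

from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["m3", "m1", "m7"]   # lexical ranking over the text table
ann_hits   = ["m1", "m9", "m3"]   # vector ranking over the embeddings
graph_hits = ["m1", "m7"]         # relational hits from graph traversal

candidates = rrf([bm25_hits, ann_hits, graph_hits])
# candidates[:N] would then go to the reranker
```

Every stage of that pipeline runs once per query, which is what makes retrieval a clean per-event meter.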

The two axes are not correlated. A team that loads 500K memories on day one and runs 20 retrievals a month looks identical, by storage, to a team that does the same and runs 500K retrievals a month. To us they look completely different. The first customer's costs are dominated by index maintenance and the embedding work we did at ingest. The second's are dominated by inference and read throughput, every single day.
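
A toy cost model makes the divergence visible. The unit costs below are invented for illustration; only the shape of the divergence matters.

```python
# Toy cost model for the two look-alike customers. The unit costs are
# made up for illustration, not Engram's real numbers.

STORAGE_COST = 0.00002    # $/memory-month: disk, backups, index upkeep
RETRIEVAL_COST = 0.0004   # $/query: BM25 + ANN + graph + fusion + rerank

def monthly_cost(memories_stored: int, retrievals: int) -> float:
    return memories_stored * STORAGE_COST + retrievals * RETRIEVAL_COST

print(monthly_cost(500_000, 20))       # ~$10/month: storage-dominated
print(monthly_cost(500_000, 500_000))  # ~$210/month: retrieval-dominated
```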

Any pricing model that only measures one of those axes will undercharge one of those customers and overcharge the other. The undercharged one will dominate your usage; the overcharged one will leave. That's the framing we used to evaluate every option below.

The corollary, which took us longer to internalize than it should have, is that a meter that doesn't correspond to a real thing you pay for is segmentation in disguise. It looks like a pricing decision. It's actually a tier-construction decision dressed up as one, and customers can usually tell. They argue about MAU definitions, about which project counts as a project, about whether a token spent on extraction should cost the same as a token spent on a composer call. Those arguments aren't really about the meter. They're about the pricing model not matching what the customer experiences as the cost of the product, and they go away the moment the meter and the cost shape line up.

The one we got wrong for the longest: a $499 Scale tier

It took us three months to drop the Scale tier, and we kept trying to justify it because it looked like revenue. For most of that time it sat in our internal pricing doc at $499/month, between the $99 Team plan and Enterprise, on the theory that there's a real customer who outgrows Team but isn't ready for an Enterprise contract. Higher caps, maybe priority support, maybe a slightly customized SLA, no sales call required.

On the spreadsheet, Scale looked like free money. Every customer who'd stay on Team and complain, or escalate to Enterprise and drag out the cycle, could just pick Scale and self-serve. It survived four revisions of the doc because revenue projections always looked better with it included.

What we eventually had to admit was that we were inventing a customer to justify the tier. Every actual conversation with a customer outgrowing Team had ended one of two ways. Either they asked for an Enterprise conversation, in which case Scale was a placeholder we were going to bypass anyway, or they were self-serve customers for whom $99 was already a meaningful spend and $499 would be a real budget conversation. Nobody we talked to wanted a fifth-of-an-Enterprise plan. They wanted either bigger limits on Team or an actual relationship.

What Scale was actually doing was segmenting customers we wanted to consolidate. A Team customer hitting their limits should be able to bump caps without paying 5× more. An Enterprise prospect should be able to start that conversation without first being told there's a $499 plan they "should consider." Scale put friction in both directions.

We dropped it in March. The pricing page got shorter, the conversion path got clearer, and the average deal size for customers who would have been Scale candidates went up, not down, because they either expanded on Team or talked to us about Enterprise. The tier wasn't capturing demand; it was creating a stall point. The pattern we keep coming back to is that a tier built for revenue-modeling reasons rather than cost-shape reasons is segmentation theater. It looks like price discrimination on the spreadsheet. To customers it looks like an extra decision they didn't want to make.

What's stuck with us from the experience isn't a rule about pricing so much as a tell about ourselves. Whenever a tier survives multiple rounds of debate on the strength of "but the revenue model needs it" rather than "but we know the customer who buys it," that's the same shape. We caught it once. The next time it shows up it'll be wearing different clothes.

What we rejected: MAU

The first model we sketched was per-monthly-active-user. It's the default for B2B SaaS, the pricing page basically writes itself, and every buyer would already understand it. We spent about two weeks treating this as the front-runner.

The failure mode is the cost shape. A memory product's expense doesn't track users at all. It tracks how many memories live in our database and how often someone queries them. A five-person team running an automated ingest pipeline that loads a million memories from their internal docs costs us substantially more than a fifty-person team where each person stores fifty memories by hand.

Under MAU you have two options, and both are bad. You can price for the heavy user, in which case the light user pays for capacity they'll never touch and churns within the first billing cycle. Or you can price for the light user and eat the difference on the heavies, in which case the heavies are exactly the customers you can't afford. The heaviest 5% of usage in our private beta accounted for over 70% of our infrastructure spend. A flat per-seat fee was not going to survive that distribution.

The other thing that killed MAU, more quietly: agent memory has a weird "user" definition. Is the user the developer who set up the bucket? The end-customer whose conversation got stored? The background agent doing automated ingest on a cron? The teams using Engram for support-ticket memory don't have human users in the conversational sense at all. We'd have spent half our sales cycle explaining what "active user" meant.

What we rejected: per-project meter

The next model we tried was per-project, or equivalently per-bucket. Customers already think in projects: this bucket is for the support agent, that one is for the internal research tool, this one is for a single end-customer's session memory. A tier that read "up to 10 projects, $99/month" felt natural.

The failure mode is cardinality. Most customers have two to five projects. A few have one. A handful have twelve. Nobody has eighty. The integer range is too narrow to price intelligently across: if your tiers are 1, 5, 10, 20, and "unlimited," you're forcing customers to choose between brackets that span almost nothing in real terms, and the unlimited tier is the only one anyone with growth ambition would buy.

You also haven't priced the actual cost. A project with 500 memories costs us a rounding error. A project with 5 million memories costs us real money. Charging per project says nothing about which kind a customer has. We'd be in the same place we were with MAU: priced against the wrong axis, with the cost shape underneath quietly diverging from the bill. And buckets are cheap to create on purpose. We wanted customers to feel free to make a new one for every project, customer, or experiment. Charging per project would create a perverse incentive to cram everything into one bucket and lose all the scoping benefits.

What we rejected: per-token (LLM pass-through)

This one was tempting because it tracked compute directly. The memory pipeline calls an LLM at several points: extraction during ingest, profile generation on first query against a fresh bucket, optional composer prompts at retrieval time. Each of those burns tokens. A token-based meter would have mapped almost perfectly onto our inference bill.

Two things killed it. The first is BYOK. We let customers bring their own model API key, and most of our serious users want to, both for cost control and for data-handling reasons. Under BYOK, the customer is already paying their model provider directly. The tokens don't pass through our billing system. We could meter them, but the customer would be paying twice for the same thing, and they would correctly object.

The second problem is who a token-based bill rewards. Our most engaged customers, the ones running the most retrievals and ingesting the most memories, are the ones who'd have the largest token bills. We'd be punishing engagement. Memory products live or die on becoming a habit; if every new bucket someone creates makes their bill noticeably bigger, we're working against ourselves.

There's a version of per-token that survives as a managed-inference add-on for customers who don't want to bring their own keys, billed at a markup over what we pay. That's a real product. It isn't the pricing model. We didn't want our pricing surface to depend on whether the customer was using their own keys or ours.

What we rejected: a 1:1 ratio

For a brief period we had a single-meter model: one "memory unit" consumed by either a store or a retrieval. Simpler math, one number on the page. Then we pulled retrieval logs from the private beta and saw the median bucket runs five to ten retrievals per stored memory. A 1:1 cap would have most customers hitting the retrieval ceiling at one-fifth to one-tenth of the storage they were paying for. That's artificial scarcity that they'd correctly read as us throttling them, and the fix (call support, get the cap raised) is the friction that kills self-serve. We could have hidden the ratio inside a single combined unit and tuned the math behind the scenes, but that breaks the rule that customers should be able to model their bill from the page. The whole point of a public meter is that the buyer can take their workload and predict the cost. We chose to expose the ratio explicitly.
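
The arithmetic behind that ceiling, under the workload we measured (five to ten retrievals per stored memory) and a hypothetical cap C on both meters:

```python
# With equal caps C on memories and retrievals (the 1:1 model), a workload
# running r retrievals per stored memory exhausts the retrieval cap after
# C / r memories. C here is a hypothetical tier cap.

C = 100_000

for r in (5, 10):
    usable = C // r
    print(f"{r} retrievals/memory -> ceiling at {usable:,} memories "
          f"({usable / C:.0%} of the storage paid for)")
# 5 retrievals/memory -> ceiling at 20,000 memories (20% of the storage paid for)
# 10 retrievals/memory -> ceiling at 10,000 memories (10% of the storage paid for)
```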

What we rejected: "unlimited retrievals" on free

Every free-tier discussion eventually arrives at "what if retrievals were just unlimited," because retrievals are cheap per call and the free tier needs to feel generous. The problem is that unlimited isn't unlimited. A single misconfigured agent in a tight loop can make several thousand retrievals an hour, and we've watched it happen in our own logs. With no cap there's no signal on either side, the customer doesn't know their agent is being noisy, and we eat the cost of an inference workload that wasn't doing anything useful. A meter, even a generous one, is a feedback loop. "Unlimited" sounds friendly; the failure mode is hostile to everyone involved.

What we landed on

Two meters. Four plans. One pricing page where you can predict your bill from the workload.

Memories stored (max_memories). A standing cap on how many memories a tenant holds at once; unlike retrievals, it never resets, and deleting memories returns capacity. It tracks the storage axis directly. The number lives in our plan_limits table next to the rest of the tenant config.

Retrievals per month (monthly_retrievals). Resets on a monthly cycle. Tracks the per-event compute axis. Soft-cap behavior at the limit: queries slow down before they stop, and customers get an email at 80% and 100% so the ceiling is never a surprise.
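
A sketch of that soft-cap shape: the 80% and 100% thresholds are the ones above, while the delay mechanism and its magnitude are assumptions, not Engram's actual implementation.

```python
# Sketch of soft-cap behavior at the retrieval limit. The 80%/100% email
# thresholds come from the post; the delay mechanism and numbers are
# assumptions for illustration.

import time

def check_retrieval_cap(used: int, cap: int) -> None:
    ratio = used / cap
    if ratio >= 1.0:
        notify_once("retrievals_100")   # email at 100%
        time.sleep(0.5)                 # degrade the query, don't hard-fail it
    elif ratio >= 0.8:
        notify_once("retrievals_80")    # email at 80%, no slowdown yet

_sent: set[str] = set()

def notify_once(event: str) -> None:
    if event not in _sent:
        _sent.add(event)
        print(f"[email] {event}")       # stand-in for the real alert

check_retrieval_cap(used=410_000, cap=500_000)  # 82% -> one email, no delay
```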

5:1 ratio between retrievals and memories at each tier. A plan with 100K memories includes 500K retrievals. That matches the median workload we measured in beta. Customers who are unusually read-heavy can buy add-on retrievals without bumping their storage cap; customers who are unusually write-heavy can do the reverse. Most customers land in the middle and never need either.
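
In code, the ratio is simple enough that a customer can model their bill from the page. The caps below are the published Free numbers; the add-on mechanics are hypothetical.

```python
# The 5:1 ratio as a customer could model it from the pricing page.
# Plan caps match the published Free tier; the add-on granularity is
# hypothetical, not Engram's actual SKU.

RETRIEVALS_PER_MEMORY = 5

def included_retrievals(max_memories: int) -> int:
    return max_memories * RETRIEVALS_PER_MEMORY

def fits(plan_memories: int, memories: int, retrievals: int,
         addon_retrievals: int = 0) -> bool:
    return (memories <= plan_memories and
            retrievals <= included_retrievals(plan_memories) + addon_retrievals)

print(included_retrievals(10_000))           # Free: 50,000 retrievals included
print(fits(10_000, 8_000, 70_000))           # False: unusually read-heavy
print(fits(10_000, 8_000, 70_000, 25_000))   # True with add-on retrievals
```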

Four plans: Free, Indie ($29), Team ($99), Enterprise. Each plan publishes its specific caps. Free is real: 10K memories, 50K retrievals, no credit card. Indie is for individual builders. Team is the SMB plan. Enterprise is contract. The full numbers and per-tier limits live on the pricing page; they're loaded directly from the same plan_limits table the production server reads at request time, so the page can't drift from what we actually enforce.

BYOK on every tier. Bring your own model API key. The inference cost goes from you to your model provider; it doesn't pass through us. We charge for memory infrastructure, not for tokens we didn't generate. This was contentious internally for a while, because the obvious argument for keeping inference bundled is that it simplifies the buyer's bill, which is how most agent products do it. We went the other way because our customers are mostly developers who already have model provider accounts, and they didn't want to pay a margin to a third party for tokens they could buy directly. We'd make the same call again.

The tier caps came from actual usage distributions in beta, not from round numbers. The Team plan's 500K-retrieval cap is what real teams used in a steady-state month, not a number that ended in zeros for layout reasons. The Indie plan's memory cap is where individual builders sat at the 75th percentile after sixty days of use.

Why four tiers and not three or five? Three is the minimum that expresses the basic shape (free, paid self-serve, contract). Five is where customers start feeling upsold rather than offered choices, which is the lesson Scale taught us. Four lets the self-serve range split into an individual-developer plan and a team plan, the two real personas we see in the funnel. Anything more is a hypothesis about a customer we haven't met yet.

The plan limits live in plan_limits, a single Postgres table the server reads at request time and the pricing page renders from. Bumping a cap is a SQL update. We don't have separate "pricing-page caps" and "enforced caps," the way a lot of products do after a year, because we wanted the page to be the source of truth, and the only way that stays true is if there's exactly one place the numbers live. The trade-off is that we can't A/B test caps the way some pricing tools encourage, but the trust we get from "the page says X and the server enforces X" is worth more to our buyer than any experiment we'd run if we could.
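
A minimal sketch of that single-source-of-truth shape, using sqlite3 as a stand-in for Postgres so it runs anywhere. The column names match the meters above; the schema details are assumptions.

```python
# One table, two readers: the request path and the pricing page both query
# plan_limits. sqlite3 stands in for Postgres here; column names mirror the
# meters described above, but the exact schema is an assumption.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE plan_limits (
    plan TEXT PRIMARY KEY, max_memories INTEGER, monthly_retrievals INTEGER)""")
db.executemany("INSERT INTO plan_limits VALUES (?, ?, ?)",
               [("free", 10_000, 50_000), ("team", 100_000, 500_000)])

def limits_for(plan: str) -> tuple[int, int]:
    """Read at request time by the server, and again by the pricing page."""
    return db.execute("SELECT max_memories, monthly_retrievals "
                      "FROM plan_limits WHERE plan = ?", (plan,)).fetchone()

# Bumping a cap is a SQL update; both readers see it immediately.
db.execute("UPDATE plan_limits SET max_memories = 150000 WHERE plan = 'team'")
print(limits_for("team"))  # (150000, 500000)
```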

What we're watching

The 5:1 ratio is the assumption we're least confident about long-term. It was right for the workloads we measured in beta, and it's been right for the first few months in production. But agent workloads are evolving fast. Automated background agents are becoming more common, and they tend to be more retrieval-heavy than human-driven agents because they re-query memory between every step. If the average ratio creeps to 8:1 or 10:1 across our customer base, we'll re-baseline rather than let customers run into ceilings that no longer match how agents actually work.
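
The check itself is a one-liner over usage logs. The tenant data below is made up, and the 8:1 trigger mirrors the threshold above.

```python
# Re-baselining check: observed retrievals-per-memory ratio across tenants,
# compared against the published 5:1. Tenant data here is invented.

from statistics import median

# (memories_stored, retrievals_this_month) per tenant, from usage logs
tenants = [(10_000, 48_000), (4_000, 31_000), (80_000, 720_000)]

observed = median(r / m for m, r in tenants)
print(f"median ratio {observed:.1f}:1 vs published 5:1")
if observed >= 8:
    print("re-baseline the tiers rather than letting ceilings bite")
```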

We're also watching whether the free-tier limits are correctly calibrated. Right now they're set so that a serious individual developer can use the product for real work without hitting either cap. If that turns out to be too generous (free users churning into Indie at lower rates than projected) or too tight (free users hitting ceilings before they've meaningfully tried the product), we'll adjust. The numbers in plan_limits are configuration, not architecture; we can move them.

And we're watching ourselves for Scale-tier energy on a new axis. The temptation to invent a customer to justify a tier doesn't go away once you've caught yourself doing it once. The next time it shows up, it'll be on some other dimension (a tier between Indie and Team, an add-on we can't quite articulate, a per-region surcharge), and the move is the same: go look at the conversations with customers who'd theoretically buy it, and find out they don't exist.

What we'd tell ourselves three months ago

We didn't get to the current pricing page on the first pass, or the second. It's the version that survived the most internal argument, and most of the arguments it survived were against ideas that looked, on paper, like revenue. Scale was the loudest of those. It took three months to drop, and the only reason we eventually did is that we went and looked at the actual customer conversations on either side of it and found that the Scale customer didn't exist. Every customer who'd theoretically have wanted Scale had, in practice, either expanded on Team because the next jump made them re-evaluate the whole product, or asked for Enterprise because they wanted a person, not a higher number.

The pricing page is shorter now, and we still occasionally catch ourselves drafting the next Scale tier under a different name. That's the part of pricing nobody tells you about. The mistake you already made is also the mistake you keep almost making again, and the only thing that keeps you from making it twice is being willing to read your own pricing doc the way a customer would, and notice the tier nobody asked for.
