Tutorial
Add Engram memory to Windsurf in three minutes
One config-file edit, a restart, and a system prompt. After that, Windsurf has durable memory across sessions. Project conventions, decisions, and preferences that survive between days instead of getting re-pasted every morning.
Windsurf is an MCP client. That means adding memory to it is a config change, not an integration project. Add one server entry to mcp_config.json, restart the app, paste a short policy into your system prompt so the agent actually uses the new tools, and you're done.
This post walks the full thing end to end. There's a verification step at the bottom and a handful of gotchas we've seen come up enough times to be worth calling out. The "BYOK not configured" error is the one almost everyone hits first.
Prerequisites
- An Engram account. Free tier is enough to get this working: 10K stored memories and 50K retrievals per month, no card. Sign up at lumetra.io.
- An Engram API key. Grab it from the portal once you're signed in. You'll paste this into the Windsurf config in step 1.
- A BYOK model key configured. Engram is BYOK. Every extraction and query routes through your own provider. OpenAI, Anthropic, Groq, Together, and Fireworks all work. Add it at /models in the portal before you call store_memory for the first time, otherwise the MCP tool call comes back with a "BYOK required" error message and you'll spend ten minutes wondering what you missed.
- Windsurf, recent build. MCP over SSE has been stable since the late-2025 releases.
Step 1: Edit mcp_config.json
This is the only step that actually matters. Everything else is plumbing around it. Open ~/.codeium/windsurf/mcp_config.json; if the file doesn't exist yet, create it. Add the Engram server under mcpServers:
```json
{
  "mcpServers": {
    "engram": {
      "serverUrl": "https://mcp.lumetra.io/mcp/sse",
      "headers": {
        "Authorization": "Bearer <api-key>"
      }
    }
  }
}
```
Replace <api-key> with the key from your portal. The URL is the hosted MCP endpoint; there's nothing to self-host. SSE is the transport Windsurf expects for remote MCP servers, which is why we use the /mcp/sse path rather than a plain HTTP one.
If you already have other MCP servers in this file, just add "engram" as another key under mcpServers. Don't replace the whole object.
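For clarity, here is what the merged file looks like with Engram alongside an existing server (the second entry is a hypothetical placeholder, not a real server):

```json
{
  "mcpServers": {
    "engram": {
      "serverUrl": "https://mcp.lumetra.io/mcp/sse",
      "headers": {
        "Authorization": "Bearer <api-key>"
      }
    },
    "other-server": {
      "serverUrl": "https://example.com/mcp/sse"
    }
  }
}
```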
Step 2: Restart Windsurf
Fully quit Windsurf and reopen it. A window reload is not enough. Windsurf reads mcp_config.json on process start, and a soft reload will silently keep using the old server list. We've seen this trip people up enough to call it out twice.
Once it's back up, open Settings → MCP Servers. On a recent build the tools usually show up within a second or two of the panel opening, but the first time we did this we sat there for maybe fifteen seconds waiting for the list to populate before realizing the panel itself just needed a moment to spin. You should see engram listed with a green status indicator. If it's red, the most likely cause is a malformed JSON file. Check for a trailing comma or missing brace. The second most likely cause is a bad API key; double-check you pasted the whole thing.
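If the indicator is red, you can rule malformed JSON in or out in seconds with the standard library. A minimal sketch; point it at ~/.codeium/windsurf/mcp_config.json for real use (the demo below writes a throwaway file so it runs anywhere):

```python
import json
import os
import tempfile

def check_mcp_config(path):
    """Return the configured MCP server names, or raise if the JSON is malformed.

    json.load raises JSONDecodeError on trailing commas, missing braces, etc.,
    with the line and column of the problem.
    """
    with open(path) as f:
        cfg = json.load(f)
    return sorted(cfg.get("mcpServers", {}).keys())

# Demo on a temp file instead of the real config path.
sample = '{"mcpServers": {"engram": {"serverUrl": "https://mcp.lumetra.io/mcp/sse"}}}'
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(sample)
servers = check_mcp_config(f.name)
os.unlink(f.name)
print(servers)  # ['engram']
```

If this raises, the error message points at the exact character that broke the parse, which is faster than eyeballing braces.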
Step 3: Test it
Open a chat in Windsurf and ask the agent to store a fact, then query for it. Something like:
- "Use store_memory to remember that I prefer named exports, tabs over spaces, and no semicolons in this project."
- "Use query_memory to check what code style I prefer."
The first call returns a confirmation with a memory ID. The second returns a synthesized answer, a count of memories that backed it, and the graph facts that came out of triple extraction at ingest time. A successful query_memory response looks roughly like:
```json
{
  "success": true,
  "answer": "You prefer named exports, tabs over spaces, and no semicolons in this project.",
  "memories_found": 1,
  "graph_facts": [
    { "subject": "user", "predicate": "prefers", "object": "named exports" },
    { "subject": "user", "predicate": "prefers", "object": "tabs over spaces" },
    { "subject": "user", "predicate": "prefers", "object": "no semicolons" }
  ],
  "usage": { "input_tokens": 312, "output_tokens": 24 }
}
```
If you get that shape back, the integration is live. If the call fails with a "BYOK required" error message (the structured error carries code: "byok_not_configured"), jump to the gotchas below.
The system prompt that makes it actually get used
This is the part most people skip and then wonder why their memory server feels useless. An MCP tool being available is not the same as the agent calling it. Without explicit instruction, Windsurf will mostly ignore query_memory and answer from whatever's in its immediate context.
Paste this into Windsurf's system prompt. Settings → Cascade → System Prompt, or your project's .windsurfrules file:
```
You have Engram memory. Use it proactively to improve continuity and personalization.

Tools:
- store_memory(content, bucket?) - Store a fact or piece of information
- query_memory(question, bucket?) - Search memories using natural language
- list_buckets() - List available memory buckets
- delete_memory(memory_id, bucket) - Delete a specific memory
- clear_memories(bucket) - Clear all memories in a bucket (destructive!)

Policy:
- Query-first: before answering anything that may rely on prior context, call query_memory. Ground your answers in the results.
- Proactive storing: capture stable preferences, profile facts, project details, decisions, and outcomes. Keep each fact concise (1-2 sentences).
- Use buckets: organize memories by project or context (e.g., "work", "personal", "project-alpha").

Style for stored content: short, declarative, atomic facts.

Examples:
- "User prefers dark mode."
- "User timezone is US/Eastern."
- "Project Alpha deadline is 2026-10-15."
```

The two lines that matter most are query-first and proactive storing. Query-first says: before you answer anything that might depend on prior context, check memory. Proactive storing says: when the user states something durable, write it down without being asked.
You can tune the bucket naming convention and the "what counts as durable" examples to your workflow. Don't tune the query-first rule. It's load-bearing.
What memory unlocks for IDE work
The cheap answer is "the agent remembers things." The honest answer is more specific.
Project conventions stop being a paste-in. Tabs versus spaces, named versus default exports, the testing framework, the import-ordering rule, the fact that this repo uses pnpm and not npm. You tell Windsurf once, it lands in memory, and three weeks later when you're back in the same project from a different machine the agent already knows. The first-day-of-the-project ritual of pasting your style guide into the system prompt becomes a one-time store_memory call.
Decisions survive between days. "We decided to use Tanstack Query instead of SWR for the auth flow because of the suspense integration" is the kind of thing you tell the agent on a Tuesday and then can't find on Thursday. With memory on, the agent queries before suggesting libraries and surfaces the decision instead of suggesting SWR again.
Debugging context survives between sessions. Hit a weird race condition in the WebSocket handler on Monday, walked away to ship something else, came back Friday. Instead of explaining the whole investigation from scratch, the agent recalls the symptoms and the candidate hypotheses you'd ruled out. This is the use case that has the highest "I would pay for this" rate among the developers we've talked to.
Preferences that stick across IDEs. Memories aren't scoped to Windsurf. If you also use Claude Code or Cursor with the same Engram bucket, the same preferences and decisions show up there. The IDE changes; the memory doesn't.
Common gotchas
"BYOK required" on the first store_memory call
This is the one almost everyone hits, so it gets the most ink. If your very first store_memory call comes back with an error message that mentions "BYOK required: configure your LLM provider at /models," your BYOK model key isn't configured server-side. Engram doesn't ship with a default model. Every extraction call routes through your provider, and we need a key to route to. Go to /models in the portal, paste an OpenAI / Anthropic / Groq / Together / Fireworks key, and retry. The error is deliberate: it surfaces at setup time rather than at a load spike six weeks later, when the failure mode would be much harder to read. The structured error carries code: "byok_not_configured" if your MCP client surfaces error codes; on the REST API the same condition returns HTTP 412.
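If you're calling Engram from your own code, the condition is easy to detect from either surface. A sketch, assuming the error body nests the code under an "error" key (the exact envelope is an assumption; the code string "byok_not_configured" and the HTTP 412 status are the documented parts):

```python
def is_byok_error(status_code, body):
    """Detect Engram's 'BYOK not configured' condition.

    The REST API signals it with HTTP 412; clients that surface structured
    errors see code "byok_not_configured". The nesting under "error" here
    is an assumed envelope shape, not documented.
    """
    if status_code == 412:
        return True
    err = (body or {}).get("error") or {}
    return err.get("code") == "byok_not_configured"

print(is_byok_error(412, None))  # True
print(is_byok_error(200, {"error": {"code": "byok_not_configured"}}))  # True
print(is_byok_error(200, {"success": True}))  # False
```

Branching on this lets you show a "configure your model key at /models" message to the user instead of a generic failure.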
Old config is still active after editing the file
Windsurf caches the MCP server list at process start, so a reload or a window close-and-reopen isn't the same as a full quit. On macOS use Cmd-Q, not the red close button. On Linux, exit from the system tray.
The agent stores nothing on its own
If query_memory returns empty for things you "definitely told the agent," it usually means the agent never called store_memory in the first place. The system prompt above is the fix.
Per-request config override
If you want a single Windsurf chat to use a different BYOK provider (a Groq key for that session instead of the default OpenAI one in the portal, say) Engram supports a per-request header: X-Engram-Config-Set: <config-set-id>. Add it to the headers block alongside Authorization. Most users never need this.
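With the override in place, the headers block looks like this (the config-set ID placeholder is yours to fill in from the portal):

```json
{
  "mcpServers": {
    "engram": {
      "serverUrl": "https://mcp.lumetra.io/mcp/sse",
      "headers": {
        "Authorization": "Bearer <api-key>",
        "X-Engram-Config-Set": "<config-set-id>"
      }
    }
  }
}
```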
Buckets are global by default
If you don't pass a bucket argument to store_memory or query_memory, everything goes into the default bucket. Fine for personal use. If you're using Engram across distinct projects, tell the agent to namespace by project in the system prompt (bucket: "project-alpha", bucket: "personal") and they'll stay cleanly separated.
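If you want bucket names to be deterministic per repo rather than something the agent improvises, a tiny convention helps. A sketch; the `project-` prefix and the slugging are illustrative naming choices, not Engram requirements (Engram treats bucket names as opaque strings):

```python
import pathlib

def bucket_for(repo_path):
    """Map a repo directory to a bucket name, e.g. /home/me/alpha -> project-alpha."""
    slug = pathlib.Path(repo_path).name.lower().replace(" ", "-").replace("_", "-")
    return f"project-{slug}"

print(bucket_for("/home/me/alpha"))  # project-alpha
```

Bake the resulting name into the project's .windsurfrules ("always pass bucket: project-alpha") so every session in that repo reads and writes the same namespace.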
Other MCP clients
The whole point of MCP is that the wire format is the same across clients. Once you've done this for Windsurf, doing it for Cursor or Claude Code is a copy-paste with a different file path. Claude Code uses claude mcp add-json engram ...; Cursor reads ~/.cursor/mcp.json; OpenClaw follows the same mcpServers shape. The docs page has the exact snippet for each.
Further reading
Closely related
- Add Engram memory to ChatGPT as a custom connector. Custom MCP connector, OAuth handshake, and the system prompt that turns the tools into something the model actually uses.
- Engram + Vercel AI SDK: memory-aware chat in a Next.js app. Wire Engram into the AI SDK with three tools, one route handler, and one system prompt. Working integration guide.
- What is AI agent memory?. Vendor-neutral primer on the category: statelessness, hybrid retrieval, why a vector DB alone is not enough.
Engram
- Engram on LongMemEval-S: 91.6%. Full benchmark methodology and what didn't work.
- Engram docs. HTTP API, MCP setup for each client, SDK examples.
- Start with Engram. Free tier, BYOK, MCP-native.