Integrate Engram into your applications.
REST endpoints and the MCP server. Auth, memory storage, retrieval with explanations, and bucket management.
Getting Started
Engram provides a REST API for storing and querying memories. All API requests require authentication using an API key.
Official client libraries
Zero-dependency clients with full type hints for Python, TypeScript, and Go:
- Python: pip install lumetra-engram · pypi.org/project/lumetra-engram · github.com/lumetra-io/engram-py
- TypeScript / JavaScript: npm install @lumetra/engram · npmjs.com/package/@lumetra/engram · github.com/lumetra-io/engram-js
- Go: go get github.com/lumetra-io/engram-go · pkg.go.dev/github.com/lumetra-io/engram-go · github.com/lumetra-io/engram-go
All three libraries wrap the same REST API documented below. Use them if you'd rather not hand-roll HTTP calls.
Codex plugin
Add this entry to .agents/plugins/marketplace.json in your repo, then install via the Codex UI. Codex will prompt for your ENGRAM_API_KEY on install and store it in its secret store.
{
"name": "engram-plugins",
"interface": { "displayName": "Engram Plugins" },
"plugins": [
{
"name": "engram",
"source": { "source": "local", "path": "./plugins/engram" },
"policy": { "installation": "AVAILABLE", "authentication": "ON_INSTALL" },
"category": "Productivity"
}
]
}
The plugin folder itself lives at github.com/lumetra-io/engram-codex-plugin. Copy plugins/engram/ from there into your repo's plugins/ directory (the marketplace entry's source.path points at the local copy). Once installed, your agent gets store_memory, query_memory, list_memories, and the rest of the Engram tool surface.
Base URL
https://api.lumetra.io
Authentication
Include your API key in the Authorization header:
curl -X POST https://api.lumetra.io/v1/buckets/default/memories \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"content": "Alice works at TechCorp"}'
Managing API Keys: Create and revoke API keys from your Lumetra dashboard after signing in. For security reasons, API keys cannot be managed via the API.
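For callers who are not using one of the client libraries, the request from the curl example above can be built with Python's standard library. This is a minimal sketch: the helper name build_store_request is hypothetical, and the URL, headers, and body shape are copied from the curl example rather than from any SDK.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.lumetra.io"  # base URL stated on this page


def build_store_request(api_key: str, bucket: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a store-memory request for the given bucket."""
    # Percent-encode the bucket name so unusual characters cannot break the path.
    path = f"/v1/buckets/{urllib.parse.quote(bucket, safe='')}/memories"
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends it; keeping construction separate from sending makes the header logic easy to check without network access.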
Quick Example
Store a memory and query it:
# Store a memory
curl -X POST https://api.lumetra.io/v1/buckets/work/memories \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"content": "Bob is the CEO of Acme Inc"}'
# Query your memories
curl -X POST https://api.lumetra.io/v1/query \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"query": "Who is the CEO of Acme?", "buckets": ["work"]}'
MCP Server
Engram provides an MCP (Model Context Protocol) server for direct integration with Claude and other AI assistants. The MCP server exposes memory tools that Claude can use to store and retrieve information.
Claude Code (CLI)
Add the MCP server with a single command:
claude mcp add-json engram '{"type":"sse","url":"https://mcp.lumetra.io/mcp/sse","headers":{"Authorization":"Bearer YOUR_API_KEY"}}'
Claude Desktop / Windsurf / Cursor
Add to your MCP configuration file:
{
"mcpServers": {
"engram": {
"url": "https://mcp.lumetra.io/mcp/sse",
"headers": {
"Authorization": "Bearer YOUR_API_KEY"
}
}
}
}
Available MCP Tools
| Tool | Description |
|---|---|
| store_memory(content, bucket?) | Store a fact or piece of information |
| query_memory(question, bucket?) | Search memories using natural language with AI synthesis |
| list_memories(bucket, limit?) | List all memories stored in a bucket |
| list_buckets() | List available memory buckets |
| delete_memory(memory_id, bucket) | Delete a specific memory by ID |
| clear_memories(bucket) | Clear all memories in a bucket |
MCP Tool Responses
Heads up — MCP and REST use different field names for the same data. The MCP wrapper predates the REST API and we haven't unified the field names (renaming MCP fields would break every agent configuration in the wild). If you call both surfaces from the same client, normalize on the wire.
| What it is | REST field | MCP field |
|---|---|---|
| Memory primary key (top-level) | id | memory_id |
| Bucket identifier on responses | bucket_name | bucket |
| Count of cleared memories | cleared_count | memories_deleted |
Inside the retrieval payload (REST: memories[], MCP: retrieved_memories[]) and graph_facts[], both surfaces use the same field name (memory_id); the array names are the only difference, and SDKs normalize to the REST shape.
Only direct MCP tool calls (e.g. from Claude.ai connectors) see the MCP-flavored field names.
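If a client does talk to both surfaces, the renames in the table above can be applied in one place. The sketch below is an illustration only: mcp_to_rest and MCP_TO_REST are hypothetical names, and the mapping covers just the top-level fields listed on this page.

```python
from typing import Any

# Top-level MCP field names mapped to their REST equivalents (from the table above).
MCP_TO_REST = {
    "memory_id": "id",
    "bucket": "bucket_name",
    "memories_deleted": "cleared_count",
    "retrieved_memories": "memories",
}


def mcp_to_rest(payload: dict[str, Any]) -> dict[str, Any]:
    """Rename top-level MCP fields to the REST shape.

    Nested items (entries of retrieved_memories[] and graph_facts[]) are left
    untouched, because both surfaces already use memory_id inside them.
    """
    return {MCP_TO_REST.get(key, key): value for key, value in payload.items()}
```

Only the top level is rewritten, so the memory_id links between graph facts and retrieved memories survive the conversion.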
store_memory (MCP) returns:
{
"success": true,
"memory_id": "uuid",
"bucket": "default",
"token_count": 8,
"config_id": "default",
"extractor_usage": {
"input_tokens": 32,
"output_tokens": 18
}
}
// extractor_usage is null when triple extraction didn't run for this
// memory (BYOK extractor not configured, call failed, or no extractable
// facts). Memory is still stored and queryable in either case.
clear_memories (MCP) returns:
{
"success": true,
"memories_deleted": 42,
"bucket": "default"
}
// REST equivalent: DELETE /v1/buckets/{bucket}/memories returns the
// same count under "cleared_count" (not "memories_deleted").
query_memory (MCP) returns:
{
"success": true,
"answer": "You prefer Python over JavaScript.",
"memories_found": 1,
"retrieved_memories": [
{
"memory_id": "mem_8a3f...",
"bucket_id": "buc_d12c...",
"bucket_name": "default",
"content": "User prefers Python over JavaScript for new services.",
"raw_score": 0.94,
"weight": 1.0,
"weighted_score": 0.94
}
],
"graph_facts": [
{
"subject": "User",
"predicate": "prefers",
"object": "Python",
"memory_id": "mem_8a3f...",
"bucket_id": "buc_d12c...",
"bucket_name": "default",
"depth": 0,
"weight": 1.0,
"timestamp": "2026-05-20T14:32:00Z"
}
],
"entity_matches": [
{ "entity": "Python", "bucket_name": "default", "score": 1.0 }
],
"context_tokens": 42,
"usage": {
"input_tokens": 6,
"output_tokens": 8
}
}
// Each graph_facts[i].memory_id matches a retrieved_memories[j].memory_id —
// use that link to cite the source memory behind any fact.
Example Usage
Once connected, Claude can use memory tools automatically:
User: "Remember that I prefer Python over JavaScript"
Claude: [calls store_memory]
content: "User prefers Python over JavaScript"
bucket: "default"
Result: Memory stored successfully
---
User: "What are my programming preferences?"
Claude: [calls query_memory]
question: "What are the user's programming preferences?"
Result: Based on stored memories, you prefer Python over JavaScript.
Memories
Store, list, and manage memories. All endpoints require API key authentication.
/v1/buckets/{bucket}/memories (Auth Required): List memories in a bucket with pagination. Returns newest first. Returns 404 if the bucket does not exist (use POST /v1/buckets or store a memory first to auto-create it).
/v1/buckets/{bucket}/memories (Auth Required): Store a new memory in a bucket.
/v1/buckets/{bucket}/memories/{memory_id} (Auth Required): Delete a specific memory by ID.
/v1/buckets/{bucket}/memories (Auth Required): Clear all memories in a bucket.
/v1/buckets/{bucket} (Auth Required): Delete a bucket entirely (and every memory in it). Destructive, with no undo.
/v1/query (Auth Required): Query memories using natural language with AI-powered synthesis. Pass "stream": true to receive the answer as Server-Sent Events as the synthesizer produces it; see the Streaming Queries section below. Bucket scoping: omitting `buckets` defaults to `["default"]` (auto-created on next write); passing `[]` returns 400; passing names that don't exist returns 404 with the missing bucket(s) listed.
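The bucket scoping rules for /v1/query can be mirrored client-side so that an empty list fails before a request is sent. This is a sketch under that reading of the rules; resolve_query_buckets is a hypothetical helper, not part of any SDK.

```python
from typing import Optional, Sequence


def resolve_query_buckets(buckets: Optional[Sequence[str]]) -> list[str]:
    """Apply the documented scoping rules locally before calling /v1/query."""
    if buckets is None:
        # Omitting `buckets` is documented to default to ["default"].
        return ["default"]
    if len(buckets) == 0:
        # The API is documented to answer an empty list with 400.
        raise ValueError("buckets must not be empty; the API returns 400 for []")
    return list(buckets)
```

Unknown bucket names are deliberately not checked here, since only the server knows which buckets exist; that case still surfaces as the documented 404.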
Buckets
Organize memories into separate namespaces called buckets.
/v1/buckets (Auth Required): List all buckets for your account.
/v1/buckets (Auth Required): Create a new bucket. `name` is required; `description` is optional. Idempotent: posting the same name twice returns 200 with the existing bucket (and preserves the original description) instead of 201 / 409. Names starting with `_` are reserved for system meta-buckets and return 403.
/v1/buckets/{bucket}/profile (Auth Required): Inspect the canonical profile for a bucket. The profile is generated by the Bucket Profiler agent and prepended to every query against this bucket to improve recall. Status is `not_installed` until you call `/profile/ensure-agent`, then `pending` while the first tick is in flight, then `ready` once a profile has been written.
/v1/buckets/{bucket}/profile/ensure-agent (Auth Required): Idempotently installs the Bucket Profiler agent and attaches it to this bucket. Pass `?run_now=true` to fire an immediate tick. Call this before /profile/regenerate; without it the regenerate endpoint returns 412.
/v1/buckets/{bucket}/profile/regenerate (Auth Required): Force a fresh profile generation by triggering the installed Bucket Profiler agent. Returns 412 with `BUCKET_PROFILER_NOT_INSTALLED` if no agent is attached to this bucket; call `/profile/ensure-agent` first. The tick runs in the background; poll GET /profile to see status flip from `pending` to `ready`.
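The install-then-poll flow described for bucket profiles can be sketched as a small loop. The two callables are placeholders for whatever HTTP calls you make to /profile and /profile/ensure-agent, and the assumption that the profile response exposes its state under a "status" key is mine, not something this page spells out.

```python
import time
from typing import Callable


def wait_for_profile(
    get_profile: Callable[[], dict],
    ensure_agent: Callable[[], None],
    attempts: int = 10,
    delay_seconds: float = 2.0,
) -> dict:
    """Poll a bucket profile until it is ready, installing the profiler if needed."""
    for attempt in range(attempts):
        profile = get_profile()
        status = profile.get("status")  # assumed key name
        if status == "ready":
            return profile
        if status == "not_installed":
            # ensure-agent is documented as idempotent, so repeating it is safe.
            ensure_agent()
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError("bucket profile did not become ready in time")
```

Injecting the two callables keeps the loop independent of any particular HTTP client and makes it testable with stubs.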
Usage & Stats
Monitor your usage and resource statistics.
/v1/usage (Auth Required): Get token usage statistics.
/v1/stats (Auth Required): Get resource statistics.
Agents
Install and manage Memory Agents from the template library. Agents run on a schedule and operate on your memory. Read-only operators (Bucket Profiler, Logger) write findings into their own meta-bucket — useful telemetry, but nothing in your real buckets changes. Mutation operators (Janitor, Consolidator, Watchdog) propose or apply changes to actual memories; every one of those mutations is audit-logged and reversible from /v1/audit.
/v1/templates (Auth Required): List available agent templates (Profiler, Watchdog, Logger, Janitor, Consolidator).
/v1/templates/{template_id} (Auth Required): Get full spec for a single template (operators, params, permissions, retention). Returns 404 if the template_id is unknown.
/v1/agents (Auth Required): List installed agents and their attachment / tick state.
/v1/agents (Auth Required): Install an agent from a template. Both `template_id` and `name` are required.
/v1/agents/{agent_id} (Auth Required): Get a single agent with full spec + attachments.
/v1/agents/{agent_id} (Auth Required): Toggle enabled, upgrade to latest template version, or replace the spec.
/v1/agents/{agent_id} (Auth Required): Uninstall the agent. Default is `?findings=archive`, which renames the meta-bucket so historical findings stay queryable. Pass `?findings=delete` to drop the meta-bucket and every finding in it; this is destructive and irreversible.
/v1/agents/{agent_id}/tick (Auth Required): Manually run a tick for one of the agent's attached buckets. Returns the tick row + persisted findings count.
/v1/agents/{agent_id}/ticks (Auth Required): Recent ticks (newest first). `?limit=20` default.
/v1/agents/{agent_id}/attachments (Auth Required): List buckets this agent runs against, with cron interval + last-tick state.
/v1/agents/{agent_id}/attachments (Auth Required): Attach the agent to a bucket on a schedule. The bucket can be passed as a UUID or a name (aliases: `bucket_id`, `bucket_name`, `bucket`). `interval` accepts a cron expression or `@every:Ns/m/h/d`; pass `"manual"` to disable auto-ticks.
/v1/agents/{agent_id}/attachments/{attachment_id} (Auth Required): Pause/resume or change the interval for a specific attachment.
/v1/agents/{agent_id}/attachments/{attachment_id} (Auth Required): Detach the agent from a bucket. Findings already written remain in the meta-bucket.
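When attaching an agent, the interval value can be sanity-checked before the request is sent. The sketch below rests on two assumptions of mine: that `@every:Ns/m/h/d` means "@every:" followed by an integer and one unit letter (for example @every:30m), and that a cron expression has five whitespace-separated fields. attachment_body is a hypothetical helper; the server remains the authority on what it accepts.

```python
import re

# Assumed reading of "@every:Ns/m/h/d": an integer followed by s, m, h, or d.
_EVERY = re.compile(r"^@every:\d+[smhd]$")


def attachment_body(bucket_name: str, interval: str) -> dict:
    """Build a request body for attaching an agent to a bucket by name."""
    # Loose cron check (assumption): exactly five whitespace-separated fields.
    looks_like_cron = len(interval.split()) == 5
    if not (interval == "manual" or _EVERY.match(interval) or looks_like_cron):
        raise ValueError(f"unrecognized interval: {interval!r}")
    return {"bucket_name": bucket_name, "interval": interval}
```

The body uses the documented `bucket_name` alias; `bucket_id` or `bucket` would work the same way according to the endpoint description.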
Audit Log and Rollback
Every mutation (memory create/delete/update/expire/merge/reassign, bucket delete) by both users and agents is recorded in the audit log. Roll back any single event or cascade through a chain of changes. Reads on /v1/memories/{id}/at let you query the corpus at any point in the past.
/v1/audit (Auth Required): Recent audit events, newest first. Filterable by actor type, memory id, bucket id, tick id, and time range; these are the same filters the portal /history view uses, so a custom audit UI can rebuild that view in full.
/v1/audit/{event_id} (Auth Required): Single event detail plus a conflict preview (newer events touching the same target).
/v1/audit/{event_id}/rollback (Auth Required): Apply the inverse of an audit event. Refuses with 409 + conflicts if a newer event touches the same target. Pass `force=true` to override (lossy) or `cascade=true` to roll back the newer events first.
/v1/audit/groups/{parent_event_id}/rollback (Auth Required): Roll back an event AND all of its children (the per-source events for memory_merge, etc.) in one transactional sweep.
Point-in-Time and History
Versioned columns under the hood let you query the state of any memory at a past timestamp, and read a full per-bucket changelog feed.
/v1/memories/{memory_id}/at (Auth Required): Resolve a memory's state at the timestamp passed in `?as_of=` (ISO 8601, required). Returns the archived version if one covers that time, otherwise the current state if the timestamp is post-creation, otherwise 404.
/v1/buckets/{bucket}/history (Auth Required): Audit events touching a bucket, newest first. Useful for "what happened in this bucket" support views. The path accepts a UUID or a bucket name, like the rest of the /v1/buckets/* routes.
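Building the point-in-time URL by hand is mostly a matter of formatting `as_of` as ISO 8601 and encoding it safely. A minimal sketch, assuming a UTC timestamp with a trailing Z is acceptable (the page only says ISO 8601); memory_at_url is a hypothetical helper.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

BASE_URL = "https://api.lumetra.io"  # base URL stated on this page


def memory_at_url(memory_id: str, as_of: datetime) -> str:
    """Return the URL that resolves a memory's state at the given instant."""
    if as_of.tzinfo is None:
        # A naive datetime is ambiguous, so refuse it rather than guess a zone.
        raise ValueError("as_of must be timezone-aware")
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({"as_of": stamp})  # percent-encodes the colons
    return f"{BASE_URL}/v1/memories/{quote(memory_id, safe='')}/at?{query}"
```

Normalizing to UTC first avoids sending offsets such as +02:00, whose plus sign is easy to mangle in a query string.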
Dedup
Naming note: "strict" is stricter about what counts as a duplicate, not "more aggressive deduplication." It raises the similarity bar to 0.99 so only near-identical writes collapse, i.e. it preserves more memories than the default. "loose" dedupes more aggressively at 0.95.
The store path runs a similarity check before writing. By default ("loose", similarity ≥ 0.95) it collapses near-duplicate writes into the existing memory so re-ingesting the same source doesn't bloat the bucket. For narrative content this is usually what you want.
For templated time-series content (financial filings, daily metrics, log rows) where rows are structurally similar but each carries unique values, the default collapses real data. Pass "dedup": "off" on the store request to disable.
Detecting silent merges
Every store response includes a status field: "stored" for a fresh write, "merged" when the server absorbed the write into an existing memory. The merged response includes three extra fields:
{
"id": "ec59...", // the canonical id (= deduped_into)
"bucket_name": "prices_AAPL",
"token_count": 222,
"status": "merged",
"deduped_into": "ec59...",
"similarity_score": 0.987,
"merge_reason": "embedding_similarity"
}
merge_reason is one of: content_hash (byte-identical content already stored), embedding_similarity (vector match ≥ threshold), conflict_keep_existing (LLM conflict resolver chose the existing memory), or concurrent_insert_race (another worker stored identical content first).
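During a bulk ingest, the status and merge_reason fields can be tallied to spot merges that would otherwise go unnoticed. This is a small sketch over already-parsed store responses; summarize_store_results is a hypothetical helper, and it treats any status other than "merged" as a fresh write.

```python
def summarize_store_results(responses: list[dict]) -> dict:
    """Count fresh writes versus merges and group the merges by merge_reason."""
    summary = {"stored": 0, "merged": 0, "by_reason": {}}
    for resp in responses:
        if resp.get("status") == "merged":
            summary["merged"] += 1
            reason = resp.get("merge_reason", "unknown")
            summary["by_reason"][reason] = summary["by_reason"].get(reason, 0) + 1
        else:
            summary["stored"] += 1
    return summary
```

A high embedding_similarity count on templated rows is the usual sign that the dedup policy is collapsing real data.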
Policies
| dedup | Behavior | Use when |
|---|---|---|
| "off" | Skip both the byte-hash and similarity checks. Every write stores a new memory. | Bulk ingest of templated time-series rows. Re-importing the same source on purpose. |
| "loose" (default) | Collapse writes at similarity ≥ 0.95. | Narrative content where structurally similar chunks really are redundant. |
| "strict" | Only collapse near-identical writes (≥ 0.99). | Keep a safety net against exact re-ingest while allowing near-identical but distinct content (e.g. month-over-month metrics). |
Example
curl -X POST https://api.lumetra.io/v1/buckets/prices_AAPL/memories \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"content": "AAPL 2024-01 OHLC: 187.15/195.62/178.85/188.47", "dedup": "off"}'
Billing note
When dedup fires we skip the triple-extractor pass, so a merged write doesn't cost extraction tokens. Setting "dedup": "off" means more memories store → more extractor runs → more BYOK usage. Use the extractor_usage field in each response to audit.
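Auditing extraction cost comes down to summing extractor_usage across store responses. The sketch below assumes the field has the input_tokens / output_tokens shape shown in the store_memory example and may be null; total_extractor_tokens is a hypothetical helper.

```python
def total_extractor_tokens(responses: list[dict]) -> dict:
    """Sum extractor token usage across a batch of store responses."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for resp in responses:
        # extractor_usage is null (None) when extraction did not run.
        usage = resp.get("extractor_usage") or {}
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals
```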
Query Knobs
POST /v1/query accepts these tuning options in the options object (all optional). They compose with each other and with stream.
top_k_per_bucket
Per-bucket retrieval depth. Either an integer (same K for every bucket) or an object mapping bucket names to K. Missing buckets fall back to the global top_k. Built for asymmetric bucket sizes — e.g. a 1900-memory filings bucket paired with a 6-memory monthly-prices bucket, where a uniform K either starves the big one or wastes slots on the small one.
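The fallback rule described above can be written out as a few lines of Python. This only illustrates the documented behavior from the caller's point of view and is not the server's implementation; effective_top_k is a hypothetical name.

```python
from typing import Mapping, Optional, Union


def effective_top_k(
    bucket: str,
    top_k: int,
    top_k_per_bucket: Optional[Union[int, Mapping[str, int]]] = None,
) -> int:
    """Return the retrieval depth that applies to one bucket."""
    if top_k_per_bucket is None:
        return top_k
    if isinstance(top_k_per_bucket, int):
        # A bare integer means the same K for every bucket.
        return top_k_per_bucket
    # Buckets missing from the mapping fall back to the global top_k.
    return top_k_per_bucket.get(bucket, top_k)
```

With a global top_k of 8 and a mapping of {"edgar_AAPL": 20, "prices_AAPL": 4}, a third bucket that is not in the mapping would be queried at depth 8.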
"options": {
"top_k": 8,
"top_k_per_bucket": {
"edgar_AAPL": 20,
"prices_AAPL": 4,
"patents_AAPL": 5
}
}
max_tokens
Cap synthesis output. The default (8192) is generous so Qwen3 / GPT-5 reasoning traces don't truncate. Lower it when running agent loops, embedding answers in pipelines with strict context budgets, or controlling cost.
"options": { "max_tokens": 400 }
min_weighted_score (recommended) / min_similarity_threshold
Two precision floors with different scales. Most callers want min_weighted_score because it filters on the same number you see in memories[*].weighted_score.
| Knob | Operates on | Typical values | When to use |
|---|---|---|---|
| min_weighted_score | Post-RRF weighted score (visible in response) | 0.0 = positive matches only; higher = stricter | Default choice. Citations-grade retrieval, "least-bad" guard. |
| min_similarity_threshold | Raw cosine similarity (embedding score) | 0.1–0.5 typically | You specifically want a floor on the underlying embedding signal, not the RRF blend. Rarely needed. |
"options": {
"min_weighted_score": 0.0 // keep only positive-score chunks
}
return_format + response_schema
"return_format": "json" asks the synthesizer for a JSON response. The provider's JSON-object mode is used when available; for providers without native support the prompt instructs strict JSON. Optionally pass "response_schema" (a JSON Schema dict) to guide the model toward a specific shape. Schema enforcement is best-effort, so validate client-side if you need strict guarantees.
The response gains a top-level answer_json field with the parsed value (object / array / scalar). It's null when the model returns malformed JSON despite the request; answer always carries the raw string.
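Because schema enforcement is best-effort, a caller that needs guarantees can run a light shape check on answer_json before trusting it. The sketch below checks only for required keys, which is far weaker than full JSON Schema validation; parsed_answer and its required_keys parameter are hypothetical.

```python
from typing import Any, Sequence


def parsed_answer(response: dict, required_keys: Sequence[str] = ()) -> Any:
    """Return answer_json if it is present and passes a basic shape check, else None."""
    value = response.get("answer_json")
    if value is None:
        # Malformed JSON from the model; response["answer"] still holds the raw string.
        return None
    if required_keys:
        items = value if isinstance(value, list) else [value]
        for item in items:
            if not isinstance(item, dict) or any(key not in item for key in required_keys):
                return None
    return value
```

When this returns None, falling back to the raw answer string (or retrying the query) is a decision for the caller.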
{
"query": "List Apple's active legal proceedings",
"buckets": ["edgar_AAPL"],
"options": {
"return_format": "json",
"response_schema": {
"type": "array",
"items": {
"properties": {
"case_name": {"type": "string"},
"jurisdiction": {"type": "string"},
"status": {"type": "string"}
}
}
}
}
}
Response (with return_format: "json"):
{
"success": true,
"answer": "[ { \"case_name\": ... } ]", // raw string
"answer_json": [ // parsed
{"case_name": "Epic v Apple", "jurisdiction": "9th Circuit", "status": "remanded"},
...
],
"memories_found": 14,
...
}
Streaming Queries
The synthesis step for broad or multi-bucket queries can run 10–25 seconds. Pass "stream": true in the request body to receive the answer as Server-Sent Events while it's being generated, so you can render tokens as they arrive instead of waiting for the full response.
Request
Identical to a normal query, plus "stream": true:
curl -N -X POST https://api.lumetra.io/v1/query \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"query": "Summarize what I worked on this week",
"buckets": ["work"],
"stream": true
}'
curl -N disables curl's output buffering so the chunks print as they arrive.
Response format
The response is text/event-stream. Each event is a line beginning with data: followed by a JSON payload (the terminator carries the literal [DONE] instead). Events fall into four shapes:
- Delta chunk: OpenAI-style; one per token-ish slice of the answer:
data: {"choices":[{"delta":{"content":"Apple"}}]}
data: {"choices":[{"delta":{"content":" reported"}}]}
- Done frame: emitted exactly once with the full answer, the flat retrieval payload, and usage:
data: {"type":"done","answer":"...","memories_found":3,"memories":[...],"graph_facts":[...],"entity_matches":[...],"context_tokens":2029,"usage":{"input_tokens":8,"output_tokens":134,"actual_input_tokens":2037,"actual_output_tokens":134},"synthesis_usage":{"input_tokens":2037,"output_tokens":134,"total_tokens":2171}}
- Tail frame: follows the done frame with any post-synthesis additions (currently bucket_profiles when a Bucket Profiler agent has generated one for a queried bucket):
data: {"usage":{...},"bucket_profiles":{"default":"..."}}
- Terminator: signals end-of-stream:
data: [DONE]
The server may also emit SSE comments (lines starting with :) as keep-alive pings before the first delta lands: a prelude immediately after the headers and a heartbeat every 2 seconds during the retrieval and extractor phases. SSE parsers ignore comments, so you only need to handle them if you're parsing the wire format by hand.
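For clients that do parse the wire format by hand, the frame shapes above map onto a short generator. This is a sketch, not the SDKs' parser: parse_engram_sse is a hypothetical name, the "tail" label is my own, and the "delta" / "done" labels simply mirror the event types used in the SDK examples.

```python
import json
from typing import Iterable, Iterator


def parse_engram_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Turn raw SSE lines from a streaming query into typed event dicts."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank separator or keep-alive comment
        if not line.startswith("data:"):
            continue  # other SSE fields are not used by the frames described here
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # terminator: stop reading
        payload = json.loads(data)
        if payload.get("type") == "done":
            yield payload  # full answer, retrieval payload, and usage
        elif "choices" in payload:
            content = payload["choices"][0].get("delta", {}).get("content", "")
            yield {"type": "delta", "content": content}
        else:
            yield {**payload, "type": "tail"}  # post-synthesis additions
```

Feeding it the decoded lines of the HTTP response body yields deltas as they arrive, then the done frame, then any tail frame, and stops at the terminator.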
First-byte latency
Streaming flushes HTTP headers and a prelude within a TCP RTT. The first content delta typically lands 3–10 seconds later, gated by retrieval and the extractor pass. Total time to full answer is comparable to non-streaming; the win is perceived responsiveness.
SDK examples
Each official client surfaces streaming as an iterator/async-iterator:
# Python (lumetra-engram >= 0.2.0)
import os
from lumetra_engram import EngramClient
engram = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
for event in engram.query_stream("Summarize this week", buckets=["work"]):
    if event["type"] == "delta":
        # Print each answer fragment as it arrives.
        print(event["content"], end="", flush=True)
    elif event["type"] == "done":
        print()
        print(f"Used {event['usage']['output_tokens']} tokens")
// TypeScript (@lumetra/engram >= 0.3.0)
import { EngramClient } from '@lumetra/engram';
const engram = new EngramClient({ apiKey: process.env.ENGRAM_API_KEY });
for await (const event of engram.queryStream('Summarize this week', { buckets: ['work'] })) {
  if (event.type === 'delta') process.stdout.write(event.content);
  else if (event.type === 'done') console.log('\nusage:', event.usage);
}
// Go (github.com/lumetra-io/engram-go >= v0.3.0)
stream, err := client.QueryStream(ctx, "Summarize this week",
	engram.QueryOptions{Buckets: []string{"work"}})
if err != nil {
	return err
}
defer stream.Close()
for stream.Next() {
	ev := stream.Event()
	switch ev.Type {
	case "delta":
		fmt.Print(ev.Content)
	case "done":
		fmt.Println("\nusage:", ev.Usage)
	}
}
return stream.Err()
Timeouts
Each SDK has a separate stream timeout (default 5 minutes) distinct from the 30-second buffered timeout. Override via stream_timeout_seconds (Python), streamTimeoutMs (JS/TS), or StreamTimeout (Go) in the client constructor.
Error Handling
The API uses standard HTTP status codes:
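| Status | Meaning |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - Reserved bucket name (names starting with `_`) |
| 404 | Not Found - Resource doesn't exist |
| 409 | Conflict - Rollback blocked by a newer event on the same target |
| 412 | Precondition Failed - Bucket Profiler agent not installed |
| 429 | Rate Limited - Too many requests |
| 500 | Server Error |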
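A hand-rolled client can turn these status codes into a single exception type that also carries the message from the JSON error body. This is a sketch under my own assumptions: EngramAPIError and raise_for_status are hypothetical names, and treating 429 and 500 as retryable is a policy choice of this example rather than documented API behavior.

```python
import json

RETRYABLE = {429, 500}  # assumption: rate limits and server errors are transient


class EngramAPIError(Exception):
    """Raised for any non-2xx response."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.retryable = status in RETRYABLE


def raise_for_status(status: int, body: bytes) -> None:
    """Raise EngramAPIError for non-2xx responses, using the "error" field if present."""
    if 200 <= status < 300:
        return
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Body was not a JSON object; fall back to the raw text.
        message = body.decode("utf-8", errors="replace")
    raise EngramAPIError(status, message)
```

Callers can then branch on exc.retryable to decide whether to back off and retry or to surface the error immediately.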
Error responses include a JSON body:
{
"error": "Invalid API key"
}
SDKs & Libraries
Use one of the official clients rather than hand-rolling HTTP. All three wrap the REST API documented above, ship full type hints, and have zero runtime dependencies.
Python
pip install lumetra-engram. Installation, quickstart, and the full method surface live on the repo and PyPI — kept in one place so the docs and the package can't drift.
JavaScript / TypeScript
npm install @lumetra/engram. ESM + CJS, full .d.ts, runs on Node 18+, Bun, Deno, and edge runtimes. Quickstart and reference live on the repo and npm.
Go
go get github.com/lumetra-io/engram-go. Go 1.21+, stdlib-only, context.Context on every method, safe for concurrent use. Quickstart and reference live on the repo and pkg.go.dev.