Integrate Engram into your applications.

This reference covers the REST endpoints and the MCP server: authentication, memory storage, retrieval with explanations, and bucket management.

Getting Started

Engram provides a REST API for storing and querying memories. All API requests require authentication using an API key.

Official client libraries

Zero-dependency clients with full type hints are available for Python, TypeScript, and Go.

All three libraries wrap the same REST API documented below. Use them if you'd rather not hand-roll HTTP calls.

Codex plugin

Add this entry to .agents/plugins/marketplace.json in your repo, then install via the Codex UI. Codex will prompt for your ENGRAM_API_KEY on install and store it in its secret store.

{
  "name": "engram-plugins",
  "interface": { "displayName": "Engram Plugins" },
  "plugins": [
    {
      "name": "engram",
      "source": { "source": "local", "path": "./plugins/engram" },
      "policy": { "installation": "AVAILABLE", "authentication": "ON_INSTALL" },
      "category": "Productivity"
    }
  ]
}

The plugin folder itself lives at github.com/lumetra-io/engram-codex-plugin. Copy plugins/engram/ from there into your repo's plugins/ directory (the marketplace entry's source.path points at the local copy). Once installed, your agent gets store_memory, query_memory, list_memories, and the rest of the Engram tool surface.

Base URL

https://api.lumetra.io

Authentication

Include your API key in the Authorization header:

curl -X POST https://api.lumetra.io/v1/buckets/default/memories \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Alice works at TechCorp"}'

Managing API Keys: Create and revoke API keys from your Lumetra dashboard after signing in. API keys cannot be managed via the API for security reasons.
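
For callers who are not using curl, the same authenticated request can be assembled with Python's standard library. This is a minimal sketch: the helper name build_store_request is illustrative and not part of any Engram SDK, and the block only builds the request object. Sending it (for example with urllib.request.urlopen) is left to the caller.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.lumetra.io"


def build_store_request(api_key: str, bucket: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST that stores one memory in `bucket`."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/buckets/{quote(bucket, safe='')}/memories",
        data=body,
        method="POST",
        headers={
            # The API key travels as a Bearer token on every request.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_store_request("YOUR_API_KEY", "default", "Alice works at TechCorp")
print(req.get_method(), req.full_url)
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.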

Quick Example

Store a memory and query it:

# Store a memory
curl -X POST https://api.lumetra.io/v1/buckets/work/memories \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Bob is the CEO of Acme Inc"}'

# Query your memories
curl -X POST https://api.lumetra.io/v1/query \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "Who is the CEO of Acme?", "buckets": ["work"]}'

MCP Server

Engram provides an MCP (Model Context Protocol) server for direct integration with Claude and other AI assistants. The MCP server exposes memory tools that Claude can use to store and retrieve information.

Claude Code (CLI)

Add the MCP server with a single command:

claude mcp add-json engram '{"type":"sse","url":"https://mcp.lumetra.io/mcp/sse","headers":{"Authorization":"Bearer YOUR_API_KEY"}}'

Claude Desktop / Windsurf / Cursor

Add to your MCP configuration file:

{
  "mcpServers": {
    "engram": {
      "url": "https://mcp.lumetra.io/mcp/sse",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}

Available MCP Tools

Tool | Description
store_memory(content, bucket?) | Store a fact or piece of information
query_memory(question, bucket?) | Search memories using natural language with AI synthesis
list_memories(bucket, limit?) | List all memories stored in a bucket
list_buckets() | List available memory buckets
delete_memory(memory_id, bucket) | Delete a specific memory by ID
clear_memories(bucket) | Clear all memories in a bucket

MCP Tool Responses

Heads up — MCP and REST use different field names for the same data. The MCP wrapper predates the REST API and we haven't unified the field names (renaming MCP fields would break every agent configuration in the wild). If you call both surfaces from the same client, normalize on the wire.

What it is | REST field | MCP field
Memory primary key (top-level) | id | memory_id
Bucket identifier on responses | bucket_name | bucket
Count of cleared memories | cleared_count | memories_deleted

Inside the retrieval payload (REST: memories[], MCP: retrieved_memories[]) and graph_facts[], both surfaces use the same field name (memory_id). The array names are the only difference, and SDKs normalize to the REST shape.

Only direct MCP tool calls (e.g. from Claude.ai connectors) see the MCP-flavored field names.
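
If a client does consume raw MCP tool responses alongside REST responses, a small renaming step keeps downstream code on one shape. The sketch below is one possible way to do that, based only on the field pairs listed above; the function name mcp_to_rest is hypothetical.

```python
# Top-level MCP field names and their REST equivalents, per the table above.
MCP_TO_REST = {
    "memory_id": "id",
    "bucket": "bucket_name",
    "memories_deleted": "cleared_count",
    "retrieved_memories": "memories",
}


def mcp_to_rest(payload: dict) -> dict:
    """Return a copy of an MCP tool response with top-level keys renamed.

    Nested objects are left untouched: entries inside the retrieval array
    and graph_facts[] already use memory_id on both surfaces.
    """
    return {MCP_TO_REST.get(key, key): value for key, value in payload.items()}


store_result = {"success": True, "memory_id": "uuid", "bucket": "default", "token_count": 8}
print(mcp_to_rest(store_result))
```

Only top-level keys are renamed on purpose, so the nested memory_id values stay usable as join keys.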

store_memory (MCP) returns:

{
  "success": true,
  "memory_id": "uuid",
  "bucket": "default",
  "token_count": 8,
  "config_id": "default",
  "extractor_usage": {
    "input_tokens": 32,
    "output_tokens": 18
  }
}

// extractor_usage is null when triple extraction didn't run for this
// memory (BYOK extractor not configured, call failed, or no extractable
// facts). Memory is still stored and queryable in either case.

clear_memories (MCP) returns:

{
  "success": true,
  "memories_deleted": 42,
  "bucket": "default"
}

// REST equivalent: DELETE /v1/buckets/{bucket}/memories returns the
// same count under "cleared_count" (not "memories_deleted").

query_memory (MCP) returns:

{
  "success": true,
  "answer": "You prefer Python over JavaScript.",
  "memories_found": 1,
  "retrieved_memories": [
    {
      "memory_id": "mem_8a3f...",
      "bucket_id": "buc_d12c...",
      "bucket_name": "default",
      "content": "User prefers Python over JavaScript for new services.",
      "raw_score": 0.94,
      "weight": 1.0,
      "weighted_score": 0.94
    }
  ],
  "graph_facts": [
    {
      "subject": "User",
      "predicate": "prefers",
      "object": "Python",
      "memory_id": "mem_8a3f...",
      "bucket_id": "buc_d12c...",
      "bucket_name": "default",
      "depth": 0,
      "weight": 1.0,
      "timestamp": "2026-05-20T14:32:00Z"
    }
  ],
  "entity_matches": [
    { "entity": "Python", "bucket_name": "default", "score": 1.0 }
  ],
  "context_tokens": 42,
  "usage": {
    "input_tokens": 6,
    "output_tokens": 8
  }
}

// Each graph_facts[i].memory_id matches a retrieved_memories[j].memory_id —
// use that link to cite the source memory behind any fact.
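
As a rough illustration of that link, the sketch below joins each graph fact to the content of its source memory. It assumes a query_memory response shaped like the example above; the function name and the fallback marker are illustrative.

```python
def facts_with_sources(result: dict) -> list:
    """Pair each graph fact with the content of the memory it came from."""
    by_id = {m["memory_id"]: m["content"] for m in result.get("retrieved_memories", [])}
    pairs = []
    for fact in result.get("graph_facts", []):
        statement = f'{fact["subject"]} {fact["predicate"]} {fact["object"]}'
        # Defensive fallback in case a fact's memory is absent from the retrieved set.
        pairs.append((statement, by_id.get(fact["memory_id"], "<source not retrieved>")))
    return pairs


sample = {
    "retrieved_memories": [
        {"memory_id": "mem_8a3f...", "content": "User prefers Python over JavaScript for new services."}
    ],
    "graph_facts": [
        {"subject": "User", "predicate": "prefers", "object": "Python", "memory_id": "mem_8a3f..."}
    ],
}
for statement, source in facts_with_sources(sample):
    print(f"{statement}  <-  {source}")
```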

Example Usage

Once connected, Claude can use memory tools automatically:

User: "Remember that I prefer Python over JavaScript"

Claude: [calls store_memory]
  content: "User prefers Python over JavaScript"
  bucket: "default"

Result: Memory stored successfully

---

User: "What are my programming preferences?"

Claude: [calls query_memory]
  question: "What are the user's programming preferences?"

Result: Based on stored memories, you prefer Python over JavaScript.

Memories

Store, list, and manage memories. All endpoints require API key authentication.

GET /v1/buckets/{bucket}/memories (Auth Required)

List memories in a bucket with pagination. Returns newest first. Returns 404 if the bucket does not exist (use POST /v1/buckets or store a memory first to auto-create it).

POST /v1/buckets/{bucket}/memories (Auth Required)

Store a new memory in a bucket

DELETE /v1/buckets/{bucket}/memories/{memory_id} (Auth Required)

Delete a specific memory by ID

DELETE /v1/buckets/{bucket}/memories (Auth Required)

Clear all memories in a bucket

DELETE /v1/buckets/{bucket} (Auth Required)

Delete a bucket entirely (and every memory in it). Destructive — no undo.

POST /v1/query (Auth Required)

Query memories using natural language with AI-powered synthesis. Pass "stream": true to receive the answer as Server-Sent Events as the synthesizer produces it — see the Streaming Queries section below. Bucket scoping: omitting `buckets` defaults to `["default"]` (auto-created on next write); passing `[]` returns 400; passing names that don't exist returns 404 with the missing bucket(s) listed.


Buckets

Organize memories into separate namespaces called buckets.

GET /v1/buckets (Auth Required)

List all buckets for your account

POST /v1/buckets (Auth Required)

Create a new bucket. `name` is required; `description` is optional. Idempotent: posting the same name twice returns 200 with the existing bucket (and preserves the original description) instead of 201 / 409. Names starting with `_` are reserved for system meta-buckets and return 403.

GET /v1/buckets/{bucket}/profile (Auth Required)

Inspect the canonical profile for a bucket. The profile is generated by the Bucket Profiler agent and prepended to every query against this bucket to improve recall. Status is `not_installed` until you call `/profile/ensure-agent`, then `pending` while the first tick is in flight, then `ready` once a profile has been written.

POST /v1/buckets/{bucket}/profile/ensure-agent (Auth Required)

Idempotently installs the Bucket Profiler agent and attaches it to this bucket. Pass `?run_now=true` to fire an immediate tick. Call this before /profile/regenerate; without it the regenerate endpoint returns 412.

POST /v1/buckets/{bucket}/profile/regenerate (Auth Required)

Force a fresh profile generation by triggering the installed Bucket Profiler agent. Returns 412 with `BUCKET_PROFILER_NOT_INSTALLED` if no agent is attached to this bucket — call `/profile/ensure-agent` first. The tick runs in the background; poll GET /profile to see status flip from `pending` to `ready`.
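
The regenerate-then-poll flow described above could be wrapped roughly as follows. This is a sketch under assumptions: get_status stands in for a caller-supplied function that performs GET /v1/buckets/{bucket}/profile and returns the status string (the exact response field is not specified here), and the timeout and polling interval are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_profile(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the profile status is "ready". Returns False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "ready":
            return True
        if status == "not_installed":
            # Regenerate cannot succeed until the agent is attached.
            raise RuntimeError("call /profile/ensure-agent before regenerating")
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)


# Simulated status sequence standing in for real HTTP calls.
statuses = iter(["pending", "pending", "ready"])
print(wait_for_profile(lambda: next(statuses), sleep=lambda _: None))
```

Passing sleep as a parameter keeps the loop testable without real delays.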


Usage & Stats

Monitor your usage and resource statistics.

GET /v1/usage (Auth Required)

Get token usage statistics

GET /v1/stats (Auth Required)

Get resource statistics


Agents

Install and manage Memory Agents from the template library. Agents run on a schedule and operate on your memory. Read-only operators (Bucket Profiler, Logger) write findings into their own meta-bucket — useful telemetry, but nothing in your real buckets changes. Mutation operators (Janitor, Consolidator, Watchdog) propose or apply changes to actual memories; every one of those mutations is audit-logged and reversible from /v1/audit.

GET /v1/templates (Auth Required)

List available agent templates (Profiler, Watchdog, Logger, Janitor, Consolidator).

GET /v1/templates/{template_id} (Auth Required)

Get full spec for a single template (operators, params, permissions, retention). Returns 404 if the template_id is unknown.

GET /v1/agents (Auth Required)

List installed agents and their attachment / tick state.

POST /v1/agents (Auth Required)

Install an agent from a template. Both `template_id` and `name` are required.

GET /v1/agents/{agent_id} (Auth Required)

Get a single agent with full spec + attachments.

PATCH /v1/agents/{agent_id} (Auth Required)

Toggle enabled, upgrade to latest template version, or replace the spec.

DELETE /v1/agents/{agent_id} (Auth Required)

Uninstall the agent. Default is `?findings=archive`, which renames the meta-bucket so historical findings stay queryable. Pass `?findings=delete` to drop the meta-bucket and every finding in it — destructive and irreversible.

POST /v1/agents/{agent_id}/tick (Auth Required)

Manually run a tick for one of the agent's attached buckets. Returns the tick row + persisted findings count.

GET /v1/agents/{agent_id}/ticks (Auth Required)

Recent ticks (newest first). `?limit=20` default.

GET /v1/agents/{agent_id}/attachments (Auth Required)

List buckets this agent runs against, with cron interval + last-tick state.

POST /v1/agents/{agent_id}/attachments (Auth Required)

Attach the agent to a bucket on a schedule. The bucket can be passed as a UUID or a name (aliases: `bucket_id`, `bucket_name`, `bucket`). `interval` accepts a cron expression or `@every:Ns/m/h/d`; pass `"manual"` to disable auto-ticks.

PATCH /v1/agents/{agent_id}/attachments/{attachment_id} (Auth Required)

Pause/resume or change the interval for a specific attachment.

DELETE /v1/agents/{agent_id}/attachments/{attachment_id} (Auth Required)

Detach the agent from a bucket. Findings already written remain in the meta-bucket.
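
For the attachment `interval` field described above, a client may want to sanity-check `@every:` values before sending them. The sketch below reads `@every:Ns/m/h/d` as a whole number N followed by exactly one unit letter (s, m, h, or d); that reading is an assumption, and the helper name is hypothetical. Cron expressions and "manual" are passed over rather than interpreted.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_EVERY = re.compile(r"^@every:(\d+)([smhd])$")


def every_to_seconds(interval: str) -> Optional[int]:
    """Convert an "@every:" interval to seconds.

    Returns None for "manual" and for anything not in @every form
    (for example a cron expression), which this sketch does not interpret.
    """
    match = _EVERY.match(interval)
    if match is None:
        return None
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]


for value in ("@every:30s", "@every:6h", "manual"):
    print(value, every_to_seconds(value))
```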


Audit Log and Rollback

Every mutation (memory create/delete/update/expire/merge/reassign, bucket delete) by both users and agents is recorded in the audit log. Roll back any single event or cascade through a chain of changes. Reads on /v1/memories/{id}/at let you query the corpus at any point in the past.

GET /v1/audit (Auth Required)

Recent audit events, newest first. Filterable by actor type, memory id, bucket id, tick id, and time range — same filters the portal /history view uses, so a custom audit UI can rebuild that view in full.

GET /v1/audit/{event_id} (Auth Required)

Single event detail plus a conflict preview (newer events touching the same target).

POST /v1/audit/{event_id}/rollback (Auth Required)

Apply the inverse of an audit event. Refuses with 409 + conflicts if a newer event touches the same target. Pass `force=true` to override (lossy) or `cascade=true` to roll back the newer events first.

POST /v1/audit/groups/{parent_event_id}/rollback (Auth Required)

Roll back an event AND all of its children (the per-source events for memory_merge, etc.) in one transactional sweep.


Point-in-Time and History

Versioned columns under the hood let you query the state of any memory at a past timestamp, and read a full per-bucket changelog feed.

GET /v1/memories/{memory_id}/at (Auth Required)

Resolve a memory's state at the timestamp passed in `?as_of=` (ISO 8601, required). Returns the archived version if one covers that time, otherwise the current state if the timestamp is post-creation, otherwise 404.

GET /v1/buckets/{bucket}/history (Auth Required)

Audit events touching a bucket, newest first. Useful for "what happened in this bucket" support views. The path accepts a UUID or a bucket name, like the rest of the /v1/buckets/* routes.
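
Because `as_of` on the point-in-time endpoint is an ISO 8601 timestamp in a query string, it needs URL encoding. The sketch below builds such a URL with the standard library; the helper name and the memory id "mem_123" are placeholders, and normalizing to the UTC "Z" form is a choice made here, not a documented requirement.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

BASE_URL = "https://api.lumetra.io"


def memory_at_url(memory_id: str, as_of: datetime) -> str:
    """Build the point-in-time URL for a memory, with as_of as ISO 8601 UTC."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # urlencode percent-escapes the colons in the timestamp.
    query = urlencode({"as_of": stamp})
    return f"{BASE_URL}/v1/memories/{quote(memory_id, safe='')}/at?{query}"


print(memory_at_url("mem_123", datetime(2026, 5, 20, 14, 32, tzinfo=timezone.utc)))
```

Rejecting naive datetimes avoids silently querying the wrong instant when the local timezone differs from UTC.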


Dedup

Naming note: "strict" is stricter about what counts as a duplicate, not "more aggressive deduplication." It raises the similarity bar to 0.99 so only near-identical writes collapse, i.e. it preserves more memories than the default. "loose" dedupes more aggressively at 0.95.

The store path runs a similarity check before writing. By default ("loose", similarity ≥ 0.95) it collapses near-duplicate writes into the existing memory so re-ingesting the same source doesn't bloat the bucket. For narrative content this is usually what you want.

For templated time-series content (financial filings, daily metrics, log rows) where rows are structurally similar but each carries unique values, the default collapses real data. Pass "dedup": "off" on the store request to disable.

Detecting silent merges

Every store response includes a status field: "stored" for a fresh write, "merged" when the server absorbed the write into an existing memory. The merged response includes three extra fields:

{
  "id": "ec59...",            // the canonical id (= deduped_into)
  "bucket_name": "prices_AAPL",
  "token_count": 222,
  "status": "merged",
  "deduped_into": "ec59...",
  "similarity_score": 0.987,
  "merge_reason": "embedding_similarity"
}

merge_reason is one of: content_hash (byte-identical content already stored), embedding_similarity (vector match ≥ threshold), conflict_keep_existing (LLM conflict resolver chose the existing memory), or concurrent_insert_race (another worker stored identical content first).
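
An ingest script can use these fields to log merges instead of losing them silently. A minimal sketch, assuming store responses shaped like the example above; the function name is illustrative, and similarity_score is read defensively because its presence for every merge_reason is not stated here.

```python
def describe_store_result(response: dict) -> str:
    """Summarize a store response, flagging writes the server merged away."""
    status = response.get("status")
    if status == "merged":
        return (
            f'merged into {response["deduped_into"]} '
            f'({response["merge_reason"]}, similarity {response.get("similarity_score")})'
        )
    if status == "stored":
        return f'stored as {response["id"]}'
    return f"unexpected status: {status!r}"


merged = {
    "id": "ec59...",
    "status": "merged",
    "deduped_into": "ec59...",
    "similarity_score": 0.987,
    "merge_reason": "embedding_similarity",
}
print(describe_store_result(merged))
```

Counting merged results per batch is a quick way to notice when the default policy is collapsing templated rows that should have been stored separately.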

Policies

dedup | Behavior | Use when
"off" | Skip both the byte-hash and similarity checks. Every write stores a new memory. | Bulk ingest of templated time-series rows. Re-importing the same source on purpose.
"loose" (default) | Collapse writes at similarity ≥ 0.95. | Narrative content where structurally similar chunks really are redundant.
"strict" | Only collapse near-identical writes (≥ 0.99). | Keep a safety net against exact re-ingest while allowing near-identical but distinct content (e.g. month-over-month metrics).

Example

curl -X POST https://api.lumetra.io/v1/buckets/prices_AAPL/memories \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "AAPL 2024-01 OHLC: 187.15/195.62/178.85/188.47", "dedup": "off"}'

Billing note

When dedup fires we skip the triple-extractor pass, so a merged write doesn't cost extraction tokens. Setting "dedup": "off" means more memories store → more extractor runs → more BYOK usage. Use the extractor_usage field in each response to audit.

Query Knobs

POST /v1/query accepts these tuning options in the options object (all optional). They compose with each other and with stream.

top_k_per_bucket

Per-bucket retrieval depth. Either an integer (same K for every bucket) or an object mapping bucket names to K. Missing buckets fall back to the global top_k. Built for asymmetric bucket sizes — e.g. a 1900-memory filings bucket paired with a 6-memory monthly-prices bucket, where a uniform K either starves the big one or wastes slots on the small one.

"options": {
  "top_k": 8,
  "top_k_per_bucket": {
    "edgar_AAPL": 20,
    "prices_AAPL": 4,
    "patents_AAPL": 5
  }
}
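
To make the fallback rule concrete, the sketch below mirrors on the client side how K would resolve for a given bucket, as the description above reads. It is an illustration only, not the server's implementation; the function name and the bucket "news_AAPL" are hypothetical.

```python
from typing import Optional


def effective_top_k(options: dict, bucket: str) -> Optional[int]:
    """Resolve K for one bucket.

    Returns None when neither knob applies, meaning the server default.
    """
    per_bucket = options.get("top_k_per_bucket")
    if isinstance(per_bucket, int):
        return per_bucket                      # same K for every bucket
    if isinstance(per_bucket, dict) and bucket in per_bucket:
        return per_bucket[bucket]              # explicit per-bucket K
    return options.get("top_k")                # fall back to the global top_k


options = {
    "top_k": 8,
    "top_k_per_bucket": {"edgar_AAPL": 20, "prices_AAPL": 4, "patents_AAPL": 5},
}
for name in ("edgar_AAPL", "prices_AAPL", "news_AAPL"):
    print(name, effective_top_k(options, name))
```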

max_tokens

Cap synthesis output. The default (8192) is generous so Qwen3 / GPT-5 reasoning traces don't truncate. Lower it when running agent loops, embedding answers in pipelines with strict context budgets, or controlling cost.

"options": { "max_tokens": 400 }

min_weighted_score (recommended) / min_similarity_threshold

Two precision floors with different scales. Most callers want min_weighted_score because it filters on the same number you see in memories[*].weighted_score.

Knob | Operates on | Typical values | When to use
min_weighted_score | Post-RRF weighted score (visible in response) | 0.0 = positive matches only; higher = stricter | Default choice. Citations-grade retrieval, "least-bad" guard.
min_similarity_threshold | Raw cosine similarity (embedding score) | 0.1–0.5 typically | You specifically want a floor on the underlying embedding signal, not the RRF blend. Rarely needed.

"options": {
  "min_weighted_score": 0.0     // keep only positive-score chunks
}

return_format + response_schema

"return_format": "json" asks the synthesizer for a JSON response. The provider's JSON-object mode is used when available; for providers without native support the prompt instructs strict JSON. Optionally pass "response_schema" (a JSON Schema dict) to guide the model toward a specific shape. Schema enforcement is best-effort; validate client-side if you need strict guarantees.

The response gains a top-level answer_json field with the parsed value (object / array / scalar). It's null when the model returns malformed JSON despite the request; answer always carries the raw string.

{
  "query": "List Apple's active legal proceedings",
  "buckets": ["edgar_AAPL"],
  "options": {
    "return_format": "json",
    "response_schema": {
      "type": "array",
      "items": {
        "properties": {
          "case_name":    {"type": "string"},
          "jurisdiction": {"type": "string"},
          "status":       {"type": "string"}
        }
      }
    }
  }
}

Response (with return_format: "json"):

{
  "success": true,
  "answer": "[ { \"case_name\": ... } ]",   // raw string
  "answer_json": [                        // parsed
    {"case_name": "Epic v Apple", "jurisdiction": "9th Circuit", "status": "remanded"},
    ...
  ],
  "memories_found": 14,
  ...
}
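
Since schema enforcement is best-effort, a client that depends on the shape can check it before use. The sketch below is one dependency-free way to do that for the legal-proceedings example; treating all three keys as required is an assumption made here (the schema above lists properties without marking any as required), and the function name is hypothetical.

```python
REQUIRED_KEYS = {"case_name", "jurisdiction", "status"}


def parse_cases(response: dict) -> list:
    """Return the validated case records from a JSON-mode query response."""
    parsed = response.get("answer_json")
    if parsed is None:
        # answer_json is null when the model returned malformed JSON.
        raise ValueError("answer_json is null; inspect the raw 'answer' string")
    if not isinstance(parsed, list):
        raise ValueError("expected a JSON array")
    for item in parsed:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            raise ValueError(f"record missing required keys: {item!r}")
    return parsed


response = {
    "success": True,
    "answer_json": [
        {"case_name": "Epic v Apple", "jurisdiction": "9th Circuit", "status": "remanded"}
    ],
}
print(parse_cases(response))
```

For full JSON Schema validation, a dedicated validator library would be the more thorough choice; this sketch only checks the outer shape.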

Streaming Queries

The synthesis step for broad or multi-bucket queries can run 10–25 seconds. Pass "stream": true in the request body to receive the answer as Server-Sent Events while it's being generated, so you can render tokens as they arrive instead of waiting for the full response.

Request

Identical to a normal query, plus "stream": true:

curl -N -X POST https://api.lumetra.io/v1/query \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "query": "Summarize what I worked on this week",
    "buckets": ["work"],
    "stream": true
  }'

curl -N disables curl's output buffering so the chunks print as they arrive.

Response format

The response is text/event-stream. Each event is a line beginning with data: followed by a JSON payload. Events fall into four shapes:

  • Delta chunk — OpenAI-style; one per token-ish slice of the answer:
    data: {"choices":[{"delta":{"content":"Apple"}}]}
    data: {"choices":[{"delta":{"content":" reported"}}]}
  • Done frame — emitted exactly once with the full answer, the flat retrieval payload, and usage:
    data: {"type":"done","answer":"...","memories_found":3,"memories":[...],"graph_facts":[...],"entity_matches":[...],"context_tokens":2029,"usage":{"input_tokens":8,"output_tokens":134,"actual_input_tokens":2037,"actual_output_tokens":134},"synthesis_usage":{"input_tokens":2037,"output_tokens":134,"total_tokens":2171}}
  • Tail frame — follows the done frame with any post-synthesis additions (currently bucket_profiles when a Bucket Profiler agent has generated one for a queried bucket):
    data: {"usage":{...},"bucket_profiles":{"default":"..."}}
  • Terminator — signals end-of-stream:
    data: [DONE]

The server may also emit SSE comments (lines starting with :) as keep-alive pings before the first delta lands: a prelude immediately after the headers and a heartbeat every 2 seconds during the retrieval and extractor phases. SSE parsers ignore comments, so you only need to handle them if you're parsing the wire format by hand.
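
For anyone parsing the wire format by hand, the frame shapes above can be classified with a few lines of Python. This is a sketch, not the SDKs' implementation: it works on already-decoded text lines, and the "delta" and "tail" labels it attaches are chosen here for convenience (only the done frame carries a type field on the wire).

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event: delta, done, or tail. Stops at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue                      # blank separator or keep-alive comment
        if not line.startswith("data:"):
            continue                      # other SSE fields are not used here
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return                        # terminator
        payload = json.loads(data)
        if "choices" in payload:
            content = payload["choices"][0]["delta"].get("content", "")
            yield {"type": "delta", "content": content}
        elif payload.get("type") == "done":
            yield payload
        else:
            yield {"type": "tail", **payload}


wire = [
    ": keep-alive",
    'data: {"choices":[{"delta":{"content":"Apple"}}]}',
    'data: {"choices":[{"delta":{"content":" reported"}}]}',
    'data: {"type":"done","answer":"Apple reported","memories_found":3}',
    'data: {"usage":{},"bucket_profiles":{"default":"..."}}',
    "data: [DONE]",
]
events = list(parse_sse(wire))
print("".join(e["content"] for e in events if e["type"] == "delta"))
```

In a real client the lines would come from a streaming HTTP response read incrementally rather than from a list.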

First-byte latency

Streaming flushes HTTP headers and a prelude within a TCP RTT. The first content delta typically lands 3–10 seconds later, gated by retrieval and the extractor pass. Total time to full answer is comparable to non-streaming; the win is perceived responsiveness.

SDK examples

Each official client surfaces streaming as an iterator/async-iterator:

# Python (lumetra-engram >= 0.2.0)
import os

from lumetra_engram import EngramClient

engram = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
for event in engram.query_stream("Summarize this week", buckets=["work"]):
    if event["type"] == "delta":
        print(event["content"], end="", flush=True)
    elif event["type"] == "done":
        print()
        print(f"Used {event['usage']['output_tokens']} tokens")

// TypeScript (@lumetra/engram >= 0.3.0)
import { EngramClient } from '@lumetra/engram';

const engram = new EngramClient({ apiKey: process.env.ENGRAM_API_KEY });
for await (const event of engram.queryStream('Summarize this week', { buckets: ['work'] })) {
  if (event.type === 'delta') process.stdout.write(event.content);
  else if (event.type === 'done') console.log('\nusage:', event.usage);
}

// Go (github.com/lumetra-io/engram-go >= v0.3.0)
stream, err := client.QueryStream(ctx, "Summarize this week",
    engram.QueryOptions{Buckets: []string{"work"}})
if err != nil { return err }
defer stream.Close()
for stream.Next() {
    ev := stream.Event()
    switch ev.Type {
    case "delta":
        fmt.Print(ev.Content)
    case "done":
        fmt.Println("\nusage:", ev.Usage)
    }
}
return stream.Err()

Timeouts

Each SDK has a separate stream timeout (default 5 minutes) distinct from the 30-second buffered timeout. Override via stream_timeout_seconds (Python), streamTimeoutMs (JS/TS), or StreamTimeout (Go) in the client constructor.

Error Handling

The API uses standard HTTP status codes:

Status | Meaning
200 | Success
201 | Created
400 | Bad Request - Invalid parameters
401 | Unauthorized - Invalid or missing API key
404 | Not Found - Resource doesn't exist
429 | Rate Limited - Too many requests
500 | Server Error

Error responses include a JSON body:

{
  "error": "Invalid API key"
}
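
A client that hand-rolls HTTP might handle these responses along the following lines. This is a sketch under assumptions: treating 429 and 500 as retryable and using exponential backoff are choices made here, not documented server guidance, and both helper names are hypothetical.

```python
import json
from typing import Optional

RETRYABLE = {429, 500}


def error_message(body: bytes) -> str:
    """Extract the "error" string from an error body, tolerating non-JSON bodies."""
    try:
        return str(json.loads(body)["error"])
    except (ValueError, KeyError, TypeError):
        return body.decode("utf-8", errors="replace")


def retry_delay(status: int, attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the call should not be retried."""
    if status not in RETRYABLE:
        return None
    return min(cap_s, base_s * (2 ** attempt))


print(error_message(b'{"error": "Invalid API key"}'))
print(retry_delay(429, attempt=2))
```

Client errors such as 400, 401, and 404 are not retried because repeating the same request would fail the same way.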

SDKs & Libraries

Use one of the official clients rather than hand-rolling HTTP. All three wrap the REST API documented above, ship full type hints, and have zero runtime dependencies.

Python

pip install lumetra-engram. Installation, quickstart, and the full method surface live on the repo and PyPI — kept in one place so the docs and the package can't drift.

JavaScript / TypeScript

npm install @lumetra/engram. ESM + CJS, full .d.ts, runs on Node 18+, Bun, Deno, and edge runtimes. Quickstart and reference live on the repo and npm.

Go

go get github.com/lumetra-io/engram-go. Go 1.21+, stdlib-only, context.Context on every method, safe for concurrent use. Quickstart and reference live on the repo and pkg.go.dev.