
Long-term memory for LlamaIndex

LlamaIndex's built-in chat-history buffer is in-process and capped at a recent window. `EngramMemory` is a `BaseMemory` implementation that replaces it. Every message persists to a bucket, recent messages come back in chronological order, and `query()` runs hybrid retrieval over the full history when the window isn't enough.
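To make the split between the recent window and full-history recall concrete, here is a toy, in-memory model of a bucket. It is not the Engram client: `ToyBucket`, its methods, and the scoring are hypothetical, and the "hybrid" score simply blends keyword overlap with character-trigram cosine similarity as a crude stand-in for embedding search. It only illustrates why a message that has scrolled out of the window can still be found by a query.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    # Lowercased words with surrounding punctuation stripped.
    return {w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")}


def _keyword_score(query: str, message: str) -> float:
    # Fraction of query words that also appear in the message.
    q = _tokens(query)
    return len(q & _tokens(message)) / len(q) if q else 0.0


def _trigrams(text: str) -> Counter:
    padded = f"  {text.lower()} "
    return Counter(padded[i : i + 3] for i in range(len(padded) - 2))


def _trigram_cosine(a: str, b: str) -> float:
    # Cosine similarity of character-trigram counts: a crude stand-in for embeddings.
    ta, tb = _trigrams(a), _trigrams(b)
    dot = sum(count * tb[g] for g, count in ta.items())
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


@dataclass
class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket; not the Engram client."""

    messages: List[str] = field(default_factory=list)

    def put(self, message: str) -> None:
        # Every message is kept; nothing is evicted.
        self.messages.append(message)

    def get(self, read_limit: int = 50) -> List[str]:
        # Recent window: the last `read_limit` messages, oldest first.
        return self.messages[-read_limit:] if read_limit > 0 else []

    def query(self, text: str, top_k: int = 3, alpha: float = 0.5) -> List[str]:
        # Toy hybrid score computed over the FULL history, not just the window.
        scored = [
            (alpha * _keyword_score(text, m) + (1 - alpha) * _trigram_cosine(text, m), i, m)
            for i, m in enumerate(self.messages)
        ]
        # Highest score first; ties go to the older message.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [m for _, _, m in scored[:top_k]]
```

With a decision buried under 60 newer messages, `get(50)` no longer includes it, while `query()` still ranks it first. The real service's ranking will differ; only the window-versus-full-history behavior carries over.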


Install

Three steps: sign up for an Engram API key, paste your own LLM-provider key (BYOK) on the /models page, then drop the snippet below into your LlamaIndex app.

Three steps to memory in your agent

Sign up. Free, no card. You'll land on a Getting Started page that walks the next two steps.
Add your LLM key. Engram is bring-your-own-key (BYOK): paste an OpenAI, Anthropic, Groq, Together, or Fireworks key and we'll route every extraction and query call through your provider. You pay your provider directly, and we never see your inference.
Paste the snippet below into your agent and restart it. Use `Authorization: Bearer <api-key>` with the API key from your portal.
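If you call Engram over HTTP yourself instead of through the LlamaIndex package, the `Authorization: Bearer <api-key>` header from the last step can be built as below. This is a minimal sketch: the helper name `engram_auth_headers` is hypothetical, reading the key from `ENGRAM_API_KEY` simply mirrors the `export` step in the install section, and nothing is assumed about Engram's endpoints.

```python
import os
from typing import Dict, Optional


def engram_auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build the `Authorization: Bearer <api-key>` header."""
    # Prefer an explicit key; otherwise fall back to the exported environment variable.
    key = api_key if api_key is not None else os.environ.get("ENGRAM_API_KEY", "")
    key = key.strip()
    if not key:
        raise RuntimeError("Set ENGRAM_API_KEY or pass api_key explicitly.")
    return {"Authorization": f"Bearer {key}"}
```

Failing loudly on a missing key surfaces a misconfigured deployment at startup rather than as an authentication error on the first memory call.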

llama-index-memory-engram: durable BaseMemory

`EngramMemory` is a `BaseMemory` implementation that replaces LlamaIndex's chat-history buffer. Use `.get()` to fetch the recent window for prompt-stuffing, and `.query()` for hybrid retrieval across the full bucket. Source: github.com/lumetra-io/engram-llamaindex.

Install:

```shell
pip install llama-index-memory-engram
```

Export your API key:

```shell
export ENGRAM_API_KEY="<api-key>"
```

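Pass `EngramMemory` to the agent's `.run()` call:

```python
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI  # reads OPENAI_API_KEY from the environment
from llama_index.memory.engram import EngramMemory

# Durable memory scoped to one bucket; .get() returns at most 50 recent messages.
memory = EngramMemory.from_defaults(bucket="user-42", read_limit=50)

# Add your tools (plain functions or FunctionTool objects) to this list.
agent = FunctionAgent(llm=OpenAI(model="gpt-4o"), tools=[])


async def main() -> None:
    # Memory is passed per call, not to the FunctionAgent constructor.
    response = await agent.run(
        "What did we decide about the Q3 launch?",
        memory=memory,
    )
    print(response)


asyncio.run(main())
```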

What you can do once memory's wired in

Pass it at runtime to `FunctionAgent` or `ReActAgent` via `agent.run(query, memory=EngramMemory(...))`. Memory is a per-call runtime argument, not a constructor argument.
Set per-user buckets with `EngramMemory(bucket=f'user-{user_id}')`: one line per tenant scope.
Call `.query()` for hybrid retrieval across the entire bucket; reserve `.get()` for the recent-window slice.
Combine it with LlamaIndex's other components (vector indexes, query engines). Engram replaces the chat-memory tier specifically.
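For multi-tenant apps, the per-user bucket pattern above can be wrapped in a small registry so each user id maps to exactly one memory object. This is a sketch: `PerUserMemory` and `bucket_for` are hypothetical names, the empty-id check is an assumed rule rather than a documented bucket constraint, and the memory class is injected through a factory so the logic runs without network access.

```python
from typing import Callable, Dict, Generic, TypeVar

M = TypeVar("M")


class PerUserMemory(Generic[M]):
    """Hypothetical helper: one memory object per tenant, keyed by bucket name."""

    def __init__(self, factory: Callable[[str], M]) -> None:
        # `factory` receives a bucket name and returns a memory object.
        self._factory = factory
        self._cache: Dict[str, M] = {}

    @staticmethod
    def bucket_for(user_id: str) -> str:
        # Same naming scheme as the f'user-{user_id}' example.
        if not user_id:
            raise ValueError("user_id must be non-empty")
        return f"user-{user_id}"

    def for_user(self, user_id: str) -> M:
        # Build the memory for a bucket once, then reuse it.
        bucket = self.bucket_for(user_id)
        if bucket not in self._cache:
            self._cache[bucket] = self._factory(bucket)
        return self._cache[bucket]
```

In an app you might pass `lambda bucket: EngramMemory(bucket=bucket)` as the factory and call `agent.run(msg, memory=memories.for_user(user_id))` per request, which keeps memory a per-call argument. Whether reusing instances beats constructing one per call depends on the client, which this page does not specify.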

FAQ

Does `EngramMemory` work with both `ChatEngine` and `Agent`?

Yes. Both consume `BaseMemory`. The Engram implementation behaves like the built-in one for `.put()` and `.get()`, and adds `.query()` for semantic recall.

How do I limit how much history `get()` returns?

`EngramMemory.from_defaults(read_limit=50)` caps how many recent messages `.get()` returns (50 here). The default is usually right for prompt-stuffing; raise it for models with longer context windows.
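As a rough way to decide how far to raise `read_limit`, divide the tokens you can spare for history by a typical message length. This back-of-envelope helper is hypothetical and assumes `read_limit` counts messages; measure real message sizes with your model's tokenizer before relying on it.

```python
def suggest_read_limit(context_window: int, reserved_tokens: int, avg_tokens_per_message: int) -> int:
    """Hypothetical heuristic: how many recent messages fit in the history budget."""
    if avg_tokens_per_message <= 0:
        raise ValueError("avg_tokens_per_message must be positive")
    # Tokens left for chat history after the system prompt, tools, retrieved context, and reply.
    history_budget = max(context_window - reserved_tokens, 0)
    return history_budget // avg_tokens_per_message
```

For a 128,000-token model with 28,000 tokens reserved and messages averaging 200 tokens, this suggests 500; an 8,192-token model with 4,096 tokens reserved and 150-token messages gets 27.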

Self-hosted Engram?

Pass `base_url='https://engram.internal.example.com'` to the constructor. Same memory API, your endpoint.
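One way to keep the same code working against both hosted and self-hosted Engram is to add `base_url` only when it is configured. This sketch is hypothetical: `engram_kwargs` and the `ENGRAM_BASE_URL` environment variable are illustrative names, and it assumes that leaving `base_url` out makes the package fall back to its default hosted endpoint.

```python
import os
from typing import Dict, Optional
from urllib.parse import urlparse


def engram_kwargs(bucket: str, base_url: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: constructor kwargs, with base_url only when self-hosting."""
    # ENGRAM_BASE_URL is an assumed variable name, not a documented setting.
    url = base_url or os.environ.get("ENGRAM_BASE_URL")
    kwargs = {"bucket": bucket}
    if url:
        parsed = urlparse(url)
        # Reject values like "engram.internal.example.com" that lack a scheme.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"base_url must be an absolute http(s) URL, got {url!r}")
        kwargs["base_url"] = url.rstrip("/")
    return kwargs
```

Usage would then look like `memory = EngramMemory(**engram_kwargs("user-42"))`, with the endpoint switched by configuration rather than code.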


Ship durable memory in LlamaIndex today

Free tier: 10K memories and 50K retrievals per month. No credit card. The same Engram backend powers all 41 integrations, so memories written from one client are immediately queryable from the others.
