Long-term memory for LlamaIndex
LlamaIndex's built-in chat-history buffer is in-process and capped at a recent window. `EngramMemory` is a `BaseMemory` that replaces it. Every message persists to a bucket, recent messages come back chronologically, and `query()` runs hybrid retrieval over the full history when the window isn't enough.
Install
Three steps: sign up for an Engram API key, add your own LLM-provider key (BYOK) on the /models page, then drop the snippet below into LlamaIndex.
Three steps to memory in your agent
- Sign up. Free, no card. You'll land on a Getting Started page that walks you through the next two steps.
- Add your LLM key. Engram is BYOK. Paste an OpenAI / Anthropic / Groq / Together / Fireworks key and we'll route every extraction and query call through your provider. You pay your provider directly. We never see your inference.
- Paste the snippet below into your agent and restart it. Use `Authorization: Bearer <api-key>` with the API key from your portal.
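For readers calling Engram over HTTP directly, the header format above can be sketched as follows. This is a minimal illustration, not Engram's client: the URL is a placeholder assumption, and the request is only built, never sent. `ENGRAM_API_KEY` is the environment variable this page asks you to export.

```python
import os
import urllib.request

# Placeholder endpoint: an assumption for illustration, not a documented Engram URL.
ENGRAM_URL = "https://engram.example.com/v1/memories"


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Read the key from the environment; fall back to the literal placeholder.
api_key = os.environ.get("ENGRAM_API_KEY", "<api-key>")
request = build_request(ENGRAM_URL, api_key)
```

Reading the key from the environment keeps it out of source control; the `EngramMemory` integration presumably does the same when you export `ENGRAM_API_KEY`.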
`llama-index-memory-engram`: durable `BaseMemory`
`EngramMemory` is a `BaseMemory` implementation that replaces LlamaIndex's chat-history buffer. Use `.get()` to fetch the recent window for prompt-stuffing and `.query()` for hybrid retrieval across the full bucket. Source: github.com/lumetra-io/engram-llamaindex.
- Install:

      pip install llama-index-memory-engram

- Export your API key:

      export ENGRAM_API_KEY="<api-key>"

- Pass `EngramMemory` into any agent's `.run()`:

      import asyncio

      from llama_index.core.agent.workflow import FunctionAgent
      from llama_index.llms.openai import OpenAI
      from llama_index.memory.engram import EngramMemory

      # Persist to the "user-42" bucket; read_limit caps the recent window.
      memory = EngramMemory.from_defaults(bucket="user-42", read_limit=50)
      agent = FunctionAgent(llm=OpenAI("gpt-4o"), tools=[...])  # supply your own tools

      async def main():
          # Memory is passed per call, not to the agent constructor.
          response = await agent.run(
              "What did we decide about the Q3 launch?",
              memory=memory,
          )
          print(response)

      asyncio.run(main())

What you can do once memory's wired in
- Pass memory at runtime to `FunctionAgent` or `ReActAgent` via `agent.run(query, memory=EngramMemory(...))`. Memory is a per-call runtime argument, not a constructor argument
- Set per-user buckets with `EngramMemory(bucket=f'user-{user_id}')`, one line per tenant scope
- Call `.query()` for hybrid retrieval across the entire bucket, and reserve `.get()` for the recent-window slice
- Combine with LlamaIndex's other components (vector indexes, query engines). Engram replaces the chat memory tier specifically
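The per-user bucket pattern above can be sketched as a small cache that creates one memory object per `user-{user_id}` bucket and reuses it across calls. `PerUserMemory` and the stand-in factory are hypothetical names for illustration; in a real agent the factory would be something like `lambda bucket: EngramMemory(bucket=bucket)`.

```python
from typing import Callable, Dict, Generic, TypeVar

M = TypeVar("M")


class PerUserMemory(Generic[M]):
    """Hypothetical helper: one memory object per tenant bucket, created lazily."""

    def __init__(self, factory: Callable[[str], M]) -> None:
        self._factory = factory
        self._by_bucket: Dict[str, M] = {}

    def for_user(self, user_id: object) -> M:
        # Same bucket naming scheme as the bullet above.
        bucket = f"user-{user_id}"
        if bucket not in self._by_bucket:
            self._by_bucket[bucket] = self._factory(bucket)
        return self._by_bucket[bucket]


# Stand-in factory so the sketch runs without Engram installed.
memories = PerUserMemory(lambda bucket: {"bucket": bucket})
```

Because memory is a per-call argument, a request handler could then call `agent.run(query, memory=memories.for_user(user_id))` and share a single agent across tenants.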
FAQ
Does `EngramMemory` work with both `ChatEngine` and `Agent`?
Yes. Both consume `BaseMemory`. The Engram implementation behaves like the built-in one for `.put()` and `.get()`, and adds `.query()` for semantic recall.
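To make the split between `.put()`, `.get()`, and `.query()` concrete, here is a toy in-process stand-in. It is not Engram's implementation: `ToyMemory` is a hypothetical class, messages are plain strings, and `query()` uses simple keyword overlap in place of hybrid retrieval.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMemory:
    """Illustrative stand-in: full history is kept, get() returns only a window."""

    read_limit: int = 3
    _messages: List[str] = field(default_factory=list)

    def put(self, message: str) -> None:
        # Every message is stored, regardless of the window size.
        self._messages.append(message)

    def get(self) -> List[str]:
        # Recent window only, in chronological order (oldest first).
        if self.read_limit <= 0:
            return []
        return self._messages[-self.read_limit:]

    def query(self, text: str, top_k: int = 2) -> List[str]:
        # Keyword overlap over the FULL history (a simplification of hybrid retrieval).
        terms = set(text.lower().split())
        scored = [
            (len(terms & set(m.lower().split())), i, m)
            for i, m in enumerate(self._messages)
        ]
        hits = [t for t in scored if t[0] > 0]
        # Highest overlap first; ties broken by age (older first).
        hits.sort(key=lambda t: (-t[0], t[1]))
        return [m for _, _, m in hits[:top_k]]


memory = ToyMemory(read_limit=2)
for msg in ["Q3 launch moves to October", "Lunch at noon",
            "Budget approved", "Standup moved"]:
    memory.put(msg)

recent = memory.get()               # only the last two messages
recalled = memory.query("Q3 launch")  # reaches back past the window
```

The point of the sketch is the asymmetry: the Q3 message has already fallen out of the `get()` window, but `query()` can still find it.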
How do I limit how much history `get()` returns?
`EngramMemory.from_defaults(read_limit=50)` caps the recent window at 50 messages. The default is usually what you want for prompt-stuffing; raise it for models with longer context windows.
Self-hosted Engram?
Pass `base_url='https://engram.internal.example.com'` to the constructor. Same six tools, your endpoint.
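One way to switch between hosted and self-hosted Engram without code changes is to assemble the constructor arguments from configuration. The helper below is a sketch under assumptions: `engram_kwargs` and the `ENGRAM_BASE_URL` environment variable are hypothetical names, and only the `bucket` and `base_url` parameters come from this page.

```python
import os
from typing import Dict, Optional


def engram_kwargs(bucket: str, base_url: Optional[str] = None) -> Dict[str, str]:
    """Build keyword arguments for the memory constructor (hypothetical helper)."""
    kwargs = {"bucket": bucket}
    # ENGRAM_BASE_URL is an assumed variable name, not a documented one.
    url = base_url or os.environ.get("ENGRAM_BASE_URL")
    if url:
        # Drop a trailing slash so paths can be appended consistently.
        kwargs["base_url"] = url.rstrip("/")
    return kwargs


self_hosted = engram_kwargs("user-42", "https://engram.internal.example.com/")
```

The resulting dictionary would then be unpacked into the constructor, for example `EngramMemory(**self_hosted)`.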
Ship durable memory in LlamaIndex today
Free tier: 10K memories and 50K retrievals per month. No credit card. The same Engram backend powers all 41 integrations, so memories you write from one client are immediately queryable from the rest.