Integration

Long-term memory for LlamaIndex

LlamaIndex's built-in chat-history buffer is in-process and capped at a recent window. `EngramMemory` is a `BaseMemory` that replaces it. Every message persists to a bucket, recent messages come back chronologically, and `query()` runs hybrid retrieval over the full history when the window isn't enough.

Install

Three steps: sign up for an Engram API key, paste a BYOK LLM-provider key on /models, then drop the snippet below into LlamaIndex.

Three steps to memory in your agent

  1. Sign up. Free, no card. You'll land on a Getting Started page that walks the next two steps.
  2. Add your LLM key. Engram is BYOK. Paste an OpenAI / Anthropic / Groq / Together / Fireworks key and we'll route every extraction and query call through your provider. You pay your provider directly. We never see your inference.
  3. Paste the snippet below into your agent and restart it. Use Authorization: Bearer <api-key> with the API key from your portal.
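If you ever need to build the header from step 3 yourself, a minimal sketch could look like the following. The helper name `engram_auth_headers` is hypothetical and not part of any Engram package. Reading the key from `ENGRAM_API_KEY` mirrors the export step later on this page.

```python
import os
from typing import Dict, Optional


def engram_auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization header from an explicit key or the environment.

    Hypothetical helper for illustration; not part of the Engram client.
    """
    key = api_key or os.environ.get("ENGRAM_API_KEY")
    if not key:
        raise RuntimeError("Set ENGRAM_API_KEY or pass api_key explicitly")
    # The scheme is "Bearer", followed by a single space and the key.
    return {"Authorization": f"Bearer {key}"}
```

The returned dict can be passed as the headers argument of whichever HTTP client you use.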

llama-index-memory-engram: durable BaseMemory

EngramMemory is a BaseMemory implementation that replaces LlamaIndex's chat-history buffer. Recent-window .get() for prompt-stuffing; hybrid retrieval via .query() for full-bucket recall. Source: github.com/lumetra-io/engram-llamaindex.

  1. Install (terminal):
    pip install llama-index-memory-engram
  2. Export your API key (terminal):
    export ENGRAM_API_KEY="<api-key>"
  3. Pass EngramMemory into any agent's .run() (Python):
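    import asyncio

    from llama_index.core.agent.workflow import FunctionAgent
    from llama_index.llms.openai import OpenAI
    from llama_index.memory.engram import EngramMemory

    # Durable memory scoped to one bucket; read_limit caps the recent window.
    memory = EngramMemory.from_defaults(bucket="user-42", read_limit=50)

    agent = FunctionAgent(llm=OpenAI(model="gpt-4o"), tools=[...])

    async def main() -> None:
        # Memory is passed per call, not to the agent constructor.
        response = await agent.run(
            "What did we decide about the Q3 launch?",
            memory=memory,
        )
        print(response)

    asyncio.run(main())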

What you can do once memory's wired in

  • Pass it at runtime to `FunctionAgent` or `ReActAgent` via `agent.run(query, memory=EngramMemory(...))`. Memory is a per-call runtime argument, not a constructor argument.
  • Set per-user buckets with `EngramMemory(bucket=f'user-{user_id}')`: one line per tenant scope.
  • Call `.query()` for hybrid retrieval across the entire bucket, and reserve `.get()` for the recent-window slice.
  • Combine it with LlamaIndex's other components (vector indexes, query engines). Engram replaces the chat memory tier specifically.
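To make the difference between the two read paths concrete, here is a toy in-memory stand-in. It is a sketch for illustration only: `ToyMemory` is a hypothetical class, not the `EngramMemory` API, and a naive keyword match stands in for Engram's hybrid retrieval.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMemory:
    """Illustrative stand-in; not the EngramMemory implementation."""

    read_limit: int = 3
    messages: List[str] = field(default_factory=list)

    def put(self, message: str) -> None:
        # Every message is kept, regardless of the read window.
        self.messages.append(message)

    def get(self) -> List[str]:
        # Recent-window slice: the last `read_limit` messages, oldest first.
        return self.messages[-self.read_limit:]

    def query(self, text: str) -> List[str]:
        # Full-history search. Keyword overlap is a placeholder for
        # hybrid retrieval, which this toy does not attempt to model.
        terms = set(text.lower().split())
        return [m for m in self.messages if terms & set(m.lower().split())]


memory = ToyMemory(read_limit=2)
for msg in [
    "Q3 launch moved to September",
    "Lunch order placed",
    "Standup at 10",
    "Retro notes filed",
]:
    memory.put(msg)

recent = memory.get()                 # only the two newest messages
recalled = memory.query("Q3 launch")  # reaches back past the window
```

Here the launch decision has already fallen out of the recent window, so only `query()` can surface it.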

FAQ

Does `EngramMemory` work with both `ChatEngine` and `Agent`?

Yes. Both consume `BaseMemory`. The Engram implementation behaves like the built-in one for `.put()` and `.get()`, and adds `.query()` for semantic recall.

How do I limit how much history `get()` returns?

`EngramMemory.from_defaults(read_limit=50)` caps the recent window at 50 messages. The default is usually what you want for prompt-stuffing; raise it if your model has a longer context window.

Self-hosted Engram?

Pass `base_url='https://engram.internal.example.com'` to the constructor. Same six tools, your endpoint.
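One way to keep hosted and self-hosted deployments on the same code path is to assemble the constructor arguments in one place. This is a sketch under assumptions: the helper `engram_kwargs` and the `ENGRAM_BASE_URL` variable are hypothetical names chosen for illustration, while `bucket` and `base_url` are the arguments described on this page.

```python
import os
from typing import Dict


def engram_kwargs(user_id: str) -> Dict[str, str]:
    """Assemble per-user constructor arguments (hypothetical helper)."""
    kwargs = {"bucket": f"user-{user_id}"}
    # ENGRAM_BASE_URL is an assumed variable name, not an Engram convention.
    base_url = os.environ.get("ENGRAM_BASE_URL")
    if base_url:
        # Only override the endpoint when self-hosting.
        kwargs["base_url"] = base_url
    return kwargs


# Intended use, assuming the package is installed:
# memory = EngramMemory(**engram_kwargs("42"))
```

Leaving the variable unset keeps whatever endpoint the package uses by default.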

Ship durable memory in LlamaIndex today

Free tier: 10K memories and 50K retrievals per month. No credit card. Same Engram backend powers all 41 integrations, so memories you write from one client are immediately queryable from the rest.