Integration

Long-term memory for LiveKit Agents

LiveKit Agents are stateless by design. Every voice call starts blank, so the caller has to re-explain themselves and the agent loses continuity. `livekit-plugins-engram` adds a one-line memory backend so the agent remembers what the caller told it last Tuesday, recalls it mid-sentence, and explains *why* it surfaced any given fact.

Install

Three steps: sign up for an Engram API key, add your own LLM-provider key (BYOK) on /models, then drop the snippet below into your LiveKit agent.

Three steps to memory in your agent

  1. Sign up. Free, no card. You'll land on a Getting Started page that walks the next two steps.
  2. Add your LLM key. Engram is BYOK. Paste an OpenAI / Anthropic / Groq / Together / Fireworks key and we'll route every extraction and query call through your provider. You pay your provider directly. We never see your inference.
  3. Paste the snippet below into your agent and restart it. Use `Authorization: Bearer <api-key>` with the API key from your portal.
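
If you ever call Engram over HTTP yourself rather than through a plugin, the header from step 3 could be built as in this minimal sketch. It assumes the key sits in an environment variable named `ENGRAM_API_KEY` (the name used in the plugin setup on this page). The helper name `auth_headers` is hypothetical, and no endpoint URL is shown because this page doesn't give one.

```python
import os


def auth_headers(env_var: str = "ENGRAM_API_KEY") -> dict:
    """Build the Authorization header from an API key held in the environment.

    `auth_headers` is a hypothetical helper, not part of any Engram package.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"{env_var} is not set; export it before starting the agent")
    return {"Authorization": f"Bearer {api_key}"}
```

Reading the key from the environment keeps it out of source control, which matters for an agent that ships in a repo.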

livekit-plugins-engram: voice + realtime AI

One-line memory backend for LiveKit Agents. Use it from lifecycle hooks (record what the user just said, recall relevant context before the LLM responds) or hand the model function tools. Source: github.com/lumetra-io/engram-livekit.

  1. Install (Terminal):
    pip install livekit-plugins-engram
  2. Export your API key (Terminal):
    export ENGRAM_API_KEY="<api-key>"
  3. Pattern 1: record + recall in on_user_turn_completed (Python):
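    from livekit.agents import Agent
    from livekit.plugins.engram import Engram
    
    # One shared bucket here; use a per-caller bucket name to isolate callers.
    memory = Engram(bucket="caller-default")
    
    class Receptionist(Agent):
        async def on_user_turn_completed(self, chat_ctx, new_message):
            # Skip turns that carry no text.
            text = new_message.text_content
            if not text:
                return
            # Record what the caller just said, then recall related context.
            await memory.astore_memory(text)
            recall = await memory.aquery_memory(text)
            # Add any recalled answer to the context before the LLM responds.
            if recall.get("answer"):
                chat_ctx.add_message(role="system", content=f"Relevant memory: {recall['answer']}")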
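
To exercise the store-then-recall flow offline before pointing it at a real bucket, a stand-in along these lines can help. This is a sketch under stated assumptions: `InMemoryStandIn` and `memory_note` are hypothetical names, the word-overlap matching is a toy substitute for Engram's retrieval, and the only things taken from the snippet above are the two method names and the `{"answer": ...}` result shape.

```python
import asyncio
from typing import Optional


class InMemoryStandIn:
    """Hypothetical test double; NOT part of livekit-plugins-engram."""

    def __init__(self) -> None:
        self.stored = []

    async def astore_memory(self, text: str) -> None:
        self.stored.append(text)

    async def aquery_memory(self, text: str) -> dict:
        # Toy recall: newest earlier memory that shares a word with the query.
        words = set(text.lower().split())
        for item in reversed(self.stored):
            if item != text and words & set(item.lower().split()):
                return {"answer": item}
        return {}


async def memory_note(memory, text: str) -> Optional[str]:
    """Mirror the hook body: store the turn, query, and format the system note."""
    if not text:
        return None
    await memory.astore_memory(text)
    recall = await memory.aquery_memory(text)
    answer = recall.get("answer")
    return f"Relevant memory: {answer}" if answer else None


async def demo() -> list:
    memory = InMemoryStandIn()
    first = await memory_note(memory, "I prefer morning appointments")
    second = await memory_note(memory, "Any appointments on Friday?")
    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The first turn yields no note because nothing earlier is stored; the second picks up the earlier preference through the shared word "appointments".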

What you can do once memory's wired in

  • Voice receptionist that remembers caller preferences across separate calls
  • Realtime customer-support agent with memory of prior tickets and decisions
  • Voice assistant for a single user across sessions: a personal assistant that remembers you
  • Multi-tenant voice deployment where each customer gets an isolated bucket, e.g. `bucket=f'caller-{caller_id}'`
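
For the multi-tenant case, the bucket name can be derived from whatever caller identifier your telephony layer provides. The sketch below is illustrative only: `caller_bucket` is a hypothetical helper, and the set of characters it keeps is an assumption, since this page doesn't state which characters a bucket name may contain.

```python
import re


def caller_bucket(caller_id: str, prefix: str = "caller") -> str:
    """Derive a per-caller bucket name (hypothetical helper, not a plugin API)."""
    # Assumption: letters, digits, '_' and '-' are safe in bucket names.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "", caller_id)
    if not safe_id:
        raise ValueError("caller_id has no usable characters")
    return f"{prefix}-{safe_id}"


# Intended use with the constructor shown on this page:
#   memory = Engram(bucket=caller_bucket(caller_id))
```

Because stripping characters can make two different identifiers collide, normalise caller IDs to one canonical form (for example, E.164 phone numbers) before deriving the bucket.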

FAQ

Pattern 1 (lifecycle hooks) or Pattern 2 (model tools): which should I use?

Pattern 1 is automatic: every user turn gets recorded and recalled without the model having to call a tool. Pattern 2 gives the model control over store and recall. Combine them if you want both a passive transcript and active curation.
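
Pattern 2 can be pictured as two small callables that the model invokes on demand. The sketch below is an illustration under assumptions: `MemoryBackend` and `MemoryTools` are hypothetical names rather than plugin APIs, and only `astore_memory`, `aquery_memory`, and the `"answer"` key come from the Pattern 1 snippet on this page. If the plugin ships ready-made function tools, prefer those.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """The two async calls used on this page; an Engram instance is assumed to fit."""

    async def astore_memory(self, text: str): ...

    async def aquery_memory(self, text: str) -> dict: ...


class MemoryTools:
    """Hypothetical tool wrappers; the model decides when to call them."""

    def __init__(self, memory: MemoryBackend) -> None:
        self._memory = memory

    async def remember(self, fact: str) -> str:
        """Store a fact the model judged worth keeping."""
        await self._memory.astore_memory(fact)
        return "stored"

    async def recall(self, question: str) -> str:
        """Return what memory holds for the question, or a fallback string."""
        result = await self._memory.aquery_memory(question)
        return result.get("answer") or "No relevant memory found."
```

Pass your memory object as `memory` and register `remember` and `recall` through your agent framework's tool mechanism. Unlike Pattern 1, nothing is stored unless the model chooses to call `remember`.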

Does this work with realtime API models (GPT-4o Realtime, Gemini Live)?

Yes, with a caveat: per LiveKit's docs, to use `on_user_turn_completed` with a realtime model you must configure turn detection to run **in your agent** instead of inside the realtime model. The plugin's lifecycle hooks attach to `Agent` either way, so just keep that turn-detection config in mind when setting up a realtime pipeline.

What about privacy when voice transcripts go into memory?

Treat the bucket like any voice-recording store. Data stored in Engram isn't used to train models, but you should still align memory retention with your voice-recording policy.

Ship durable memory in LiveKit Agents today

Free tier: 10K memories and 50K retrievals per month. No credit card. The same Engram backend powers all 41 integrations, so memories you write from one client are immediately queryable from the rest.
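
As a rough capacity check for Pattern 1 on the free tier, the sketch below assumes each stored turn counts as one memory, each query counts as one retrieval, and both limits reset monthly. This page doesn't spell out how usage is metered, so treat the result as an estimate; `turns_per_month` is a hypothetical helper.

```python
def turns_per_month(memory_quota: int = 10_000, retrieval_quota: int = 50_000,
                    stores_per_turn: int = 1, queries_per_turn: int = 1) -> int:
    """User turns covered before the tighter of the two quotas runs out."""
    if stores_per_turn < 1 or queries_per_turn < 1:
        raise ValueError("per-turn counts must be at least 1")
    # Assumption: one store = one memory, one query = one retrieval.
    return min(memory_quota // stores_per_turn, retrieval_quota // queries_per_turn)


def calls_per_month(turns_per_call: int, **quotas: int) -> int:
    """Whole calls covered, given an average number of caller turns per call."""
    return turns_per_month(**quotas) // turns_per_call
```

Under these assumptions the memory quota is the binding limit: about 10,000 recorded turns, or roughly 500 calls at 20 caller turns each.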