
Long-term memory for Pipecat

Pipecat is the open-source framework for voice and realtime AI pipelines (STT → LLM → TTS with frame-level orchestration), and every call starts blank. `EngramMemoryProcessor` is a `FrameProcessor` that sits between the user input and the LLM context aggregator: it stores every finalized user turn, and it injects relevant prior memories as a silent `skip_tts=True` `TextFrame` right before each LLM call.

Install

Three steps: sign up for an Engram API key, paste a BYOK LLM-provider key on /models, then drop the snippet below into Pipecat.

Three steps to memory in your agent

  1. Sign up. Free, no card. You'll land on a Getting Started page that walks you through the next two steps.
  2. Add your LLM key. Engram is BYOK. Paste an OpenAI / Anthropic / Groq / Together / Fireworks key and we'll route every extraction and query call through your provider. You pay your provider directly. We never see your inference.
  3. Paste the snippet below into your agent and restart it. Use `Authorization: Bearer <api-key>` with the API key from your portal.
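
For direct HTTP calls, the header from step 3 can be built in a few lines. This is a minimal sketch: `engram_auth_headers` is a hypothetical helper name, not part of any Engram SDK, and the request URL is deliberately left out because it is not given here.

```python
def engram_auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header from step 3 (hypothetical helper)."""
    key = api_key.strip()
    if not key:
        raise ValueError("Engram API key is empty")
    return {"Authorization": f"Bearer {key}"}


# Example: headers to attach to a request made with the HTTP client of your choice.
headers = engram_auth_headers("<api-key>")
```

Stripping whitespace guards against keys pasted with a trailing newline, which would otherwise produce a malformed header.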

pipecat-engram: voice + realtime memory FrameProcessor

`EngramMemoryProcessor` sits between the user input and the LLM context aggregator in your Pipecat pipeline. It stores every finalized user turn and injects recalled memory as a silent `skip_tts=True` `TextFrame` right before each LLM call. Source: github.com/lumetra-io/engram-pipecat.

  1. Install (PyPI release pending; install from source for now):
     Terminal
    pip install git+https://github.com/lumetra-io/engram-pipecat
  2. Export your API key:
     Terminal
    export ENGRAM_API_KEY="<api-key>"
  3. Slot `EngramMemoryProcessor` into your pipeline (between user input and the LLM context aggregator):
     Python
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.task import PipelineTask
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat_engram import EngramMemoryProcessor
    
    memory = EngramMemoryProcessor(
        bucket="my_agent",
        recall_prefix="Relevant memory: ",
    )
    
    # transport, stt, llm, tts, and context_aggregator are created elsewhere in your app.
    pipeline = Pipeline([
        transport.input(),               # mic / WebRTC / etc.
        stt,                             # e.g. DeepgramSTTService
        memory,                          # <-- EngramMemoryProcessor
        context_aggregator.user(),
        llm,                             # e.g. OpenAILLMService
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])
    
    # Run this from inside an async function (for example, one started with asyncio.run).
    await PipelineRunner().run(PipelineTask(pipeline))
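
It helps to fail early when the exported key is missing, and the `await` in the snippet needs a coroutine to run in. The sketch below shows one way to arrange both. It assumes the processor reads `ENGRAM_API_KEY` from the environment (the snippet passes no key explicitly); `require_engram_key` and the body of `main` are hypothetical stand-ins.

```python
import asyncio
import os
from typing import Mapping


def require_engram_key(env: Mapping[str, str] = os.environ) -> str:
    """Return ENGRAM_API_KEY or raise a clear error (hypothetical helper)."""
    key = env.get("ENGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ENGRAM_API_KEY is not set; export it before starting the agent")
    return key


async def main() -> None:
    require_engram_key()
    # Build transport, stt, llm, tts, context_aggregator, and the Pipeline here,
    # then run it: await PipelineRunner().run(PipelineTask(pipeline))


# Entry point for a script:
# asyncio.run(main())
```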

What you can do once memory's wired in

  • Voice receptionist that remembers caller preferences across separate calls
  • Realtime customer-support agent with memory of prior conversations and decisions
  • Mix with Pipecat's existing transports (WebRTC, Daily, LiveKit); memory is transport-agnostic
  • Per-caller buckets via `EngramMemoryProcessor(bucket=f'caller-{caller_id}')`
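
Per-caller buckets need a stable name for each caller. This is a minimal sketch, assuming callers are identified by phone number and that a digits-only suffix is an acceptable bucket name; `caller_bucket` is a hypothetical helper, and Engram's actual bucket naming rules are not stated here.

```python
import re


def caller_bucket(caller_id: str) -> str:
    """Map a caller ID such as a phone number to a stable bucket name."""
    digits = re.sub(r"\D", "", caller_id)  # drop spaces, dashes, parentheses, "+"
    if not digits:
        raise ValueError(f"caller_id has no digits: {caller_id!r}")
    return f"caller-{digits}"


# Different formattings of the same number map to one bucket.
same_caller = caller_bucket("+1 (415) 555-0100") == caller_bucket("1-415-555-0100")
```

The result can then be passed as `EngramMemoryProcessor(bucket=caller_bucket(caller_id))`, so one caller's memories are not split across buckets by formatting differences.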

FAQ

Where does the processor sit in the pipeline?

Between the user input (STT or text) and `context_aggregator.user()`. That way the LLM context aggregator sees the recall `TextFrame` just before each user turn, and the recall frame is never spoken by TTS (because `skip_tts=True`).
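
The ordering can be checked with a toy model. This is a sketch in plain Python, not Pipecat code: `Frame`, `memory_stage`, and `tts_stage` are hypothetical stand-ins that only mimic the behavior described above.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    text: str
    skip_tts: bool = False


def memory_stage(user_turn: Frame, recall_text: str) -> list[Frame]:
    """Emit the recall frame first, then pass the user turn through unchanged."""
    frames = []
    if recall_text:
        frames.append(Frame(recall_text, skip_tts=True))
    frames.append(user_turn)
    return frames


def tts_stage(frames: list[Frame]) -> list[str]:
    """Return only the text that would be spoken; flagged frames are ignored."""
    return [f.text for f in frames if not f.skip_tts]


upstream = memory_stage(Frame("Book my usual table"), "Relevant memory: prefers a window seat")
context_view = [f.text for f in upstream]        # order seen by a context aggregator

reply = Frame("Your usual window table is booked.")
spoken = tts_stage([upstream[0], reply])         # the recall frame reaches TTS but is skipped
```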

Does it work in text-only pipelines?

Yes. `EngramMemoryProcessor` also handles plain `TextFrame`s flowing downstream from the user side, so STT and TTS are optional.

What does `recall_prefix` do?

It's the literal prefix prepended to the recalled memory string before injection (default `'Relevant memory: '`). Customize it to match how your prompt expects context to be framed.
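
As an illustration of the effect, the sketch below mimics the prepending step in plain Python. `format_recall` is a hypothetical helper, and joining several memories with `; ` is an assumption; the processor's real formatting may differ.

```python
DEFAULT_PREFIX = "Relevant memory: "


def format_recall(memories: list[str], prefix: str = DEFAULT_PREFIX) -> str:
    """Prepend the literal prefix to the recalled memories (joined with '; ')."""
    return prefix + "; ".join(memories)


default_text = format_recall(["prefers a window seat"])
custom_text = format_recall(["allergic to peanuts"], prefix="[Known about this caller] ")
```

A bracketed prefix like the second one can make it easier for a system prompt to refer to the injected text by name.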

Ship durable memory in Pipecat today

Free tier: 10K memories and 50K retrievals per month. No credit card. The same Engram backend powers all 41 integrations, so memories you write from one client are immediately queryable from the rest.