Integration
Long-term memory for Pipecat
Pipecat is the open-source framework for voice and realtime AI pipelines (STT → LLM → TTS with frame-level orchestration), and every call starts blank. `EngramMemoryProcessor` is a `FrameProcessor` that sits between the user input and the LLM context aggregator: it stores every finalized user turn, and it injects relevant prior memories as a silent `skip_tts=True` `TextFrame` right before each LLM call.
Install
Three steps: sign up for an Engram API key, paste a BYOK LLM-provider key on /models, then drop the snippet below into Pipecat.
Three steps to memory in your agent
- Sign up. Free, no card. You'll land on a Getting Started page that walks the next two steps.
- Add your LLM key. Engram is BYOK. Paste an OpenAI / Anthropic / Groq / Together / Fireworks key and we'll route every extraction and query call through your provider. You pay your provider directly. We never see your inference.
- Paste the snippet below into your agent and restart it. Use `Authorization: Bearer <api-key>` with the API key from your portal.
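As a minimal sketch of the header format from the last step, the helper below builds the `Authorization` header from an explicit key or from the `ENGRAM_API_KEY` environment variable used in the install steps. The function name `engram_auth_headers` is hypothetical and is not part of any Engram package; it only illustrates the `Bearer` format.

```python
import os


def engram_auth_headers(api_key=None):
    """Build the Authorization header in the `Bearer <api-key>` format.

    Hypothetical helper: falls back to the ENGRAM_API_KEY environment
    variable when no key is passed explicitly.
    """
    key = (api_key or os.environ.get("ENGRAM_API_KEY", "")).strip()
    if not key:
        raise RuntimeError("Set ENGRAM_API_KEY or pass api_key explicitly")
    return {"Authorization": f"Bearer {key}"}
```

Reading the key from the environment keeps it out of source control, which matters if the agent code lives in a shared repository.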
pipecat-engram: voice + realtime memory FrameProcessor
`EngramMemoryProcessor` sits between the user input and the LLM context aggregator in your Pipecat pipeline. It stores every finalized user turn and injects recalled memory as a silent `skip_tts=True` `TextFrame` right before each LLM call. Source: github.com/lumetra-io/engram-pipecat.
- Install (PyPI release pending; install from source for now):

    pip install git+https://github.com/lumetra-io/engram-pipecat

- Export your API key:

    export ENGRAM_API_KEY="<api-key>"

- Slot `EngramMemoryProcessor` into your pipeline (between user input and the LLM context aggregator):

    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineTask
    from pipecat_engram import EngramMemoryProcessor

    # transport, stt, llm, tts, and context_aggregator are assumed to be
    # created elsewhere in your bot, as in any Pipecat pipeline.
    memory = EngramMemoryProcessor(
        bucket="my_agent",
        recall_prefix="Relevant memory: ",
    )

    pipeline = Pipeline([
        transport.input(),               # mic / WebRTC / etc.
        stt,                             # e.g. DeepgramSTTService
        memory,                          # <-- EngramMemoryProcessor
        context_aggregator.user(),
        llm,                             # e.g. OpenAILLMService
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    # Call this from inside an async function.
    await PipelineRunner().run(PipelineTask(pipeline))

What you can do once memory's wired in
- Voice receptionist that remembers caller preferences across separate calls
- Realtime customer-support agent with memory of prior conversations and decisions
- Mix with Pipecat's existing transports (WebRTC, Daily, LiveKit); memory is transport-agnostic
- Per-caller buckets via `EngramMemoryProcessor(bucket=f'caller-{caller_id}')`
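For per-caller buckets, the same person should map to the same bucket name on every call. The sketch below normalizes a caller identifier before building the `caller-...` name. The helper `bucket_for_caller` is hypothetical, and the normalization rule (keep only letters and digits) is an assumption for illustration; this page does not state which characters Engram accepts in bucket names.

```python
import re


def bucket_for_caller(caller_id: str) -> str:
    """Map a caller identifier to a stable per-caller bucket name."""
    # Keep only letters and digits so that differently formatted versions
    # of the same phone number land in the same bucket.
    normalized = re.sub(r"[^0-9A-Za-z]", "", caller_id).lower()
    if not normalized:
        raise ValueError("caller_id has no usable characters")
    return f"caller-{normalized}"
```

It would then be passed as `EngramMemoryProcessor(bucket=bucket_for_caller(caller_id))`, so that "+1 (415) 555-0100" and "14155550100" share one memory bucket.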
FAQ
Where does the processor sit in the pipeline?
Between the user input (STT or text) and `context_aggregator.user()`. That way the LLM context aggregator sees the recall `TextFrame` just before each user turn, while `skip_tts=True` keeps that frame from being spoken by TTS.
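The ordering described in this answer can be pictured with a toy model. It does not use Pipecat and is not the processor's actual implementation: `ToyTextFrame` and `frames_for_turn` are hypothetical stand-ins that only show the recall frame arriving first, flagged as silent, followed by the user's turn.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTextFrame:
    """Stand-in for a text frame; not the real Pipecat class."""
    text: str
    skip_tts: bool = False


def frames_for_turn(user_text: str, recall_text: Optional[str]) -> List[ToyTextFrame]:
    """Return frames in the order the user aggregator would see them."""
    frames = []
    if recall_text:
        # Recalled memory goes first and is marked so it is never spoken.
        frames.append(ToyTextFrame(recall_text, skip_tts=True))
    # The user's own turn follows with the default flag.
    frames.append(ToyTextFrame(user_text))
    return frames
```

When nothing is recalled, the turn passes through alone, so the aggregator's input matches a pipeline without the memory processor.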
Does it work in text-only pipelines?
Yes. `EngramMemoryProcessor` also handles plain `TextFrame`s flowing downstream from the user side, so STT and TTS are optional.
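A text-only layout can be sketched as the voice pipeline with the STT and TTS stages left out. The helper below is hypothetical and takes already-constructed stages as arguments, so it makes no assumptions about how your text transport is built; it only fixes the order, with the memory processor ahead of the user aggregator.

```python
def text_only_stages(text_input, memory, user_aggregator, llm,
                     text_output, assistant_aggregator):
    """Return pipeline stages for a text-only agent, in processing order."""
    return [
        text_input,            # wherever user text enters the pipeline
        memory,                # EngramMemoryProcessor, before the user aggregator
        user_aggregator,       # context_aggregator.user()
        llm,
        text_output,           # wherever replies leave the pipeline
        assistant_aggregator,  # context_aggregator.assistant()
    ]
```

The resulting list would be handed to `Pipeline(...)` in the same way as the voice example.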
What does `recall_prefix` do?
It's the literal prefix prepended to the recalled memory string before injection (default `'Relevant memory: '`). Customize it to match how your prompt expects context to be framed.
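The effect of the prefix is plain string concatenation, sketched here. `format_recall` is a hypothetical helper, not the library's own function; the default value is the one quoted in this answer.

```python
DEFAULT_RECALL_PREFIX = "Relevant memory: "


def format_recall(memory_text: str, recall_prefix: str = DEFAULT_RECALL_PREFIX) -> str:
    """Prepend the literal prefix to the recalled memory string."""
    return f"{recall_prefix}{memory_text}"
```

If your system prompt tells the model to treat lines starting with, say, `[memory] ` as background context, passing `recall_prefix="[memory] "` keeps the injected text consistent with that instruction.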
Ship durable memory in Pipecat today
Free tier: 10K memories and 50K retrievals per month. No credit card. The same Engram backend powers all 41 integrations, so memories you write from one client are immediately queryable from the rest.