
Agent Memory & History Pipelines

Elsai Agents provides a modular memory system that manages agent state, conversation history length, and semantic context. Instead of relying on a single static conversation buffer, Elsai lets you build custom memory pipelines that reshape history dynamically.

With the standalone elsai-model, elsai-embeddings, and elsai-vectordb packages, your agents can use external embedding models and vector databases to retrieve relevant historical context and persist memory over time.


Memory Pipeline Architecture

Elsai divides memory management into two types of steps within a pipeline:

  1. Elsai-Native Conversation Managers: Simple conversation sizing filters designed to run in memory or persist to a local folder.
    • Persistence: Requires persistence="elsai_file".
    • Classes: ElsaiSlidingStep, ElsaiSummarizingStep.
  2. Elsai Shaping Pipeline Steps: Advanced strategies that trim, summarize, or age out messages based on token limits or similarity scoring.
    • Persistence: Requires persistence="elsai_json".
    • Classes: ElsaiTrimmingStep, ElsaiSummarizationStep, ElsaiLRUStep, ElsaiTTLStep, ElsaiSimilarityStep.
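
The pairing between step classes and persistence modes listed above can be checked before a config is built. The sketch below is plain Python and does not call Elsai. The lookup table and the helper name `required_persistence` are hypothetical, built only from the class names and persistence values in the list, and it assumes a single persistence value applies to the whole pipeline.

```python
# Hypothetical lookup built from the two groups listed above; not an Elsai API.
REQUIRED_PERSISTENCE = {
    "ElsaiSlidingStep": "elsai_file",
    "ElsaiSummarizingStep": "elsai_file",
    "ElsaiTrimmingStep": "elsai_json",
    "ElsaiSummarizationStep": "elsai_json",
    "ElsaiLRUStep": "elsai_json",
    "ElsaiTTLStep": "elsai_json",
    "ElsaiSimilarityStep": "elsai_json",
}


def required_persistence(step_names):
    """Return the persistence mode that a list of step class names needs.

    Raises ValueError for an unknown name, or when the list mixes steps
    from both groups (assumed here to be a configuration error).
    """
    modes = set()
    for name in step_names:
        if name not in REQUIRED_PERSISTENCE:
            raise ValueError(f"unknown step: {name}")
        modes.add(REQUIRED_PERSISTENCE[name])
    if len(modes) > 1:
        raise ValueError("pipeline mixes elsai_file and elsai_json steps")
    return modes.pop() if modes else None
```

For example, `required_persistence(["ElsaiTrimmingStep", "ElsaiTTLStep"])` returns `"elsai_json"`.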

IMPORTANT

To configure an agent with a custom memory pipeline, define a MemoryConfig and pass it to the build_agent_with_memory builder function.


Pipeline Steps Reference

Below are the supported pipeline strategy steps that you can register in MemoryConfig.pipeline:

ElsaiSlidingStep

Maintains a simple sliding window of the most recent messages.

  • Parameters:
    • window_size (int): Number of recent messages to preserve. Default: 40.
    • should_truncate_results (bool): Truncate oldest messages when exceeding size. Default: True.
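
The sliding-window behavior described above can be illustrated in plain Python. This is a minimal sketch, not the ElsaiSlidingStep implementation. The function name `sliding_window` is hypothetical, and it assumes that setting should_truncate_results to False leaves the history unchanged.

```python
def sliding_window(messages, window_size=40, should_truncate_results=True):
    """Keep only the most recent `window_size` messages.

    Assumption: with should_truncate_results=False the history is returned
    as-is; the real step may handle this case differently.
    """
    if not should_truncate_results or len(messages) <= window_size:
        return list(messages)
    if window_size <= 0:
        return []
    # Negative slicing keeps the tail of the list, i.e. the newest messages.
    return list(messages[-window_size:])
```

With a 50-message history and the default window, the 10 oldest messages are dropped and the 40 newest are kept in their original order.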

ElsaiSummarizingStep

Compacts old messages by summarizing them using a language model once the history grows.

  • Parameters:
    • preserve_recent_messages (int): Number of recent messages to leave untouched. Default: 10.
    • summary_ratio (float): Target ratio of summarization. Default: 0.3.
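
One way to read the two parameters above is sketched below in plain Python. It is not the ElsaiSummarizingStep implementation. It assumes summary_ratio is the fraction of the history that gets folded into a single summary, and the `summarize` callable is a hypothetical stand-in for the language model call.

```python
def compact_history(messages, summarize, preserve_recent_messages=10,
                    summary_ratio=0.3):
    """Replace the oldest messages with one summary entry.

    Assumption: summary_ratio is the share of messages to summarize.
    The last `preserve_recent_messages` entries are never summarized.
    """
    count = max(1, int(len(messages) * summary_ratio))
    # Never reach into the protected tail of recent messages.
    count = min(count, len(messages) - preserve_recent_messages)
    if count <= 0:
        return list(messages)
    summary = summarize(messages[:count])
    return [summary] + list(messages[count:])
```

Under these assumptions, a 20-message history with the defaults has its 6 oldest messages replaced by one summary, while a history of 10 or fewer messages is returned untouched.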

ElsaiTrimmingStep

Trims older messages once a limit on message count or token count is exceeded.

  • Parameters:
    • max_messages (int): Maximum messages to allow. Default: 30.
    • max_tokens (int | None): Optional token count threshold. Default: None.
    • preserve_system (bool): Always keep the initial system prompt. Default: True.
    • preserve_recent (int): Number of most recent messages to protect from trimming. Default: 3.
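
The interaction between the four parameters above can be sketched in plain Python. This is an illustration under stated assumptions, not the ElsaiTrimmingStep implementation: messages are dicts with `role` and `content` keys, `count_tokens` is a crude word-count stand-in for a real tokenizer, and the oldest unprotected message is dropped first.

```python
def count_tokens(message):
    # Crude stand-in for a tokenizer: whitespace-separated words.
    return len(message["content"].split())


def trim_history(messages, max_messages=30, max_tokens=None,
                 preserve_system=True, preserve_recent=3):
    """Drop the oldest messages until both limits are satisfied."""
    system, rest = [], list(messages)
    if preserve_system and rest and rest[0]["role"] == "system":
        system, rest = rest[:1], rest[1:]

    def over_limit():
        kept = system + rest
        if len(kept) > max_messages:
            return True
        if max_tokens is None:
            return False
        return sum(count_tokens(m) for m in kept) > max_tokens

    # Stop early if only the protected recent messages remain.
    while len(rest) > preserve_recent and over_limit():
        rest.pop(0)
    return system + rest
```

Note that in this sketch the protected messages win over the limits: if the system prompt and the last preserve_recent messages already exceed max_tokens, nothing further is removed.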

ElsaiSummarizationStep

Converts older messages in the active window into a high-level prose summary.

  • Parameters:
    • trigger_count (int): Trigger summarization when window exceeds this size. Default: 20.
    • preserve_system (bool): Keep system prompt. Default: True.
  • Requires configuring MemoryConfig.summarizer_llm.

ElsaiLRUStep

Performs Least Recently Used (LRU) eviction on conversation messages.

  • Parameters:
    • max_messages (int): Maximum message window limit. Default: 30.
    • preserve_system (bool): Keep system prompt. Default: True.
    • preserve_recent (int): Protect the last N messages from eviction. Default: 5.
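
LRU eviction over a message list can be sketched as follows. This is plain Python, not the ElsaiLRUStep implementation. It assumes each message dict carries a hypothetical `last_used` counter that is bumped whenever the message is read or retrieved; how Elsai actually tracks usage is not specified here.

```python
def lru_evict(messages, max_messages=30, preserve_system=True,
              preserve_recent=5):
    """Evict the least recently used messages until the limit is met."""
    excess = len(messages) - max_messages
    if excess <= 0:
        return list(messages)

    protected = set()
    if preserve_system and messages[0]["role"] == "system":
        protected.add(0)
    if preserve_recent > 0:
        start = max(0, len(messages) - preserve_recent)
        protected.update(range(start, len(messages)))

    candidates = [i for i in range(len(messages)) if i not in protected]
    # Least recently used first; ties are broken in favor of evicting
    # the older message.
    candidates.sort(key=lambda i: (messages[i]["last_used"], i))
    evicted = set(candidates[:excess])
    return [m for i, m in enumerate(messages) if i not in evicted]
```

Unlike plain trimming, an old message that was touched recently survives here, while a newer but untouched message may be evicted.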

ElsaiTTLStep

Ages out old messages from history based on elapsed time.

  • Parameters:
    • ttl_seconds (int): Time-to-live threshold in seconds. Default: 3600 (1 hour).
    • preserve_system (bool): Keep system prompt. Default: True.
    • preserve_recent (int): Protect recent messages. Default: 5.
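
Time-based expiry with the protections above can be sketched in plain Python. This is not the ElsaiTTLStep implementation. It assumes each message dict has a hypothetical `created_at` Unix timestamp, and the `now` argument exists only to make the example deterministic.

```python
import time


def expire_messages(messages, ttl_seconds=3600, preserve_system=True,
                    preserve_recent=5, now=None):
    """Drop messages older than ttl_seconds unless they are protected."""
    now = time.time() if now is None else now
    first_recent = max(0, len(messages) - preserve_recent)
    kept = []
    for i, message in enumerate(messages):
        is_system = preserve_system and i == 0 and message["role"] == "system"
        is_recent = i >= first_recent
        is_fresh = now - message["created_at"] <= ttl_seconds
        if is_system or is_recent or is_fresh:
            kept.append(message)
    return kept
```

With the default preserve_recent=5, the last five messages stay in history even after a long idle period, so the agent does not resume with an empty context.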

Semantic Context Injection Hooks

To provide long-term associative memory, you can attach similarity search and semantic memory hooks directly through MemoryConfig:

1. Similarity Retrieval Config (MemoryConfig.similarity)

Automatically performs vector similarity search on user input against the conversation database and injects matching context.

  • Key parameters:
    • similarity_config (dict): Connection configurations (includes vector DB client and embedding client).
    • top_k (int): Number of matched messages to retrieve. Default: 5.
    • injection_mode (str): "system_append" (appends to system prompt) or "user_preamble" (prepends to user message).
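
The retrieval and injection flow can be illustrated without a vector database. The sketch below is plain Python using cosine similarity over in-memory vectors. The function names, the `(text, vector)` storage shape, and the "Relevant history" wording are all hypothetical; in practice the embedding client and vector DB passed through similarity_config do this work.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, stored, top_k=5):
    """Return the top_k stored texts ranked by cosine similarity.

    `stored` is a list of (text, vector) pairs.
    """
    ranked = sorted(stored, key=lambda item: cosine(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]


def inject(system_prompt, user_message, context, injection_mode):
    """Place retrieved context according to injection_mode."""
    if not context:
        return system_prompt, user_message
    block = "Relevant history:\n" + "\n".join(f"- {c}" for c in context)
    if injection_mode == "system_append":
        return f"{system_prompt}\n\n{block}", user_message
    if injection_mode == "user_preamble":
        return system_prompt, f"{block}\n\n{user_message}"
    raise ValueError(f"unknown injection_mode: {injection_mode}")
```

The choice between the two modes is mostly about where the model should treat the retrieved text as coming from: standing instructions ("system_append") or part of the current turn ("user_preamble").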

2. Semantic Memory Config (MemoryConfig.semantic)

Maintains abstract facts (like preferences) about a user across multiple sessions and injects them.

  • Key parameters:
    • user_id_key (str): The metadata key matching the user ID. Default: "user_id".
    • injection_mode (str): Where to insert the retrieved facts. Default: "system_append".
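
The role of user_id_key can be sketched in plain Python. This is an illustration, not Elsai's semantic memory implementation: the in-memory `fact_store` dict, the function name, and the "Known facts" wording are hypothetical, and only the "system_append" mode is shown.

```python
def inject_user_facts(system_prompt, metadata, fact_store,
                      user_id_key="user_id"):
    """Append stored facts for the current user to the system prompt.

    `metadata` is the per-request metadata dict; `fact_store` maps a user
    ID to a list of fact strings kept across sessions.
    """
    user_id = metadata.get(user_id_key)
    facts = fact_store.get(user_id, [])
    if not facts:
        return system_prompt
    lines = "\n".join(f"- {fact}" for fact in facts)
    return f"{system_prompt}\n\nKnown facts about this user:\n{lines}"
```

Because facts are looked up by user ID rather than by session, a new conversation for the same user starts with the same preferences already in the prompt.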

Basic Usage Example

Below is a complete example that creates an Elsai agent with a custom memory pipeline. It uses a local Chroma vector database and Titan embeddings on Bedrock to run similarity searches, with an OpenAI model driving the agent.

python
from elsai_model.openai import OpenAIConnector
from elsai.integrations.elsai_embeddings import EmbeddingBackendConfig, build_embedding_client
from elsai.integrations.elsai_vectordb import VectorBackendConfig, build_vectordb_client
from elsai.integrations.elsai_memory import (
    MemoryConfig,
    ElsaiTrimmingStep,
    SimilarityRetrievalConfig,
    build_agent_with_memory,
)

# 1. Initialize standalone Embedding & Vector DB clients
embed_client = build_embedding_client(
    EmbeddingBackendConfig(
        provider="bedrock",
        aws_region="us-east-1",
        model_name="amazon.titan-embed-text-v1"
    )
)

vector_db = build_vectordb_client(
    VectorBackendConfig(
        provider="chroma",
        collection_name="agent_history",
        persist_directory="./chroma_db",
    )
)

# 2. Build the similarity config dictionary
similarity_setup = {
    "vector_database": {
        "name": "chroma",
        "client": vector_db,
        "collection_name": "agent_history",
    },
    "embedding_model": {
        "name": "bedrock",
        "client": embed_client,
    },
}

# 3. Define the memory config: persistence mode, pipeline steps, and similarity retrieval
memory_config = MemoryConfig(
    run_id="session_user_123",
    role="customer_support",
    persistence="elsai_json",
    pipeline=[ElsaiTrimmingStep(max_messages=15)],
    similarity=SimilarityRetrievalConfig(
        similarity_config=similarity_setup,
        top_k=3
    )
)

# 4. Spin up the agent using build_agent_with_memory
model = OpenAIConnector(model_name="gpt-4o")

agent = build_agent_with_memory(
    config=memory_config,
    model=model,
)

# 5. Run the agent
result = agent("What did we talk about during our last chat regarding database deployment?")
print(result)

Copyright © 2026 Elsai Foundry.