Agent Loop

The agent loop is the core reasoning cycle that powers every elsai agent. Understanding it helps you reason about latency, token usage, and when hooks fire.

How it works

The loop continues until:

The model returns end_turn (has a final answer)
max_tokens is reached
A per-invocation limit is reached (limit_turns, limit_output_tokens, or limit_total_tokens)
The agent is cancelled
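
The stop conditions above can be pictured with a toy loop. This is a minimal sketch, not the SDK's implementation: `run_loop`, `fake_model`, the `tool_use` stop reason, the `toolUseId` and `input` fields, and the idea that one model call counts as one turn are all assumptions made for illustration.

```python
from typing import Callable


def run_loop(model: Callable[[list], dict], tools: dict, prompt: str,
             max_turns: int = 5) -> tuple[str, list]:
    """Toy agent loop: call the model, run requested tools, repeat until a stop condition."""
    messages = [{"role": "user", "content": [{"text": prompt}]}]
    for _ in range(max_turns):
        # Hypothetical model interface: returns {"stop_reason": ..., "message": ...}
        reply = model(messages)
        messages.append(reply["message"])
        if reply["stop_reason"] != "tool_use":
            # end_turn, max_tokens, or any other terminal reason ends the loop
            return reply["stop_reason"], messages
        results = []
        for block in reply["message"]["content"]:
            if "toolUse" in block:
                call = block["toolUse"]
                output = tools[call["name"]](**call["input"])
                results.append({"toolResult": {
                    "toolUseId": call["toolUseId"],
                    "content": [{"text": str(output)}],
                }})
        # Tool results go back to the model as a user message
        messages.append({"role": "user", "content": results})
    return "limit_turns", messages


def fake_model(messages: list) -> dict:
    """Stand-in model: asks for the calculator once, then answers."""
    last = messages[-1]["content"][0]
    if "toolResult" in last:
        answer = last["toolResult"]["content"][0]["text"]
        return {"stop_reason": "end_turn",
                "message": {"role": "assistant", "content": [{"text": f"2 + 2 = {answer}."}]}}
    return {"stop_reason": "tool_use",
            "message": {"role": "assistant", "content": [
                {"toolUse": {"toolUseId": "t1", "name": "calculator", "input": {"a": 2, "b": 2}}}]}}


stop_reason, history = run_loop(fake_model, {"calculator": lambda a, b: a + b}, "What is 2+2?")
print(stop_reason, len(history))
```

In this sketch a model that keeps requesting tools would exhaust `max_turns` and the loop would return `limit_turns` instead of `end_turn`.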

Event sequence

For each agent invocation, these hooks fire in order:

When streaming is enabled, token deltas are emitted between BeforeModelCallEvent and AfterModelCallEvent. Consume them with Agent.stream_async — see Streaming.

Conversation history

The agent accumulates messages in agent.messages. Each round trip looks like:

python

agent.messages = [
    {"role": "user",      "content": [{"text": "What is 2+2?"}]},
    {"role": "assistant", "content": [{"toolUse": {"name": "calculator", ...}}]},
    {"role": "user",      "content": [{"toolResult": {"toolUseId": "...", "content": [{"text": "4"}]}}]},
    {"role": "assistant", "content": [{"text": "2 + 2 = 4."}]},
]
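
Because tool requests and tool results live in separate messages, it can help to pair them up when debugging. The helper below is a sketch over the message shape shown above; `pair_tool_calls` is a hypothetical name, the `call-1` identifier is made up, and the presence of a `toolUseId` field inside `toolUse` is an assumption.

```python
def pair_tool_calls(messages: list[dict]) -> list[tuple[str, str]]:
    """Return (tool name, result text) pairs by matching toolUseId across messages."""
    names: dict[str, str] = {}
    pairs: list[tuple[str, str]] = []
    for message in messages:
        for block in message["content"]:
            if "toolUse" in block:
                use = block["toolUse"]
                names[use["toolUseId"]] = use["name"]
            elif "toolResult" in block:
                result = block["toolResult"]
                # Join all text parts of the result; non-text parts are skipped
                text = "".join(part.get("text", "") for part in result["content"])
                pairs.append((names.get(result["toolUseId"], "<unknown>"), text))
    return pairs


# Sample history with an illustrative, made-up toolUseId
sample_history = [
    {"role": "user", "content": [{"text": "What is 2+2?"}]},
    {"role": "assistant", "content": [{"toolUse": {"toolUseId": "call-1", "name": "calculator"}}]},
    {"role": "user", "content": [{"toolResult": {"toolUseId": "call-1", "content": [{"text": "4"}]}}]},
    {"role": "assistant", "content": [{"text": "2 + 2 = 4."}]},
]

pairs = pair_tool_calls(sample_history)
print(pairs)
```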

Conversation managers

Manage context window size with a conversation manager:

python

from elsai import Agent
from elsai.agent.conversation_manager import SlidingWindowConversationManager

# Keep the last 20 messages
manager = SlidingWindowConversationManager(window_size=20)
agent = Agent(conversation_manager=manager)

Built-in managers:

| Class | Behaviour |
| --- | --- |
| `SlidingWindowConversationManager` | Drops the oldest messages when the limit is reached (default) |
| `SummarizingConversationManager` | Summarises old messages using the model itself |
| `NullConversationManager` | Disables local history trimming (for non-stateful models; omit `conversation_manager` for stateful models) |
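
To make the sliding-window idea concrete, here is a minimal sketch of trimming a message list to a fixed size. It is not the SDK's code: `trim_to_window` is a hypothetical helper, and dropping a leading `toolResult` message whose matching `toolUse` was trimmed is an assumption about what a sensible window should do.

```python
def trim_to_window(messages: list[dict], window_size: int) -> list[dict]:
    """Keep at most the last `window_size` messages without starting on an orphaned tool result."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    trimmed = list(messages[-window_size:])
    # A toolResult at the front has lost its toolUse request, so drop it too (assumption)
    while trimmed and any("toolResult" in block for block in trimmed[0]["content"]):
        trimmed.pop(0)
    return trimmed


# Illustrative history; the toolUseId value is made up
window_history = [
    {"role": "user", "content": [{"text": "What is 2+2?"}]},
    {"role": "assistant", "content": [{"toolUse": {"toolUseId": "call-1", "name": "calculator"}}]},
    {"role": "user", "content": [{"toolResult": {"toolUseId": "call-1", "content": [{"text": "4"}]}}]},
    {"role": "assistant", "content": [{"text": "2 + 2 = 4."}]},
]

print(len(trim_to_window(window_history, 3)))
print(len(trim_to_window(window_history, 2)))
```

With a window of 2 the sketch keeps only the final assistant message, because the tool result that would otherwise lead the window no longer has its request.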

Context window overflow

When the model returns a context-too-long error, the SDK automatically asks the conversation manager to reduce context and retries:

python

from elsai import Agent
from elsai.agent.conversation_manager import SummarizingConversationManager

agent = Agent(conversation_manager=SummarizingConversationManager())
# Automatically summarises and retries on context overflow
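
The reduce-and-retry behaviour can be sketched independently of the SDK. Everything named here is hypothetical: `ContextOverflowError` stands in for the model's context-too-long error, `call_with_reduction` for the SDK's internal handling, and the `max_reductions` cap is an assumption added so the sketch always terminates.

```python
class ContextOverflowError(Exception):
    """Hypothetical stand-in for the model's context-too-long error."""


def call_with_reduction(call_model, reduce_context, messages, max_reductions=3):
    """Call the model; on overflow, shrink the context and try again."""
    for attempt in range(max_reductions + 1):
        try:
            return call_model(messages)
        except ContextOverflowError:
            if attempt == max_reductions:
                raise  # give up once the reduction budget is spent
            messages = reduce_context(messages)


def tiny_context_model(messages):
    """Stand-in model that only accepts two messages."""
    if len(messages) > 2:
        raise ContextOverflowError("too long")
    return f"ok with {len(messages)} messages"


def drop_oldest_half(messages):
    """Stand-in reducer: discard the older half of the history."""
    return messages[len(messages) // 2:]


reply = call_with_reduction(tiny_context_model, drop_oldest_half, list(range(8)))
print(reply)
```

Here the history shrinks from 8 to 4 to 2 messages before the call succeeds. A summarising manager would replace the dropped messages with a summary rather than discarding them.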

Retry strategy

Transient errors (throttling, network timeouts) are retried automatically with exponential backoff:

python

from elsai import Agent
from elsai.agent import AgentConfig, ModelRetryStrategy

agent = Agent(
    config=AgentConfig(
        retry_strategy=ModelRetryStrategy(
            max_attempts=6,
            initial_delay=4.0,
            max_delay=240.0,
        ),
    ),
)

# Disable retries
agent = Agent(config=AgentConfig(retry_strategy=None))
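
To get a feel for what these settings mean in wall-clock time, the sketch below computes a backoff schedule. It assumes the delay doubles after each failed attempt, that no jitter is applied, and that `max_attempts` counts the first try. None of these is confirmed by this page, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_attempts: int, initial_delay: float, max_delay: float,
                   factor: float = 2.0) -> list[float]:
    """Delays slept between attempts; there is one fewer delay than attempts."""
    return [min(initial_delay * factor ** i, max_delay) for i in range(max_attempts - 1)]


delays = backoff_delays(max_attempts=6, initial_delay=4.0, max_delay=240.0)
print(delays)       # waits between the six attempts
print(sum(delays))  # worst-case total time spent waiting, in seconds
```

Under these assumptions the cap of 240 seconds is never reached with six attempts; it would only apply from the seventh wait onwards.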

Tool execution

By default, tools run concurrently when the model requests several at once:

python

from elsai import Agent
from elsai.agent import AgentConfig
from elsai.tools.executors import ConcurrentToolExecutor, SequentialToolExecutor

# Concurrent (default)
agent = Agent(config=AgentConfig(tool_executor=ConcurrentToolExecutor()))

# Sequential (one at a time)
agent = Agent(config=AgentConfig(tool_executor=SequentialToolExecutor()))
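
The latency difference between the two strategies can be illustrated with plain `asyncio`. This sketch does not use the SDK's executors; `slow_tool` and the three tool names are stand-ins for I/O-bound tools.

```python
import asyncio
import time


async def slow_tool(name: str, seconds: float) -> str:
    """Stand-in for an I/O-bound tool call."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_concurrently(calls):
    # All tools start together; total time is roughly the slowest tool
    return await asyncio.gather(*(slow_tool(name, secs) for name, secs in calls))


async def run_sequentially(calls):
    # Each tool waits for the previous one; total time is the sum
    return [await slow_tool(name, secs) for name, secs in calls]


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


calls = [("search", 0.2), ("calculator", 0.2), ("weather", 0.2)]
concurrent_results, concurrent_time = timed(run_concurrently(calls))
sequential_results, sequential_time = timed(run_sequentially(calls))
print(f"concurrent: {concurrent_time:.2f}s, sequential: {sequential_time:.2f}s")
```

Both strategies return results in request order. Sequential execution is the safer choice when tools share state or one tool's side effects must land before the next runs.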

Inspecting the loop

Access raw metrics after a call:

python

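# `agent` is an Agent instance created as in the earlier examples
result = agent("Calculate something complex")
print(result.metrics.accumulated_usage["inputTokens"])    # input tokens used
print(result.metrics.accumulated_usage["outputTokens"])   # output tokens used
print(result.metrics.accumulated_metrics["latencyMs"])    # latency in milliseconds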

Invocation limits

Set per-call or default caps on turns and token usage. When a cap is reached, the loop stops with a stop_reason of limit_turns, limit_output_tokens, or limit_total_tokens.

python

from elsai import Agent
from elsai.agent import AgentConfig

# Default cap, set on the agent
agent = Agent(config=AgentConfig(limits={"turns": 50}))
# Per-call cap, passed at invocation time
result = agent("Long task", limits={"turns": 5})

See Invocation Limits for full semantics and validation rules.
