
Rate Limiting & Abuse Prevention

Protect systems from excessive requests, infinite loops, and denial-of-wallet attacks by restricting requests, tool calls, and execution time per session.

Overview

Rate limiting guardrails enforce quotas on agent activity within a session. Policies are defined in YAML and applied through hooks that run before LLM requests and before tool execution. These quotas guard against runaway agent loops, violations of upstream API rate limits, and uncontrolled token spend.

Use this guardrail alongside Tool Authorization for comprehensive agent safety in production deployments.

How It Works

  1. Create a session with create_session() to track per-session counters.
  2. Call before_request() before each LLM invocation to enforce request limits.
  3. Call check_tool_call_limit() in a pre-tool hook to check the tool call quota without incrementing the counter.
  4. Call record_tool_call() inside each tool when it actually executes.
  5. Wrap tool execution with start_execution_timer() and end_execution_timer() to enforce time limits.
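The counting rules behind these steps can be modeled in a few lines. The sketch below is a hypothetical in-memory stand-in. ToyRateLimiter, ToySession, and CheckResult are illustrative names and are not part of elsai_guardrails. Details such as whether a blocked request is still counted are assumptions. The execution timer (step 5) is left out to keep the sketch short.

```python
import uuid
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical result shape: passed, current_count, limit, error."""
    passed: bool
    current_count: int
    limit: int
    error: str = ""


@dataclass
class ToySession:
    session_id: str
    request_count: int = 0
    tool_call_count: int = 0


class ToyRateLimiter:
    """Hypothetical in-memory model of per-session quotas (illustration only)."""

    def __init__(self, max_requests: int, max_tool_calls: int):
        self.max_requests = max_requests
        self.max_tool_calls = max_tool_calls
        self.sessions: dict[str, ToySession] = {}

    def create_session(self) -> ToySession:
        # Step 1: each session owns its own counters.
        session = ToySession(session_id=str(uuid.uuid4()))
        self.sessions[session.session_id] = session
        return session

    def before_request(self, session_id: str) -> CheckResult:
        # Step 2: check, and count the request only when it is allowed.
        s = self.sessions[session_id]
        if s.request_count >= self.max_requests:
            return CheckResult(False, s.request_count, self.max_requests, "request limit reached")
        s.request_count += 1
        return CheckResult(True, s.request_count, self.max_requests)

    def check_tool_call_limit(self, session_id: str) -> CheckResult:
        # Step 3: peek only; the counter is not incremented here.
        s = self.sessions[session_id]
        passed = s.tool_call_count < self.max_tool_calls
        error = "" if passed else "tool call limit reached"
        return CheckResult(passed, s.tool_call_count, self.max_tool_calls, error)

    def record_tool_call(self, session_id: str) -> None:
        # Step 4: the tool increments the counter when it actually runs.
        self.sessions[session_id].tool_call_count += 1


limiter = ToyRateLimiter(max_requests=5, max_tool_calls=50)
sid = limiter.create_session().session_id
limiter.check_tool_call_limit(sid)  # peeking twice...
limiter.check_tool_call_limit(sid)
print(limiter.sessions[sid].tool_call_count)  # 0: peeks do not count
limiter.record_tool_call(sid)
print(limiter.sessions[sid].tool_call_count)  # 1
```

Keeping the peek and the increment separate means a call blocked in the pre-tool hook never consumes quota.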

Configuration

Enable Rate Limiting

```yaml
guardrails:
  rate_limit:
    enabled: true
    max_requests_per_session: 5
    max_tool_calls_per_session: 50
    max_tool_execution_seconds: 60
```

Parameters

| Option | Type | Description |
|---|---|---|
| enabled | bool | Enable rate limiting |
| max_requests_per_session | int | Maximum LLM requests allowed per session |
| max_tool_calls_per_session | int | Maximum tool invocations allowed per session |
| max_tool_execution_seconds | int | Maximum cumulative tool execution time per session, in seconds |
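Validating the rate_limit section as a typed object catches misspelled or mistyped options before the agent runs. This is a hypothetical sketch that works on the already-parsed mapping, such as the output of a YAML loader. The RateLimitConfig defined here is not the object returned by to_rate_limit_config(). The rule that every limit must be a positive integer is also an assumption of the sketch.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RateLimitConfig:
    """Hypothetical typed view of the guardrails.rate_limit section."""
    enabled: bool
    max_requests_per_session: int
    max_tool_calls_per_session: int
    max_tool_execution_seconds: int

    @classmethod
    def from_dict(cls, section: dict) -> "RateLimitConfig":
        names = {f.name for f in fields(cls)}
        missing, unknown = names - set(section), set(section) - names
        if missing or unknown:
            # Catches typos such as "max_request_per_session" before runtime.
            raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
        if not isinstance(section["enabled"], bool):
            raise TypeError("enabled must be a bool")
        for name in names - {"enabled"}:
            value = section[name]
            # bool is a subclass of int, so reject True/False explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive int, got {value!r}")
        return cls(**section)


# Equivalent of the YAML example above after parsing.
config = RateLimitConfig.from_dict({
    "enabled": True,
    "max_requests_per_session": 5,
    "max_tool_calls_per_session": 50,
    "max_tool_execution_seconds": 60,
})
print(config.max_requests_per_session)  # 5
```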

Combined with Tool Authorization

Rate limiting and tool authorization can share the same policy file:

```yaml
guardrails:
  rate_limit:
    enabled: true
    max_requests_per_session: 5
    max_tool_calls_per_session: 50
    max_tool_execution_seconds: 60

  tool_authorization:
    enabled: true
    denied_tools:
      - execute_shell
    sensitive_tools:
      - delete_record
    roles:
      analyst:
        allowed_tools:
          - search_web
          - calculator
```
Usage with Agent Hooks

Rate limiting is enforced through GuardrailSystem session hooks. Like tool authorization, it requires integration into your agent graph.

Initialize Guardrails

```python
from elsai_guardrails.guardrails import GuardrailPolicy, GuardrailSystem

guardrails = GuardrailSystem(
    guardrail_policy=GuardrailPolicy.from_file("config.yaml"),
)
rate_limit_config = guardrails.guardrail_policy.to_rate_limit_config()
```

Session Management

```python
session = guardrails.create_session()
session_id = session.session_id

# After processing, inspect session metrics
session = guardrails.get_session(session_id)
print(f"requests={session.request_count}  tool_calls={session.tool_call_count}")
```
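The one-session-per-conversation practice means reusing the same session_id on every turn. Creating a fresh session for each message would reset the counters. The hypothetical SessionRegistry below keeps that mapping. It takes a factory function so that it runs without the library. With elsai_guardrails, the factory would presumably wrap guardrails.create_session().session_id.

```python
import itertools
from typing import Callable


class SessionRegistry:
    """Hypothetical helper: keep exactly one rate-limit session per conversation."""

    def __init__(self, create_session_id: Callable[[], str]):
        # With elsai_guardrails this could be: lambda: guardrails.create_session().session_id
        self._create = create_session_id
        self._by_conversation: dict[str, str] = {}

    def session_for(self, conversation_id: str) -> str:
        # Reuse the session so counters accumulate across turns of one conversation.
        if conversation_id not in self._by_conversation:
            self._by_conversation[conversation_id] = self._create()
        return self._by_conversation[conversation_id]

    def end(self, conversation_id: str) -> None:
        # Forget the mapping when the conversation is over.
        self._by_conversation.pop(conversation_id, None)


# Usage with a stub factory instead of the real create_session().
ids = itertools.count(1)
registry = SessionRegistry(lambda: f"session-{next(ids)}")
print(registry.session_for("conv-a"))  # session-1
print(registry.session_for("conv-a"))  # session-1 (reused)
print(registry.session_for("conv-b"))  # session-2
```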

Before LLM Request

```python
# llm and messages come from your agent (for example, a LangChain chat model)
result = guardrails.before_request(session_id, raise_on_block=False)

if not result.passed:
    print(f"Request blocked: {result.error}")
    print(f"Count: {result.current_count}/{result.limit}")
else:
    # Within quota: proceed with the LLM call
    response = llm.invoke(messages)
```
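Inside an agent node, returning a blocked message is usually more useful than printing one, because the conversation can continue without spending tokens. The sketch below wraps the check-then-call pattern in a hypothetical guarded_invoke helper. RequestCheck mimics the fields used above (passed, error, current_count, limit). fake_check is a stub standing in for guardrails.before_request(session_id, raise_on_block=False). The exact blocked-message text is an assumption, built on the RATE LIMIT BLOCKED: prefix used later on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RequestCheck:
    """Stand-in for the result of before_request(..., raise_on_block=False)."""
    passed: bool
    current_count: int
    limit: int
    error: str = ""


def guarded_invoke(check: Callable[[], RequestCheck], call_llm: Callable[[], str]) -> str:
    """Run the quota check first; call the LLM only if the request is allowed."""
    result = check()
    if not result.passed:
        # No tokens are spent on a blocked request.
        return f"RATE LIMIT BLOCKED: {result.error} ({result.current_count}/{result.limit})"
    return call_llm()


# Usage with stubs: a fake counter that allows 5 requests, then blocks.
state = {"count": 0}


def fake_check(limit: int = 5) -> RequestCheck:
    if state["count"] >= limit:
        return RequestCheck(False, state["count"], limit, "max_requests_per_session reached")
    state["count"] += 1
    return RequestCheck(True, state["count"], limit)


replies = [guarded_invoke(fake_check, lambda: "llm reply") for _ in range(6)]
print(replies[-1])
# RATE LIMIT BLOCKED: max_requests_per_session reached (5/5)
```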

Before Tool Execution (Peek)

Check the limit without incrementing the counter. The tool itself records the call when it actually runs:

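```python
# Peek at the quota; this call does not increment the tool call counter
result = guardrails.check_tool_call_limit(session_id, raise_on_block=False)

if not result.passed:
    print(f"Tool blocked: {result.error}")
else:
    # The tool has not run yet, so it would become call number current_count + 1
    would_be = result.current_count + 1
    print(f"Tool allowed ({would_be}/{rate_limit_config.max_tool_calls_per_session})")
```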

Inside Tool Implementation

Record the call and track execution time when the tool actually runs:

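```python
from langchain_core.tools import tool


@tool
def search_web(query: str, session_id: str) -> str:
    """Search the web for the given query."""
    # Count this execution against max_tool_calls_per_session
    guardrails.record_tool_call(session_id)
    t = guardrails.start_execution_timer()
    try:
        result = f"Results for: {query}"  # replace with the real search logic
    finally:
        # Stop the timer even if the tool logic raises
        guardrails.end_execution_timer(t)
    return result
```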
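When several tools need the same bookkeeping, a decorator keeps the record-then-time pairing in one place. This is a library-independent sketch: SessionBudget and guarded_tool are hypothetical names, and timing uses time.perf_counter() directly. The comments mark where record_tool_call(), start_execution_timer(), and end_execution_timer() would go with elsai_guardrails. The decorator only records usage. Enforcement stays with the checks described above.

```python
import functools
import time
from dataclasses import dataclass


@dataclass
class SessionBudget:
    """Hypothetical per-session counters mirroring the two tool-side limits."""
    max_tool_calls: int
    max_seconds: float
    tool_calls: int = 0
    seconds_used: float = 0.0


def guarded_tool(budget: SessionBudget):
    """Record each call, then time the body; the timer stops even if it raises."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            budget.tool_calls += 1  # like record_tool_call(): count real executions only
            started = time.perf_counter()  # like start_execution_timer()
            try:
                return fn(*args, **kwargs)
            finally:
                # like end_execution_timer(t): always account for elapsed time
                budget.seconds_used += time.perf_counter() - started
        return wrapper
    return decorator


budget = SessionBudget(max_tool_calls=50, max_seconds=60)


@guarded_tool(budget)
def search_web(query: str) -> str:
    return f"Results for: {query}"


@guarded_tool(budget)
def flaky_tool() -> str:
    raise RuntimeError("upstream error")


print(search_web("rate limits"))  # Results for: rate limits
try:
    flaky_tool()
except RuntimeError:
    pass
print(budget.tool_calls)  # 2
```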

LangGraph Integration Pattern

Recommended graph flow:

agent → rate_limit → tools → agent
  • agent node — Call before_request() before invoking the LLM.
  • rate_limit node — Call check_tool_call_limit() for each pending tool call before ToolNode runs.
  • tools node — Each tool calls record_tool_call() and wraps execution in a timer.

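When a tool call limit is exceeded, do not run the tool. Instead, inject a ToolMessage for the blocked call whose content starts with RATE LIMIT BLOCKED: and route back to the agent. When the request limit is exceeded, the agent node should return an AIMessage with the same prefix instead of calling the LLM.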
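One detail of the rate_limit node needs care. Assume check_tool_call_limit() is side-effect free, as described above. Then every pending tool call in the same AI message sees the same current_count. With 49 of 50 calls used, three parallel tool calls would each pass a separate peek. The hypothetical split_pending_calls helper below budgets the batch locally. It uses plain dicts shaped like LangChain tool calls (id, name, args) instead of real message objects. In a graph, current_count would come from a single peek, and each blocked entry would become a ToolMessage with the matching tool_call_id.

```python
def split_pending_calls(tool_calls: list[dict], current_count: int, limit: int):
    """Hypothetical helper: decide which pending tool calls fit the remaining quota.

    current_count is the value a single peek reports; each allowed call in this
    batch is counted locally so later calls in the batch see the reduced headroom.
    """
    allowed, blocked = [], []
    projected = current_count
    for call in tool_calls:
        if projected + 1 <= limit:
            projected += 1
            allowed.append(call)
        else:
            # Answer the blocked call instead of executing it.
            blocked.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": f"RATE LIMIT BLOCKED: {call['name']} would exceed "
                           f"{limit} tool calls per session",
            })
    return allowed, blocked


pending = [
    {"id": "call_1", "name": "search_web", "args": {"query": "a"}},
    {"id": "call_2", "name": "calculator", "args": {"expression": "2+2"}},
    {"id": "call_3", "name": "search_web", "args": {"query": "b"}},
]
allowed, blocked = split_pending_calls(pending, current_count=49, limit=50)
print([c["id"] for c in allowed])            # ['call_1']
print([b["tool_call_id"] for b in blocked])  # ['call_2', 'call_3']
```

Answer each blocked call rather than dropping it. Many chat model APIs reject a history in which a tool call has no matching tool response.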

Example Scenarios

| Scenario | Limit | Result |
|---|---|---|
| 5th request in session | max_requests_per_session: 5 | ✅ Allowed |
| 6th request in session | max_requests_per_session: 5 | ❌ Blocked |
| Tool call within quota | max_tool_calls_per_session: 50 | ✅ Allowed |
| Tool call exceeds quota | max_tool_calls_per_session: 50 | ❌ Blocked |
| Slow tool exceeds time budget | max_tool_execution_seconds: 60 | ❌ Blocked on timer |
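The last row follows from the time budget being cumulative, as the Parameters table describes: no single tool has to run for 60 seconds. Take hypothetical durations of 25 s, 25 s, and 15 s. Together they reach 65 s, so the third run's timer is the one that crosses max_tool_execution_seconds: 60. The helper below replays that arithmetic without sleeping. It treats exactly 60 s as within budget, which is an assumption about the boundary.

```python
from typing import Optional


def first_run_over_budget(durations_s: list[float], budget_s: float) -> Optional[int]:
    """Return the 1-based index of the tool run whose timer end exceeds the budget."""
    total = 0.0
    for index, duration in enumerate(durations_s, start=1):
        total += duration  # elapsed time is added when each timer ends
        if total > budget_s:  # assumption: reaching the budget exactly is allowed
            return index
    return None


print(first_run_over_budget([25, 25, 15], budget_s=60))  # 3 (65 s in total)
print(first_run_over_budget([25, 25, 10], budget_s=60))  # None (exactly 60 s)
```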

Best Practices

  1. Create one session per conversation — Use create_session() at the start of each user session and pass session_id through agent state.
  2. Peek before execute — Use check_tool_call_limit() in the pre-tool hook and record_tool_call() inside the tool to avoid counting blocked calls.
  3. Wrap tools with timers — Always pair start_execution_timer() and end_execution_timer() around tool logic for execution time limits.
  4. Combine with tool authorization — Apply both rate limits and permission checks for defense in depth.
  5. Monitor session metrics — Use get_session() to track request and tool call counts for observability and tuning.


Copyright © 2026 elsai foundry.