Rate Limiting & Abuse Prevention
Protect systems from excessive requests, infinite loops, and denial-of-wallet attacks by restricting requests, tool calls, and execution time per session.
Overview
Rate limiting guardrails enforce quotas on agent activity within a session. Policies are defined in YAML and applied through hooks that run before LLM requests and before tool execution. This prevents runaway agent loops, API throttling violations, and uncontrolled token spend.
Use this guardrail alongside Tool Authorization for comprehensive agent safety in production deployments.
How It Works
- Create a session with create_session() to track per-session counters.
- Call before_request() before each LLM invocation to enforce request limits.
- Call check_tool_call_limit() in a pre-tool hook to peek at tool call quotas.
- Call record_tool_call() inside each tool when it actually executes.
- Wrap tool execution with start_execution_timer() and end_execution_timer() to enforce time limits.
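The first four steps can be illustrated with a toy in-memory model. The sketch below is not the library's implementation: `ToyLimiter` and `CheckResult` are hypothetical stand-ins that only borrow the method names from the list above, and the blocking rule (block once the counter has reached the limit) is an assumption chosen so that the fifth request passes and the sixth is blocked. Execution timers are left out.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class CheckResult:
    """Hypothetical result object: whether the check passed, plus counters."""
    passed: bool
    current_count: int
    limit: int


@dataclass
class ToyLimiter:
    """Toy stand-in for a per-session limiter (not the real GuardrailSystem)."""
    max_requests: int
    max_tool_calls: int
    _ids: count = field(default_factory=count)
    _requests: dict = field(default_factory=dict)
    _tool_calls: dict = field(default_factory=dict)

    def create_session(self) -> str:
        # Step 1: allocate fresh counters for a new session.
        session_id = f"session-{next(self._ids)}"
        self._requests[session_id] = 0
        self._tool_calls[session_id] = 0
        return session_id

    def before_request(self, session_id: str) -> CheckResult:
        # Step 2: count the request only if it is still within quota.
        used = self._requests[session_id]
        if used >= self.max_requests:
            return CheckResult(False, used, self.max_requests)
        self._requests[session_id] = used + 1
        return CheckResult(True, used + 1, self.max_requests)

    def check_tool_call_limit(self, session_id: str) -> CheckResult:
        # Step 3: peek only; the counter is left untouched.
        used = self._tool_calls[session_id]
        return CheckResult(used < self.max_tool_calls, used, self.max_tool_calls)

    def record_tool_call(self, session_id: str) -> None:
        # Step 4: the tool itself records the call when it really runs.
        self._tool_calls[session_id] += 1


limiter = ToyLimiter(max_requests=5, max_tool_calls=2)
sid = limiter.create_session()
request_outcomes = [limiter.before_request(sid).passed for _ in range(6)]
peek_before = limiter.check_tool_call_limit(sid)
limiter.record_tool_call(sid)
limiter.record_tool_call(sid)
peek_after = limiter.check_tool_call_limit(sid)
```

Separating the peek from the recording step means a call that is blocked before the tool runs never consumes quota.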
Configuration
Enable Rate Limiting
yaml
guardrails:
  rate_limit:
    enabled: true
    max_requests_per_session: 5
    max_tool_calls_per_session: 50
    max_tool_execution_seconds: 60

Parameters
| Option | Type | Description |
|---|---|---|
| enabled | bool | Enable rate limiting |
| max_requests_per_session | int | Maximum LLM requests allowed per session |
| max_tool_calls_per_session | int | Maximum tool invocations allowed per session |
| max_tool_execution_seconds | int | Maximum cumulative tool execution time per session (seconds) |
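As a sanity check before deployment, these options can be validated once the YAML has been parsed into a dictionary. The sketch below is independent of the library: `RateLimitSettings` and `parse_rate_limit` are hypothetical names, and the rule that every limit must be a positive integer is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitSettings:
    """Hypothetical typed view of the guardrails.rate_limit section."""
    enabled: bool
    max_requests_per_session: int
    max_tool_calls_per_session: int
    max_tool_execution_seconds: int


def parse_rate_limit(policy: dict) -> RateLimitSettings:
    """Validate the rate_limit section of an already-parsed policy dict."""
    section = policy["guardrails"]["rate_limit"]
    enabled = section.get("enabled", False)
    if not isinstance(enabled, bool):
        raise TypeError("enabled must be a bool")
    limits = {}
    for key in (
        "max_requests_per_session",
        "max_tool_calls_per_session",
        "max_tool_execution_seconds",
    ):
        value = section[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{key} must be an int, got {type(value).__name__}")
        if value <= 0:
            raise ValueError(f"{key} must be positive, got {value}")
        limits[key] = value
    return RateLimitSettings(enabled=enabled, **limits)


# The same values as the YAML example, as a parsed dictionary.
policy = {
    "guardrails": {
        "rate_limit": {
            "enabled": True,
            "max_requests_per_session": 5,
            "max_tool_calls_per_session": 50,
            "max_tool_execution_seconds": 60,
        }
    }
}
settings = parse_rate_limit(policy)
```

Failing fast on a zero or negative limit avoids discovering a misconfigured policy only after a session has been blocked.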
Combined with Tool Authorization
Rate limiting and tool authorization can share the same policy file:
yaml
guardrails:
  rate_limit:
    enabled: true
    max_requests_per_session: 5
    max_tool_calls_per_session: 50
    max_tool_execution_seconds: 60
  tool_authorization:
    enabled: true
    denied_tools:
      - execute_shell
    sensitive_tools:
      - delete_record
    roles:
      analyst:
        allowed_tools:
          - search_web
          - calculator

Usage with Agent Hooks
Rate limiting is enforced through GuardrailSystem session hooks. Like tool authorization, it requires integration into your agent graph.
Initialize Guardrails
python
from elsai_guardrails.guardrails import GuardrailPolicy, GuardrailSystem

guardrails = GuardrailSystem(
    guardrail_policy=GuardrailPolicy.from_file("config.yaml"),
)
rate_limit_config = guardrails.guardrail_policy.to_rate_limit_config()

Session Management
python
session = guardrails.create_session()
session_id = session.session_id

# After processing, inspect session metrics
session = guardrails.get_session(session_id)
print(f"requests={session.request_count} tool_calls={session.tool_call_count}")

Before LLM Request
python
result = guardrails.before_request(session_id, raise_on_block=False)
if not result.passed:
    print(f"Request blocked: {result.error}")
    print(f"Count: {result.current_count}/{result.limit}")
else:
    # Proceed with LLM call
    response = llm.invoke(messages)

Before Tool Execution (Peek)
Check limits without incrementing — the actual tool records the call when it runs:
python
result = guardrails.check_tool_call_limit(session_id, raise_on_block=False)
if not result.passed:
    print(f"Tool blocked: {result.error}")
else:
    would_be = result.current_count + 1
    print(f"Tool allowed ({would_be}/{rate_limit_config.max_tool_calls_per_session})")

Inside Tool Implementation
Record the call and track execution time when the tool actually runs:
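python
from langchain_core.tools import tool  # assumed source of @tool; adjust to your framework

@tool
def search_web(query: str, session_id: str) -> str:
    """Search the web for the given query."""
    # Count the call only now that the tool is actually running.
    guardrails.record_tool_call(session_id)
    t = guardrails.start_execution_timer()
    try:
        result = f"Results for: {query}"
    finally:
        # Stop the timer even if the tool body raises.
        guardrails.end_execution_timer(t)
    return result

LangGraph Integration Pattern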
Recommended graph flow:
agent → rate_limit → tools → agent
- agent node: Call before_request() before invoking the LLM.
- rate_limit node: Call check_tool_call_limit() for each pending tool call before ToolNode runs.
- tools node: Each tool calls record_tool_call() and wraps execution in a timer.
When a limit is exceeded, inject a ToolMessage or AIMessage whose content contains the marker RATE LIMIT BLOCKED: and route back to the agent.
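One subtlety of the rate_limit node is that a peek does not increment the counter, so several pending tool calls in the same turn would each pass an individual peek even when only one slot remains. A possible way to handle this is to peek once and compare the remaining quota with the batch size. The sketch below assumes that approach: `split_pending_calls`, the plain-dict message shape, and `StubGuardrails` are hypothetical, and only `check_tool_call_limit()`, `passed`, and `current_count` are taken from the examples on this page.

```python
from types import SimpleNamespace


def split_pending_calls(guardrails, session_id, pending_calls, max_tool_calls):
    """Return (allowed_calls, blocked_messages) for one batch of tool calls."""
    peek = guardrails.check_tool_call_limit(session_id, raise_on_block=False)
    # Slots left in the quota; zero if the peek itself already failed.
    remaining = max(0, max_tool_calls - peek.current_count) if peek.passed else 0
    allowed = pending_calls[:remaining]
    blocked = [
        {
            "role": "tool",
            "tool_call_id": call["id"],
            "content": f"RATE LIMIT BLOCKED: tool call quota of {max_tool_calls} reached",
        }
        for call in pending_calls[remaining:]
    ]
    return allowed, blocked


class StubGuardrails:
    """Stand-in that only mimics the peek call used above."""

    def __init__(self, used, limit):
        self.used, self.limit = used, limit

    def check_tool_call_limit(self, session_id, raise_on_block=False):
        return SimpleNamespace(passed=self.used < self.limit, current_count=self.used)


pending = [{"id": "call_1", "name": "search_web"}, {"id": "call_2", "name": "calculator"}]
allowed, blocked = split_pending_calls(StubGuardrails(used=49, limit=50), "s1", pending, 50)
```

In a real graph the blocked entries would presumably become ToolMessage objects, while the allowed calls continue to the tools node.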
Example Scenarios
| Scenario | Limit | Result |
|---|---|---|
| 5th request in session | max_requests_per_session: 5 | ✅ Allowed |
| 6th request in session | max_requests_per_session: 5 | ❌ Blocked |
| Tool call within quota | max_tool_calls_per_session: 50 | ✅ Allowed |
| Tool call exceeds quota | max_tool_calls_per_session: 50 | ❌ Blocked |
| Slow tool exceeds time budget | max_tool_execution_seconds: 60 | ❌ Blocked on timer |
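The last row depends on cumulative execution time rather than a counter. The following is a minimal sketch of that accounting, assuming elapsed time is added when each timer stops and compared with the budget afterwards. `TimeBudget` and its injectable `clock` are hypothetical and do not reflect how the library implements start_execution_timer() and end_execution_timer().

```python
import time
from contextlib import contextmanager


class TimeBudget:
    """Tracks cumulative tool execution time against a per-session budget."""

    def __init__(self, max_seconds, clock=time.monotonic):
        self.max_seconds = max_seconds
        self.used_seconds = 0.0
        self._clock = clock  # injectable so the example is deterministic

    @contextmanager
    def timed(self):
        started = self._clock()
        try:
            yield
        finally:
            # Add the elapsed time even if the tool raised.
            self.used_seconds += self._clock() - started

    @property
    def exceeded(self):
        return self.used_seconds > self.max_seconds


# Fake clock: each timed block below appears to take 40 seconds.
ticks = iter([0.0, 40.0, 100.0, 140.0])
budget = TimeBudget(max_seconds=60, clock=lambda: next(ticks))

with budget.timed():
    pass  # first tool call: 40 s used, within the 60 s budget
within_budget_after_first = not budget.exceeded

with budget.timed():
    pass  # second tool call: 80 s used in total
blocked_after_second = budget.exceeded
```

Because the budget is cumulative, two tool calls that are each well under 60 seconds can still exhaust it together.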
Best Practices
- Create one session per conversation: Use create_session() at the start of each user session and pass session_id through agent state.
- Peek before execute: Use check_tool_call_limit() in the pre-tool hook and record_tool_call() inside the tool to avoid counting blocked calls.
- Wrap tools with timers: Always pair start_execution_timer() and end_execution_timer() around tool logic for execution time limits.
- Combine with tool authorization: Apply both rate limits and permission checks for defense in depth.
- Monitor session metrics: Use get_session() to track request and tool call counts for observability and tuning.
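For the monitoring practice, a small helper can turn session counters into utilization figures. The sketch assumes a session object exposing `request_count` and `tool_call_count`, as in the Session Management example. `session_utilization`, the 80% warning threshold, and the `SimpleNamespace` stand-in are hypothetical.

```python
from types import SimpleNamespace


def session_utilization(session, max_requests, max_tool_calls):
    """Return the fraction of each quota a session has consumed."""
    return {
        "requests": session.request_count / max_requests,
        "tool_calls": session.tool_call_count / max_tool_calls,
    }


# Stand-in for the object returned by get_session().
session = SimpleNamespace(request_count=4, tool_call_count=10)
usage = session_utilization(session, max_requests=5, max_tool_calls=50)

for name, fraction in usage.items():
    # Flag quotas that are at or above the illustrative 80% threshold.
    flag = " (near limit)" if fraction >= 0.8 else ""
    print(f"{name}: {fraction:.0%}{flag}")
```

Sessions that regularly approach a quota are a signal that the limit needs tuning or that the agent is looping.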
Next Steps
- Tool Authorization — Control which tools each role can access
- Token Budget Enforcement — Limit token usage per request and run
- GuardrailSystem — Core API reference
- Guardrails Configuration — Full configuration reference