Guardrails Configuration

Configure safety checks and validation rules for your application.

Reference Configuration

The following guardrail policy mirrors the project's config.yml and serves as the canonical reference. Options it does not show, such as check_off_topic and check_sql_syntax, are covered in the sections below:

yaml
# Guardrail policy configuration

guardrails:
  input_checks: true
  output_checks: true

  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  toxicity_threshold: 0.7
  block_toxic: true
  block_sensitive_data: true

  # PII/PHI detection policy
  pii:
    enabled: true
    input_checks: true
    output_checks: true
    language: en
    default_confidence_threshold: 0.5
    below_threshold_action: flag
    default_action: flag
    default_mask: true
    enable_phi_detection: true
    entity_types:
      - PERSON
      - LOCATION
      - EMAIL_ADDRESS
      - PHONE_NUMBER
      - CREDIT_CARD
      - NRP
      - MEDICAL_LICENSE
      - US_SSN
      - IBAN_CODE
      - IP_ADDRESS
    entity_thresholds:
      PERSON: 0.7
    entity_policies:
      CREDIT_CARD:
        action: block
        mask: true
      US_SSN:
        action: block
        mask: true
      EMAIL_ADDRESS:
        action: flag
        mask: true
      PHONE_NUMBER:
        action: flag
        mask: true
      PHI_MRN:
        action: review
        mask: true
      PHI_PATIENT_ID:
        action: review
        mask: true

  # Token budget enforcement policy
  token_budget:
    enabled: true
    input_checks: true
    output_checks: true
    max_request_tokens: 50
    max_run_tokens: 80
    reserved_output_tokens: 10
    block_on_exceeded: true

  # Tool authorization policy
  tool_authorization:
    enabled: true
    denied_tools:
      - execute_shell
    sensitive_tools:
      - delete_record
    roles:
      analyst:
        allowed_tools:
          - search_web
          - calculator
      engineer:
        allowed_tools:
          - search_web
          - calculator
          - delete_record

  # Rate limiting policy
  rate_limit:
    enabled: true
    max_requests_per_session: 5
    max_tool_calls_per_session: 50
    max_tool_execution_seconds: 60

  # Data exfiltration detection (output only)
  data_exfiltration:
    enabled: true
    output_checks: true
    action_thresholds:
      warn: 20
      block: 80
    detectors:
      secrets: true
      bulk_sensitive: true
      abnormal_patterns: true

  # ARMS Backend storage (MongoDB / DynamoDB / ClickHouse via Backend)
  storage:
    enabled: true
    project: my-app
    store_raw_text: true
    fail_soft: true
    arms_correlation: true

Configuration Options

Basic Settings

yaml
guardrails:
  input_checks: true    # Enable input validation
  output_checks: true   # Enable output validation

Check Types

yaml
guardrails:
  check_toxicity: true        # Enable toxicity detection
  check_sensitive_data: true  # Enable sensitive data detection
  check_semantic: true        # Enable content classification
  check_off_topic: false      # Enable off-topic detection
  check_sql_syntax: false     # Enable SQL syntax validation

Toxicity Settings

yaml
guardrails:
  check_toxicity: true
  toxicity_threshold: 0.7  # Threshold for blocking (0.0-1.0)
  block_toxic: true        # Block toxic content when detected

Toxicity Threshold: Content with toxicity confidence above this threshold will be blocked if block_toxic is enabled.
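
As a minimal sketch of the rule above, the decision can be written as a small function. The function name and the "flag" and "pass" return values are illustrative assumptions, not part of the library's API. The source only states that content scoring above the threshold is blocked when block_toxic is enabled.

```python
def toxicity_decision(score: float, threshold: float = 0.7, block_toxic: bool = True) -> str:
    """Return a hypothetical action for a toxicity confidence score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # "Above this threshold" is read as a strict comparison.
    if score > threshold:
        # Assumption: with block_toxic disabled, the content is only flagged.
        return "block" if block_toxic else "flag"
    return "pass"


print(toxicity_decision(0.85))  # score above the default 0.7 threshold
```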

Sensitive Data Settings

yaml
guardrails:
  check_sensitive_data: true
  block_sensitive_data: true  # Block content containing sensitive data

Detected sensitive data types include:

  • Email addresses
  • Phone numbers
  • Credit card numbers
  • Social security numbers
  • IP addresses
  • And more...

Content Classification

yaml
guardrails:
  check_semantic: true  # Enable content classification

Content classification detects:

  • Jailbreak attempts: Attempts to bypass safety restrictions
  • Malicious content: Requests for harmful activities
  • Prompt injection: Attempts to inject malicious instructions
  • Malicious code injection: Code injection attempts

Off-Topic Detection

yaml
guardrails:
  check_off_topic: true
  block_off_topic: true
  allowed_topics:
    - name: "Product Information"
      description: "Questions about product features, specifications, and pricing"
    - name: "Technical Support"
      description: "Help with installation, troubleshooting, and technical issues"

Off-topic detection helps keep conversations focused on allowed subjects. See Off-Topic Detection for details.

SQL Syntax Validation

yaml
guardrails:
  check_sql_syntax: true
  sql_dialect: "mysql"  # postgresql, mysql, sqlserver, sqlite, mongodb, oracle, redshift

SQL syntax validation checks SQL queries for syntax errors. Supported dialects:

  • postgresql - PostgreSQL
  • mysql - MySQL/MariaDB
  • sqlserver - Microsoft SQL Server
  • sqlite - SQLite
  • mongodb - MongoDB
  • oracle - Oracle Database
  • redshift - Amazon Redshift

See SQL Syntax Validation for details.

PII/PHI Detection and Data Masking

PII/PHI detection requires the spaCy model en_core_web_lg. Download it after installing the package:

bash
python -m spacy download en_core_web_lg

yaml
guardrails:
  pii:
    enabled: true
    input_checks: true
    output_checks: true
    language: en
    default_confidence_threshold: 0.5
    below_threshold_action: flag
    default_action: flag
    default_mask: true
    enable_phi_detection: true
    entity_types:
      - PERSON
      - LOCATION
      - EMAIL_ADDRESS
      - PHONE_NUMBER
      - CREDIT_CARD
      - NRP
      - MEDICAL_LICENSE
      - US_SSN
      - IBAN_CODE
      - IP_ADDRESS
    entity_thresholds:
      PERSON: 0.7
    entity_policies:
      CREDIT_CARD:
        action: block
        mask: true
      US_SSN:
        action: block
        mask: true
      EMAIL_ADDRESS:
        action: flag
        mask: true
      PHONE_NUMBER:
        action: flag
        mask: true
      PHI_MRN:
        action: review
        mask: true
      PHI_PATIENT_ID:
        action: review
        mask: true

Supported entity types:

| Entity Type | Description |
| --- | --- |
| PERSON | Personal names |
| LOCATION | Geographic locations |
| EMAIL_ADDRESS | Email addresses |
| PHONE_NUMBER | Phone numbers |
| CREDIT_CARD | Credit card numbers |
| NRP | Nationalities, religious, or political groups |
| MEDICAL_LICENSE | Medical license numbers |
| US_SSN | U.S. Social Security numbers |
| IBAN_CODE | International bank account numbers |
| IP_ADDRESS | IP addresses |
| PHI_MRN | Medical record numbers (regex-based PHI detection) |
| PHI_PATIENT_ID | Patient identifiers (regex-based PHI detection) |

PII/PHI detection identifies sensitive entities using Microsoft Presidio Analyzer, applies configurable policy actions (flag, block, review, pass), supports data masking, and logs detection events. See PII/PHI Detection for details.
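
One way to picture how the thresholds and policies combine is the sketch below. It is an assumption about evaluation order, not the library's implementation: the per-entity threshold overrides the global one, a detection below its threshold gets below_threshold_action, and otherwise the entity policy (or default_action) applies. The function name resolve_entity_policy is hypothetical, and whether masking applies to below-threshold detections is also assumed.

```python
from typing import Tuple

# Values copied from the pii block above (trimmed to two entity policies).
PII_POLICY = {
    "default_confidence_threshold": 0.5,
    "below_threshold_action": "flag",
    "default_action": "flag",
    "default_mask": True,
    "entity_thresholds": {"PERSON": 0.7},
    "entity_policies": {
        "CREDIT_CARD": {"action": "block", "mask": True},
        "PHI_MRN": {"action": "review", "mask": True},
    },
}


def resolve_entity_policy(entity_type: str, confidence: float,
                          policy: dict = PII_POLICY) -> Tuple[str, bool]:
    """Return (action, mask) for one detected entity (hypothetical helper)."""
    threshold = policy["entity_thresholds"].get(
        entity_type, policy["default_confidence_threshold"])
    entity_policy = policy["entity_policies"].get(entity_type, {})
    mask = entity_policy.get("mask", policy["default_mask"])
    if confidence < threshold:
        # Assumption: low-confidence hits still use the same mask setting.
        return policy["below_threshold_action"], mask
    return entity_policy.get("action", policy["default_action"]), mask


print(resolve_entity_policy("CREDIT_CARD", 0.9))
print(resolve_entity_policy("PERSON", 0.6))
```

With the configuration shown, a PERSON detection at confidence 0.6 falls below its 0.7 override and receives the below-threshold action, while a LOCATION detection at 0.6 clears the global 0.5 threshold and receives default_action.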

Token Budget Enforcement

yaml
guardrails:
  token_budget:
    enabled: true
    input_checks: true
    output_checks: true
    max_request_tokens: 50
    max_run_tokens: 80
    reserved_output_tokens: 10
    block_on_exceeded: true   # true = block; false = warn only

Token budget enforcement computes token usage across the full request context and rejects or warns on oversized requests. See Token Budget Enforcement for details.
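
The following sketch illustrates one plausible reading of these settings. How the library actually counts tokens and combines the limits is not specified here, so the arithmetic is an assumption: reserved output tokens are added to the request size, and the projected total is compared against both the per-request and per-run limits. TokenBudget and check_budget are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    # Defaults mirror the example configuration above.
    max_request_tokens: int = 50
    max_run_tokens: int = 80
    reserved_output_tokens: int = 10
    block_on_exceeded: bool = True


def check_budget(budget: TokenBudget, request_tokens: int, run_tokens_used: int = 0) -> str:
    """Return "allow", "warn", or "block" for a request of the given size."""
    # Assumption: the reserved output allowance counts against both limits.
    projected_request = request_tokens + budget.reserved_output_tokens
    projected_run = run_tokens_used + projected_request
    exceeded = (projected_request > budget.max_request_tokens
                or projected_run > budget.max_run_tokens)
    if not exceeded:
        return "allow"
    return "block" if budget.block_on_exceeded else "warn"


print(check_budget(TokenBudget(), request_tokens=30))
print(check_budget(TokenBudget(), request_tokens=45))
```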

Tool Authorization

yaml
guardrails:
  tool_authorization:
    enabled: true
    denied_tools:
      - execute_shell
    sensitive_tools:
      - delete_record
    roles:
      analyst:
        allowed_tools:
          - search_web
          - calculator
      engineer:
        allowed_tools:
          - search_web
          - calculator
          - delete_record

Tool authorization restricts agent tool access through role-based allowlists and global denylists. It is enforced via before_tool_call() hooks in agent frameworks. See Tool Authorization for details.
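
The policy above can be read as three checks, sketched below. The order of evaluation and the helper name is_tool_allowed are assumptions for illustration. The approval check follows the reference table, which describes sensitive tools as requiring metadata={"approved": true}.

```python
from typing import Any, Dict, Optional

# Values copied from the tool_authorization block above.
TOOL_POLICY = {
    "denied_tools": ["execute_shell"],
    "sensitive_tools": ["delete_record"],
    "roles": {
        "analyst": {"allowed_tools": ["search_web", "calculator"]},
        "engineer": {"allowed_tools": ["search_web", "calculator", "delete_record"]},
    },
}


def is_tool_allowed(role: str, tool: str,
                    metadata: Optional[Dict[str, Any]] = None,
                    policy: dict = TOOL_POLICY) -> bool:
    """Hypothetical check a before_tool_call() hook might perform."""
    # 1. The global denylist wins over any role allowlist.
    if tool in policy["denied_tools"]:
        return False
    # 2. The tool must appear in the role's allowlist; unknown roles get nothing.
    allowed = policy["roles"].get(role, {}).get("allowed_tools", [])
    if tool not in allowed:
        return False
    # 3. Sensitive tools additionally need explicit approval metadata.
    if tool in policy["sensitive_tools"]:
        return bool(metadata) and metadata.get("approved") is True
    return True


print(is_tool_allowed("engineer", "delete_record", {"approved": True}))
```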

Rate Limiting

yaml
guardrails:
  rate_limit:
    enabled: true
    max_requests_per_session: 5
    max_tool_calls_per_session: 50
    max_tool_execution_seconds: 60

Rate limiting protects against excessive requests, tool call loops, and runaway execution time. It is enforced via session hooks in agent frameworks. See Rate Limiting for details.
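
A per-session limiter with these three settings could look roughly like the sketch below. The class and method names are hypothetical, and the choice to reject further tool calls once cumulative execution time reaches the limit is an assumption about how the library applies max_tool_execution_seconds.

```python
from dataclasses import dataclass


@dataclass
class SessionLimiter:
    # Limits mirror the example configuration above.
    max_requests_per_session: int = 5
    max_tool_calls_per_session: int = 50
    max_tool_execution_seconds: float = 60.0
    # Running counters for one session.
    requests: int = 0
    tool_calls: int = 0
    tool_seconds: float = 0.0

    def allow_request(self) -> bool:
        """Count an LLM request, or refuse it once the limit is reached."""
        if self.requests >= self.max_requests_per_session:
            return False
        self.requests += 1
        return True

    def allow_tool_call(self) -> bool:
        """Count a tool call unless the call or time budget is used up."""
        if self.tool_calls >= self.max_tool_calls_per_session:
            return False
        if self.tool_seconds >= self.max_tool_execution_seconds:
            return False
        self.tool_calls += 1
        return True

    def record_tool_time(self, seconds: float) -> None:
        """Add the measured duration of a finished tool call."""
        self.tool_seconds += seconds


limiter = SessionLimiter()
print([limiter.allow_request() for _ in range(6)])
```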

Data Exfiltration Detection

Output-only guardrail that scores LLM responses for credential leaks, bulk sensitive data, and export-style payloads.

yaml
guardrails:
  output_checks: true

  data_exfiltration:
    enabled: true
    output_checks: true
    action_thresholds:
      warn: 20
      block: 80
    mask_token: "[REDACTED]"
    detectors:
      secrets: true
      bulk_sensitive: true
      abnormal_patterns: true
    use_detect_secrets_plugin: true
    bulk_sensitive:
      threshold: 20
      score_per_hit: 2
      max_score: 40

Runs on model output only. At the warn threshold, sensitive spans are masked; at the block threshold, the response is rejected. See Data Exfiltration Detection for details.
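
The threshold behavior described above can be sketched as follows. The function names and the "pass" and "mask" labels are illustrative assumptions. The comparison uses "greater than or equal" because the reference table describes each threshold as the minimum score for its action. How the score itself is computed is not covered here.

```python
from typing import List, Tuple


def exfiltration_action(score: int, warn: int = 20, block: int = 80) -> str:
    """Map a risk score to a hypothetical action label."""
    if score >= block:
        return "block"
    if score >= warn:
        return "mask"
    return "pass"


def mask_spans(text: str, spans: List[Tuple[int, int]], mask_token: str = "[REDACTED]") -> str:
    """Replace each (start, end) span with the mask token."""
    # Work right to left so earlier offsets remain valid after replacement.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask_token + text[end:]
    return text


print(exfiltration_action(35))
print(mask_spans("key=abc123 ok", [(4, 10)]))
```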

ARMS Storage

Persist guardrail run data through the ARMS Backend to MongoDB, DynamoDB, or ClickHouse (auto-selected by your deployment).

yaml
guardrails:
  storage:
    enabled: true
    project: my-app
    store_raw_text: true
    fail_soft: true
    unique_run_per_project: false
    arms_correlation: true

Backend credentials are read from the API_BASE_URL and ELSAI_ARMS_API_KEY environment variables (the same ones ARMS uses). Link runs with link_arms() or with GUARDRAILS_ARMS_RUN_ID / GUARDRAILS_ARMS_PROJECT_ID. See ARMS Storage for details.
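
To illustrate two of these settings, the sketch below shows what storing a SHA-256 digest instead of raw text might look like when store_raw_text is false, and how a config value could take precedence over an environment variable. The function names and the "text" / "text_sha256" field names are hypothetical, and the real document schema may differ.

```python
import hashlib
import os


def prepare_text_for_storage(text: str, store_raw_text: bool = True) -> dict:
    """Return the raw text, or only its SHA-256 hex digest (hypothetical fields)."""
    if store_raw_text:
        return {"text": text}
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"text_sha256": digest}


def backend_settings(storage_config: dict) -> dict:
    """Prefer explicit config overrides, then fall back to environment variables."""
    return {
        "api_base_url": storage_config.get("api_base_url") or os.environ.get("API_BASE_URL"),
        "api_key": storage_config.get("api_key") or os.environ.get("ELSAI_ARMS_API_KEY"),
    }


record = prepare_text_for_storage("user prompt", store_raw_text=False)
print(len(record["text_sha256"]))  # a SHA-256 hex digest is 64 characters long
```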

Configuration Reference

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| input_checks | bool | true | Enable input validation |
| output_checks | bool | true | Enable output validation |
| check_toxicity | bool | true | Enable toxicity detection |
| check_sensitive_data | bool | true | Enable sensitive data detection |
| check_semantic | bool | true | Enable content classification |
| check_off_topic | bool | false | Enable off-topic detection |
| check_sql_syntax | bool | false | Enable SQL syntax validation |
| toxicity_threshold | float | 0.7 | Threshold for blocking toxic content (0.0-1.0) |
| block_toxic | bool | true | Block toxic content |
| block_sensitive_data | bool | true | Block sensitive data |
| block_off_topic | bool | true | Block off-topic inputs |
| allowed_topics | list | None | List of allowed topics (required for off-topic detection) |
| sql_dialect | str | "mysql" | SQL dialect for syntax validation |
| pii | dict | | PII/PHI detection and data masking policy (see below) |
| token_budget | dict | | Token budget enforcement policy (see below) |
| tool_authorization | dict | | Tool access control policy (see below) |
| rate_limit | dict | | Rate limiting and abuse prevention policy (see below) |
| data_exfiltration | dict | | Output data exfiltration detection policy (see below) |
| storage | dict | | ARMS Backend persistence policy (see below) |

PII/PHI Detection Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| pii.enabled | bool | false | Enable PII/PHI detection |
| pii.input_checks | bool | true | Run detection on user input |
| pii.output_checks | bool | true | Run detection on model output |
| pii.language | str | "en" | Language code for entity analysis |
| pii.default_confidence_threshold | float | 0.5 | Global minimum confidence for entity recognition |
| pii.below_threshold_action | str | "flag" | Action for entities below their threshold |
| pii.default_action | str | "flag" | Default action when no entity policy is defined |
| pii.default_mask | bool | true | Mask detected values by default |
| pii.enable_phi_detection | bool | true | Enable regex-based PHI pattern detection |
| pii.entity_types | list | | Entity types to detect |
| pii.entity_thresholds | dict | | Per-entity confidence overrides |
| pii.entity_policies | dict | | Per-entity action and masking rules |

Entity Policy Options

Each key under entity_policies is an entity type name. Each policy supports:

| Field | Type | Values | Description |
| --- | --- | --- | --- |
| action | str | flag, block, review, pass | Policy action applied when the entity is detected |
| mask | bool | true, false | Whether to mask the detected value before downstream processing |

Example entity policies from config.yml:

| Entity | Action | Mask | Behavior |
| --- | --- | --- | --- |
| CREDIT_CARD | block | true | Block request and mask value |
| US_SSN | block | true | Block request and mask value |
| EMAIL_ADDRESS | flag | true | Flag detection and mask value |
| PHONE_NUMBER | flag | true | Flag detection and mask value |
| PHI_MRN | review | true | Mark for review and mask value |
| PHI_PATIENT_ID | review | true | Mark for review and mask value |

Token Budget Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| token_budget.enabled | bool | false | Enable token budget enforcement |
| token_budget.input_checks | bool | true | Enforce limits on incoming requests |
| token_budget.output_checks | bool | true | Enforce limits on model output |
| token_budget.max_request_tokens | int | | Maximum tokens for a single request context |
| token_budget.max_run_tokens | int | | Maximum total tokens for an entire run |
| token_budget.reserved_output_tokens | int | | Tokens reserved for the model response |
| token_budget.block_on_exceeded | bool | true | Block when the budget is exceeded; false emits a warning only |

Tool Authorization Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| tool_authorization.enabled | bool | false | Enable tool authorization |
| tool_authorization.denied_tools | list | | Tools blocked for all roles |
| tool_authorization.sensitive_tools | list | | Tools requiring metadata={"approved": true} |
| tool_authorization.roles | dict | | Role definitions with allowed_tools lists |

Rate Limiting Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| rate_limit.enabled | bool | false | Enable rate limiting |
| rate_limit.max_requests_per_session | int | | Maximum LLM requests per session |
| rate_limit.max_tool_calls_per_session | int | | Maximum tool invocations per session |
| rate_limit.max_tool_execution_seconds | int | | Maximum cumulative tool execution time (seconds) |

Data Exfiltration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| data_exfiltration.enabled | bool | false | Enable output exfiltration detection |
| data_exfiltration.output_checks | bool | inherits output_checks | Run on model output |
| data_exfiltration.action_thresholds.warn | int | 20 | Minimum score to mask sensitive spans |
| data_exfiltration.action_thresholds.block | int | 80 | Minimum score to block the response |
| data_exfiltration.mask_token | str | "[REDACTED]" | Replacement for masked spans |
| data_exfiltration.detectors.secrets | bool | true | Enable secret/credential detection |
| data_exfiltration.detectors.bulk_sensitive | bool | true | Enable bulk identifier detection |
| data_exfiltration.detectors.abnormal_patterns | bool | true | Enable export-style pattern detection |
| data_exfiltration.use_detect_secrets_plugin | bool | true | Use optional detect-secrets package |
| data_exfiltration.bulk_sensitive.threshold | int | 20 | Minimum matches to trigger bulk detector |

ARMS Storage Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| storage.enabled | bool | false | Enable persistence via ARMS Backend |
| storage.project | str | "default" | Logical project name on run documents |
| storage.store_raw_text | bool | true | Store full text; false uses SHA-256 digests |
| storage.fail_soft | bool | true | Log on write failure instead of raising |
| storage.unique_run_per_project | bool | false | Backend upsert behavior for run ids |
| storage.arms_correlation | bool | true | Auto-link ARMS run_id / project_id |
| storage.api_base_url | str | env API_BASE_URL | ARMS Backend URL (optional override) |
| storage.api_key | str | env ELSAI_ARMS_API_KEY | Backend API key (optional override) |
| storage.master_key | str | env ARMS_MASTER_KEY | Optional master key header |

Use Cases

Strict Mode

Block all potentially problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  toxicity_threshold: 0.5  # Lower threshold = more strict
  block_toxic: true
  block_sensitive_data: true
  pii:
    enabled: true
    default_action: block
    default_mask: true

Permissive Mode

Only block clearly problematic content:

yaml
guardrails:
  input_checks: true
  output_checks: true
  check_toxicity: true
  check_sensitive_data: false  # Allow sensitive data
  check_semantic: true
  toxicity_threshold: 0.9  # Higher threshold = more permissive
  block_toxic: true
  block_sensitive_data: false
  pii:
    enabled: true
    default_action: flag
    default_mask: false

Input-Only Mode

Only validate input, not output:

yaml
guardrails:
  input_checks: true
  output_checks: false
  check_toxicity: true
  check_sensitive_data: true
  check_semantic: true
  pii:
    enabled: true
    input_checks: true
    output_checks: false
  token_budget:
    enabled: true
    input_checks: true
    output_checks: false

Copyright © 2026 elsai foundry.