
Conceptual Guide

The ARMS modules form the backbone of the system, handling all telemetry, logging, and data management operations. Key features include per-project token usage and cost tracking, a MongoDB/DynamoDB/ClickHouse backend for centralized data storage, JSON export for visualization, and custom metrics logging.

Modules and Features

elsaiARMS

Acts as the orchestrator and public interface for the entire ARMS framework. This class initializes all sub-modules and provides a high-level API to start monitoring, record token usage, and export metrics. The elsaiARMS class manages:

  • Project lifecycle (Project Manager)
  • Real-time monitoring (ARMS Wrapper)
  • Token and cost tracking
  • LLM metrics tracking
  • OCR metrics tracking
  • RAG metrics tracking
  • Embedding metrics tracking
  • Agent metrics tracking
  • Custom logs handling
  • Managing runs (Run Data)
  • Exporting to structured JSON

Key Features

  • Automatically handles telemetry, token logging, and cost tracking
  • Simplifies data export and session lifecycle management
  • Comprehensive monitoring across all AI/ML operations

Project Manager

Maintains metadata and persistent state for each LLM project, such as the project name, creation time, and session records. The Project Manager receives the log for each run from Run Data in elsaiARMS and stores the details in a database. It handles:

  • Project creation and loading
  • Data dictionary updates
  • Serialization-ready format for export

Key Features

  • Create Project: Creates a new project
  • Load Project: Loads an existing project
  • Save Project: Saves project details in MongoDB
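
The create, load, and save operations above can be illustrated with a minimal sketch. Everything here is hypothetical: `ProjectRecord`, `InMemoryProjectStore`, and the field names are assumptions for illustration, and a plain dictionary stands in for the database backend. It is not the elsai ARMS API.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ProjectRecord:
    """Hypothetical project record: name, creation time, and session records."""
    name: str
    created_at: str
    sessions: list = field(default_factory=list)


class InMemoryProjectStore:
    """Stand-in for a database backend; keeps serialized projects in a dict."""

    def __init__(self):
        self._docs = {}

    def create_project(self, name):
        record = ProjectRecord(
            name=name,
            created_at=datetime.now(timezone.utc).isoformat(),
        )
        self.save_project(record)
        return record

    def save_project(self, record):
        # Store a JSON string so the saved form is serialization-ready.
        self._docs[record.name] = json.dumps(asdict(record))

    def load_project(self, name):
        return ProjectRecord(**json.loads(self._docs[name]))


store = InMemoryProjectStore()
project = store.create_project("demo-project")
project.sessions.append({"run_id": "run-1", "total_tokens": 120})
store.save_project(project)
reloaded = store.load_project("demo-project")
```

Storing the record as JSON keeps the saved form identical to what an exporter would later read, which is one reasonable way to satisfy the "serialization-ready" requirement.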

ARMS Wrapper

ARMS Wrapper is responsible for real-time observability of LLM-based operations. It acts as the system's telemetry layer, tracking and recording essential performance metrics that can later be analyzed to evaluate model responsiveness, usage patterns, and service reliability. The ARMS wrapper is connected to Token Metrics, Cost Metrics, LLM Monitor, OCR Monitor, RAG Monitor, Embedding Monitor, and Agent Monitor to record the metrics.

Key Features

  • Histogram-based latency tracking
  • Counter for metrics
  • Integrates easily with external observability platforms
  • Unified monitoring across all AI/ML operations
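
To make histogram-based latency tracking and counters concrete, here is a small pure-Python sketch. The bucket bounds, the class name `LatencyHistogram`, and the counter keys are assumptions chosen for illustration; the actual ARMS Wrapper may use different instruments and names.

```python
import bisect
from collections import Counter


class LatencyHistogram:
    """Fixed-bucket histogram keyed by upper bound in seconds."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One count per bound, plus a final overflow bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.n = 0

    def observe(self, seconds):
        # bisect_left places a value equal to a bound into that bound's bucket.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds
        self.n += 1

    def mean(self):
        return self.total / self.n if self.n else 0.0


hist = LatencyHistogram([0.1, 0.5, 1.0, 5.0])
requests = Counter()

# Hypothetical observations: (latency in seconds, whether the call succeeded).
for latency, ok in [(0.05, True), (0.3, True), (0.5, True), (2.0, True), (7.5, False)]:
    hist.observe(latency)
    requests["llm.success" if ok else "llm.failure"] += 1
```

A histogram keeps only bucket counts, so memory stays constant no matter how many requests are observed, at the price of losing exact per-request latencies.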

Token Metrics

Tracks the input and output tokens used across all LLM interactions within a project session and provides an overview of token usage for each project. Token details are read from the LLM response metadata captured by the LLM monitoring function.

Key Features

  • Computes total token usage
  • Useful for token management and budget planning
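
As a rough sketch of how total token usage could be accumulated from response metadata, consider the following. The `TokenTotals` class and the `input_tokens` / `output_tokens` key names are assumptions; real response metadata varies by provider.

```python
from dataclasses import dataclass


@dataclass
class TokenTotals:
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, usage):
        # `usage` mimics LLM response metadata; the key names are assumptions.
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    @property
    def total_tokens(self):
        return self.input_tokens + self.output_tokens


totals = TokenTotals()
for usage in [
    {"input_tokens": 120, "output_tokens": 40},
    {"input_tokens": 300, "output_tokens": 95},
]:
    totals.record(usage)
```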

Cost Metrics

Estimates the monetary cost of LLM usage from the recorded token consumption and the configured price rates, and provides an overview of the cost for each project. The token usage is read from the LLM response metadata captured by the LLM monitoring function.

Key Features

  • Tracks real-time cost as tokens are recorded
  • Supports model-specific pricing schemes
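
The cost estimate reduces to simple arithmetic: tokens multiplied by a per-token rate, summed over input and output. The sketch below assumes rates quoted per 1,000 tokens; the model names and prices are made up for illustration and are not real provider prices.

```python
from decimal import Decimal

# Hypothetical price table in USD per 1,000 tokens; not real provider prices.
PRICES_PER_1K = {
    "model-a": {"input": Decimal("0.50"), "output": Decimal("1.50")},
    "model-b": {"input": Decimal("0.10"), "output": Decimal("0.40")},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one request."""
    rates = PRICES_PER_1K[model]
    weighted = (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    )
    return weighted / Decimal(1000)


# Running total updated as each request's tokens are recorded.
running_total = Decimal("0")
for model, inp, out in [("model-a", 2000, 1000), ("model-b", 2000, 1000)]:
    running_total += estimate_cost(model, inp, out)

cost_a = estimate_cost("model-a", 2000, 1000)
cost_b = estimate_cost("model-b", 2000, 1000)
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.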

LLM Monitoring

Captures performance and usage analytics of LLM interactions, providing comprehensive insights into language model operations and performance. LLM monitoring tracks the following key metrics:

Model Information

Model name, version, provider, and configuration details

Token Metrics

Input tokens, output tokens, total tokens, and token efficiency

Performance Metrics

Latency, throughput, response time, and processing speed

Quality Metrics

Response relevance, completion rates, and error handling

Cost Tracking

Per-request cost based on token consumption and model pricing

Operational Metrics

Request success rates, failure patterns, and system status

Content Analysis

Prompt complexity, response length, and content quality

Throughput Metrics

Tokens per second, output throughput, and total throughput

Governance Metrics

Content safety, prompt injection detection, and response quality assessment

Relevance Scoring

Content relevance and accuracy metrics

Key Features

  • Real-time LLM performance monitoring and alerting
  • Model-specific performance comparison and benchmarking
  • Token usage optimization and cost analysis
  • Quality assessment and response validation
  • Performance degradation detection and alerting
  • Comprehensive cost tracking across different models and providers
  • Error analysis and failure pattern detection
  • Request/response content analysis and quality scoring
  • Throughput performance analysis and optimization
  • Governance and compliance monitoring
  • Content safety and quality assurance
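
The throughput metrics listed above can be derived from token counts and latency. The sketch below assumes that output throughput is output tokens divided by request latency and that total throughput also counts input tokens; the function and key names are hypothetical, and ARMS may define these metrics differently.

```python
def throughput(input_tokens, output_tokens, latency_seconds):
    """Compute tokens-per-second figures for a single request."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return {
        "output_tokens_per_second": output_tokens / latency_seconds,
        "total_tokens_per_second": (input_tokens + output_tokens) / latency_seconds,
    }


# A request with 300 input tokens and 100 output tokens that took 2 seconds.
result = throughput(300, 100, 2.0)
```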

OCR Monitoring

Captures performance and quality metrics of Optical Character Recognition operations, providing comprehensive insights into text extraction processes. OCR monitoring tracks the following key metrics:

Model Information

OCR engine used (EasyOCR, Textract, Azure Document Intelligence, Azure Computer Vision, Google Vision AI, PaddleOCR, Tesseract, Vision LLM)

Text Processing Metrics

Extracted text length, confidence scores, processing duration

Performance Metrics

Latency, throughput, error rates

Cost Tracking

Per-operation cost based on OCR service pricing

Quality Assessment

Confidence scores and accuracy metrics

Key Features

  • Real-time OCR performance monitoring
  • Model-specific performance comparison
  • Quality metrics for text extraction accuracy
  • Cost optimization insights across different OCR providers
  • Error tracking and failure analysis

RAG Monitoring

Captures comprehensive metrics for Retrieval-Augmented Generation operations, monitoring the entire pipeline from document retrieval to response generation. RAG monitoring tracks:

Retrieval Metrics

Query processing, document count, relevance scoring

Performance Metrics

Latency, throughput, success rates

Quality Metrics

Relevance scores, result accuracy, user satisfaction

Operational Metrics

Function calls, error handling, system status

Cost Analysis

Per-query cost and resource utilization

Key Features

  • End-to-end RAG pipeline monitoring
  • Relevance scoring and quality assessment
  • Performance optimization insights
  • Cost tracking across different RAG operations
  • Error analysis and failure pattern detection

Embedding Monitoring

Tracks performance and quality metrics for vector embedding generation operations, essential for semantic search and AI applications. Embedding monitoring captures:

Input Metrics

Text length, complexity, preprocessing time

Output Metrics

Vector dimensions, quality scores, generation time

Performance Metrics

Latency, throughput, resource utilization

Quality Assessment

Embedding quality scores and consistency

Cost Tracking

Per-embedding operation costs

Key Features

  • Real-time embedding performance monitoring
  • Quality assessment and consistency tracking
  • Performance optimization recommendations
  • Cost analysis across different embedding models
  • Resource utilization monitoring

Agent Monitoring

Distributed tracing for AI agent workflows. On-prem deployments support two integration paths — pick one per application; do not mix them in the same run.

LangChain / LangGraph

Pass arms.langchain_callback in the callbacks list of the graph or agent config.

Integration

`arms.langchain_callback` in `config={"callbacks": [...]}`

Span types

Generation, tool, and chain spans from LangChain runnables

LangGraph

Node and step metadata (`langgraph_node`, `langgraph_step`)

Framework tags

`langchain`, `langgraph`

elsai Agents (on-prem)

Use arms.elsai_agents_hook on the top-level manager, GraphBuilder, or Swarm — not on every leaf agent.

Integration

`arms.elsai_agents_hook` via `AgentConfig`, `GraphBuilder.set_hook_providers`, or `Swarm(hooks=[...])`

Span types

Agent, generation, tool, orchestrator, node, and handoff spans

Orchestrators

elsai Graph and Swarm with node chains and Swarm handoff spans

Nested as_tool

Reject, queue, and spawn modes with child agent tracing

Framework tags (span metadata)

`elsai_agents` on most spans; `elsai_graph` or `elsai_swarm` on orchestrator root span only

Shared run metrics

Both paths write distributed trace trees and populate the same run-level analytics:

Key Features

  • Parent-child span hierarchy with token and cost per generation span
  • trace_completeness_score and overall_tool_success_rate
  • nodes_executed (from LangGraph node metadata or elsai Graph/Swarm node spans)
  • Token and cost rollup at end_run()
  • Dashboard drill-down via agent_metrics trace IDs
  • elsai Graph/Swarm: orchestrator root span tagged elsai_graph or elsai_swarm; trace document may still show elsai_agents (see elsai Agents monitoring)
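
A parent-child span hierarchy with a token and cost rollup can be sketched as a small tree walk. The `Span` class, its fields, and the formula used for the tool success rate (successful tool spans divided by all tool spans) are assumptions made for illustration; the source does not define how overall_tool_success_rate or trace_completeness_score are computed.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    kind: str  # e.g. "agent", "generation", "tool"
    tokens: int = 0
    cost: float = 0.0
    ok: bool = True
    children: list = field(default_factory=list)


def walk(span):
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def rollup(root):
    """Aggregate tokens and cost from generation spans and score tool spans."""
    spans = list(walk(root))
    generations = [s for s in spans if s.kind == "generation"]
    tools = [s for s in spans if s.kind == "tool"]
    return {
        "total_tokens": sum(s.tokens for s in generations),
        "total_cost": sum(s.cost for s in generations),
        # Assumed definition: share of tool spans that succeeded.
        "tool_success_rate": sum(s.ok for s in tools) / len(tools) if tools else None,
    }


root = Span("manager", "agent", children=[
    Span("plan", "generation", tokens=100, cost=0.25),
    Span("search", "tool", ok=True),
    Span("fetch", "tool", ok=False),
    Span("worker", "agent", children=[
        Span("answer", "generation", tokens=50, cost=0.5),
    ]),
])
summary = rollup(root)
```

Attaching tokens and cost only to generation spans avoids double counting when parent spans are summed together with their children.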

Exporter

Formats all accumulated project data into a structured, exportable JSON format.

Key Features

  • Modular formatting logic
  • Merges multiple sources (e.g., tokens + cost + latency + OCR + RAG + Embedding + Agent metrics)
  • Structured output with computed summaries
  • Comprehensive coverage of all monitoring domains
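
Merging several metric sources into one JSON document with a computed summary might look like the sketch below. The function name `export_project`, the section names, and the summary fields are hypothetical; the real export schema is not specified here.

```python
import json


def export_project(project_name, sources):
    """Merge per-domain metric dicts and add a computed summary."""
    merged = {section: dict(metrics) for section, metrics in sources.items()}
    tokens = merged.get("tokens", {})
    summary = {
        "total_tokens": tokens.get("input", 0) + tokens.get("output", 0),
        "sections": sorted(merged),
    }
    document = {"project": project_name, "metrics": merged, "summary": summary}
    return json.dumps(document, indent=2, sort_keys=True)


exported = export_project(
    "demo-project",
    {
        "tokens": {"input": 420, "output": 135},
        "cost": {"usd": 3.1},
        "latency": {"mean_seconds": 2.07},
    },
)
parsed = json.loads(exported)
```

Sorting keys makes the output deterministic, which helps when exported files are compared between runs.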

Functional Flow

System Architecture Flow

Understanding how ARMS components interact and process data

Monitoring Integration

Unified Monitoring Framework

The ARMS framework provides unified monitoring across all AI/ML operations, so that every operation type is observable through the same system.

Unified Metrics Collection

All monitoring modules feed into a centralized metrics system

Cross-Component Analysis

Correlation between different operation types for holistic insights

Performance Optimization

Data-driven recommendations for improving system efficiency

Cost Management

Comprehensive cost tracking across all AI/ML operations

Quality Assurance

Quality metrics and performance benchmarks for all operations

Copyright © 2026 elsai foundry.