Conceptual Guide

The ARMS modules form the backbone of the system, handling all telemetry, logging, and data management operations. Key features include per-project token usage and cost tracking, a MongoDB, DynamoDB, or ClickHouse backend for centralized data storage, JSON export for visualization, and custom metrics logging.

Modules and Features

elsaiARMS

Acts as the orchestrator and public interface for the entire ARMS framework. This class initializes all sub-modules and provides a high-level API to start monitoring, record token usage, and export metrics. The elsaiARMS class manages:

Project lifecycle (Project Manager)

Real-time monitoring (ARMS Wrapper)

Token and cost tracking

LLM metrics tracking

OCR metrics tracking

RAG metrics tracking

Embedding metrics tracking

Agent metrics tracking

Custom logs handling

Managing runs (Run Data)

Exporting to structured JSON

Key Features

Automatically handles telemetry, token logging, and cost tracking
Simplifies data export and session lifecycle management
Comprehensive monitoring across all AI/ML operations

Project Manager

Maintains metadata and persistent state related to each LLM project, such as project name, creation time, and session records. The Project Manager retrieves the log for each run from Run Data in elsaiARMS and stores the details in a database. It handles:

Project creation and loading

Data dictionary updates

Serialization-ready format for export

Key Features

Create Project: Creates a new project
Load Project: Loads an existing project
Save Project: Saves project details in MongoDB
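
The create, load, and save operations can be sketched as follows. This is a minimal illustration rather than the ARMS implementation: `Project`, `ProjectStore`, their method names, and the in-memory dictionary standing in for the MongoDB collection are all hypothetical.

```python
import copy
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class Project:
    """Hypothetical project record: name, creation time, session records."""
    name: str
    created_at: str
    sessions: list = field(default_factory=list)


class ProjectStore:
    """In-memory stand-in for a database-backed project manager."""

    def __init__(self):
        self._db = {}  # assumption: a dict replaces the MongoDB collection

    def create_project(self, name):
        created = datetime.now(timezone.utc).isoformat()
        project = Project(name=name, created_at=created)
        self.save_project(project)
        return project

    def load_project(self, name):
        record = self._db.get(name)
        if record is None:
            raise KeyError(f"project {name!r} does not exist")
        # deep copy so callers cannot mutate the stored record by accident
        return Project(**copy.deepcopy(record))

    def save_project(self, project):
        # asdict() produces a plain dict that is ready for serialization
        self._db[project.name] = asdict(project)


store = ProjectStore()
project = store.create_project("demo")
project.sessions.append({"run_id": "run-1", "total_tokens": 120})
store.save_project(project)
loaded = store.load_project("demo")
```

Keeping the stored record as a plain dictionary mirrors the "serialization-ready format" mentioned above: the same structure can be written to a document database or handed to an exporter.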

ARMS Wrapper

ARMS Wrapper is responsible for real-time observability of LLM-based operations. It acts as the system's telemetry layer, tracking and recording essential performance metrics that can later be analyzed to evaluate model responsiveness, usage patterns, and service reliability. The ARMS wrapper is connected to Token Metrics, Cost Metrics, LLM Monitor, OCR Monitor, RAG Monitor, Embedding Monitor, and Agent Monitor to record the metrics.

Key Features

Histogram-based latency tracking
Counter for metrics
Integrates easily with external observability platforms
Unified monitoring across all AI/ML operations
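
To make histogram-based latency tracking and counters concrete, here is a minimal, dependency-free sketch. The names `LatencyHistogram`, `monitored`, and `request_counter`, as well as the bucket boundaries, are assumptions for illustration and are not part of the ARMS API.

```python
import bisect
import time
from collections import Counter


class LatencyHistogram:
    """Fixed-bucket histogram; bucket upper bounds are in seconds."""

    def __init__(self, bounds=(0.1, 0.5, 1.0, 5.0)):
        self.bounds = list(bounds)
        # one count per upper bound, plus a final overflow bucket
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, seconds):
        # bisect_left returns the first bucket whose upper bound is >= seconds
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds


request_counter = Counter()
latency = LatencyHistogram()


def monitored(fn):
    """Wrap a callable to record its latency and success/error counts."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            request_counter["success"] += 1
            return result
        except Exception:
            request_counter["error"] += 1
            raise
        finally:
            # runs on both the success and the error path
            latency.observe(time.perf_counter() - start)
    return wrapper


@monitored
def fake_llm_call(prompt):
    # placeholder for a real LLM request
    return prompt.upper()


fake_llm_call("hello")
```

A histogram keeps only per-bucket counts rather than every sample, which is why this style of metric is cheap to record and easy to forward to an external observability platform.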

Token Metrics

Tracks input and output tokens used across all LLM interactions within a project session and provides an overview of token usage for each project. Token details are read from the LLM response metadata captured by the LLM monitoring function.

Key Features

Computes total token usage
Useful for token management and budget planning
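
As an illustration of accumulating token usage across a session, the sketch below sums input and output tokens from response metadata. The `TokenUsage` class and the metadata keys `input_tokens` and `output_tokens` are assumptions; actual key names depend on the LLM provider.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Running token totals for one project session (hypothetical helper)."""
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self):
        return self.input_tokens + self.output_tokens

    def record(self, metadata):
        # assumption: metadata carries "input_tokens" and "output_tokens" keys
        self.input_tokens += metadata.get("input_tokens", 0)
        self.output_tokens += metadata.get("output_tokens", 0)


session_usage = TokenUsage()
for response_metadata in (
    {"input_tokens": 120, "output_tokens": 45},
    {"input_tokens": 80, "output_tokens": 200},
):
    session_usage.record(response_metadata)

print(session_usage.total_tokens)  # 445
```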

Cost Metrics

Estimates the monetary cost of LLM usage based on the recorded token consumption and the configured price rates, and provides an overview of the cost for each project. Token usage is read from the LLM response metadata captured by the LLM monitoring function.

Key Features

Tracks real-time cost as tokens are recorded
Supports model-specific pricing schemes
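
A cost estimate of this kind reduces to multiplying token counts by per-model rates. The sketch below assumes prices are quoted per 1,000 tokens with separate input and output rates; the `PRICES_PER_1K` table, the model names, and the `estimate_cost` function are hypothetical and the rates are made up.

```python
from decimal import Decimal

# Hypothetical price table: USD per 1,000 tokens, keyed by model name.
PRICES_PER_1K = {
    "model-a": {"input": Decimal("0.0005"), "output": Decimal("0.0015")},
    "model-b": {"input": Decimal("0.0100"), "output": Decimal("0.0300")},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one request."""
    rates = PRICES_PER_1K[model]
    return (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    ) / Decimal(1000)


cost = estimate_cost("model-a", input_tokens=2000, output_tokens=1000)
print(cost)
```

`Decimal` is used here so that many small per-request amounts can be summed without binary floating-point drift; whether ARMS does the same is not stated in this guide.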

LLM Monitoring

Captures performance and usage analytics of LLM interactions, providing comprehensive insights into language model operations and performance. LLM monitoring tracks the following key metrics:

Model Information

Model name, version, provider, and configuration details

Token Metrics

Input tokens, output tokens, total tokens, and token efficiency

Performance Metrics

Latency, throughput, response time, and processing speed

Quality Metrics

Response relevance, completion rates, and error handling

Cost Tracking

Per-request cost based on token consumption and model pricing

Operational Metrics

Request success rates, failure patterns, and system status

Content Analysis

Prompt complexity, response length, and content quality

Throughput Metrics

Tokens per second, output throughput, and total throughput

Governance Metrics

Content safety, prompt injection detection, and response quality assessment

Relevance Scoring

Content relevance and accuracy metrics

Key Features

Real-time LLM performance monitoring and alerting
Model-specific performance comparison and benchmarking
Token usage optimization and cost analysis
Quality assessment and response validation
Performance degradation detection and alerting
Comprehensive cost tracking across different models and providers
Error analysis and failure pattern detection
Request/response content analysis and quality scoring
Throughput performance analysis and optimization
Governance and compliance monitoring
Content safety and quality assurance
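
The throughput metrics listed above can be illustrated with a small calculation. The definitions used here (output tokens per second and total tokens per second over the request latency) are an assumption about how such values are commonly derived, and `throughput_metrics` is a hypothetical helper.

```python
def throughput_metrics(input_tokens, output_tokens, latency_seconds):
    """Derive tokens-per-second figures from one request's token counts."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    total_tokens = input_tokens + output_tokens
    return {
        # generated tokens per second
        "output_throughput": output_tokens / latency_seconds,
        # prompt plus generated tokens per second
        "total_throughput": total_tokens / latency_seconds,
    }


metrics = throughput_metrics(input_tokens=300, output_tokens=100, latency_seconds=2.0)
print(metrics)
```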

OCR Monitoring

Captures performance and quality metrics of Optical Character Recognition operations, providing comprehensive insights into text extraction processes. OCR monitoring tracks the following key metrics:

Model Information

OCR engine used (EasyOCR, Textract, Azure Document Intelligence, Azure Computer Vision, Google Vision AI, PaddleOCR, Tesseract, Vision LLM)

Text Processing Metrics

Extracted text length, confidence scores, processing duration

Performance Metrics

Latency, throughput, error rates

Cost Tracking

Per-operation cost based on OCR service pricing

Quality Assessment

Confidence scores and accuracy metrics

Key Features

Real-time OCR performance monitoring
Model-specific performance comparison
Quality metrics for text extraction accuracy
Cost optimization insights across different OCR providers
Error tracking and failure analysis

RAG Monitoring

Captures comprehensive metrics for Retrieval-Augmented Generation operations, monitoring the entire pipeline from document retrieval to response generation. RAG monitoring tracks:

Retrieval Metrics

Query processing, document count, relevance scoring

Performance Metrics

Latency, throughput, success rates

Quality Metrics

Relevance scores, result accuracy, user satisfaction

Operational Metrics

Function calls, error handling, system status

Cost Analysis

Per-query cost and resource utilization

Key Features

End-to-end RAG pipeline monitoring
Relevance scoring and quality assessment
Performance optimization insights
Cost tracking across different RAG operations
Error analysis and failure pattern detection

Embedding Monitoring

Tracks performance and quality metrics for vector embedding generation operations, essential for semantic search and AI applications. Embedding monitoring captures:

Input Metrics

Text length, complexity, preprocessing time

Output Metrics

Vector dimensions, quality scores, generation time

Performance Metrics

Latency, throughput, resource utilization

Quality Assessment

Embedding quality scores and consistency

Cost Tracking

Per-embedding operation costs

Key Features

Real-time embedding performance monitoring
Quality assessment and consistency tracking
Performance optimization recommendations
Cost analysis across different embedding models
Resource utilization monitoring

Agent Monitoring

Distributed tracing for AI agent workflows. On-prem deployments support two integration paths. Pick one per application; do not mix them in the same run.

LangChain / LangGraph

Use `arms.langchain_callback` and pass it in the `callbacks` list of the graph or agent config.

Integration

`arms.langchain_callback` in `config={"callbacks": [...]}`

Span types

Generation, tool, and chain spans from LangChain runnables

LangGraph

Node and step metadata (`langgraph_node`, `langgraph_step`)

Framework tags

`langchain`, `langgraph`
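
The sketch below shows where the callback sits in a run config. To stay runnable without LangChain installed, `arms` is a stand-in object and `with_arms_callbacks` is a hypothetical helper; only the `langchain_callback` attribute and the `config={"callbacks": [...]}` shape come from this guide.

```python
from types import SimpleNamespace

# Stand-in for an initialized ARMS instance; in a real application this would
# be the object that exposes `langchain_callback`.
arms = SimpleNamespace(langchain_callback=object())


def with_arms_callbacks(arms, config=None):
    """Return a copy of `config` with the ARMS callback appended to `callbacks`."""
    merged = dict(config or {})
    merged["callbacks"] = [*merged.get("callbacks", []), arms.langchain_callback]
    return merged


run_config = with_arms_callbacks(arms, {"tags": ["demo"]})

# A LangChain runnable or compiled LangGraph graph would then be invoked as:
#     graph.invoke(inputs, config=run_config)
```

Passing the callback once at the top-level invocation is usually enough, since LangChain propagates config callbacks to child runnables.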

elsai Agents (on-prem)

Use `arms.elsai_agents_hook` on the top-level manager, `GraphBuilder`, or `Swarm`, not on every leaf agent.

Integration

`arms.elsai_agents_hook` via `AgentConfig`, `GraphBuilder.set_hook_providers`, or `Swarm(hooks=[...])`

Span types

Agent, generation, tool, orchestrator, node, and handoff spans

Orchestrators

elsai Graph and Swarm with node chains and Swarm handoff spans

Nested `as_tool`

Reject, queue, and spawn modes with child agent tracing

Framework tags (span metadata)

`elsai_agents` on most spans; `elsai_graph` or `elsai_swarm` on orchestrator root span only

Shared run metrics

Both paths write distributed trace trees and populate the same run-level analytics:

Key Features

Parent-child span hierarchy with token and cost per generation span
`trace_completeness_score` and `overall_tool_success_rate`
`nodes_executed` (from LangGraph node metadata or elsai Graph/Swarm node spans)
Token and cost rollup at `end_run()`
Dashboard drill-down via `agent_metrics` trace IDs
elsai Graph/Swarm: orchestrator root span tagged `elsai_graph` or `elsai_swarm`; trace document may still show `elsai_agents` (see elsai Agents monitoring)
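
To illustrate how a parent-child span hierarchy can be rolled up into run-level figures, here is a minimal sketch. The `Span` dataclass, the `rollup` function, and the exact formula for the tool success rate (successful tool spans divided by all tool spans) are assumptions, not the definitions ARMS uses internally.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical trace span with optional child spans."""
    name: str
    kind: str  # e.g. "agent", "generation", "tool"
    tokens: int = 0
    cost: float = 0.0
    ok: bool = True
    children: list = field(default_factory=list)


def walk(span):
    """Yield a span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def rollup(root):
    """Aggregate tokens, cost, and tool success over a whole trace tree."""
    spans = list(walk(root))
    generations = [s for s in spans if s.kind == "generation"]
    tools = [s for s in spans if s.kind == "tool"]
    return {
        "total_tokens": sum(s.tokens for s in generations),
        "total_cost": sum(s.cost for s in generations),
        # assumption: successful tool spans divided by all tool spans
        "overall_tool_success_rate": (
            sum(s.ok for s in tools) / len(tools) if tools else None
        ),
    }


root = Span("planner", "agent", children=[
    Span("draft", "generation", tokens=150, cost=0.25),
    Span("search", "tool"),
    Span("worker", "agent", children=[
        Span("answer", "generation", tokens=50, cost=0.5),
        Span("lookup", "tool", ok=False),
    ]),
])
summary = rollup(root)
print(summary)
```

Because tokens and cost live only on generation spans in this sketch, nested agents contribute to the totals without being counted twice.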

Exporter

Formats all accumulated project data into a structured, exportable JSON document.

Key Features

Modular formatting logic
Merges multiple sources (e.g., tokens + cost + latency + OCR + RAG + Embedding + Agent metrics)
Structured output with computed summaries
Comprehensive coverage of all monitoring domains
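
Merging several metric sources into one JSON document with computed summaries might look like the sketch below. The `export_run` function, the field names (`total_tokens`, `cost`, `latency_ms`, `confidence`), and the output layout are hypothetical; the actual export schema is not described in this guide.

```python
import json


def export_run(project, sources):
    """Merge per-domain metric dicts into one JSON document with a summary."""
    summary = {
        "total_tokens": sum(m.get("total_tokens", 0) for m in sources.values()),
        "total_cost": sum(m.get("cost", 0.0) for m in sources.values()),
        "domains": sorted(sources),
    }
    document = {"project": project, "metrics": sources, "summary": summary}
    # sort_keys gives a stable layout that is easier to diff between runs
    return json.dumps(document, indent=2, sort_keys=True)


exported = export_run(
    "demo",
    {
        "llm": {"total_tokens": 445, "cost": 0.5, "latency_ms": 820},
        "embedding": {"total_tokens": 1000, "cost": 0.25},
        "ocr": {"cost": 0.125, "confidence": 0.97},
    },
)
print(exported)
```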

Functional Flow

System Architecture Flow

Understanding how ARMS components interact and process data

Monitoring Integration

Unified Monitoring Framework

The ARMS framework provides unified monitoring across all AI/ML operations, ensuring comprehensive observability.

Unified Metrics Collection

All monitoring modules feed into a centralized metrics system

Cross-Component Analysis

Correlation between different operation types for holistic insights

Performance Optimization

Data-driven recommendations for improving system efficiency

Cost Management

Comprehensive cost tracking across all AI/ML operations

Quality Assurance

Quality metrics and performance benchmarks for all operations
