Conceptual Guide
The ARMS modules form the backbone of the system, handling all telemetry, logging, and data management operations. The key features include token usage and cost tracking per project, MongoDB/DynamoDB/ClickHouse backend for centralized data storage, JSON export capabilities for visualization, and custom metrics logging capabilities.
Modules and Features
elsaiARMS
Acts as the orchestrator and public interface for the entire ARMS framework. This class initializes all sub-modules and provides a high-level API to start monitoring, record token usage, and export metrics. The elsaiARMS class manages:
Project lifecycle (Project Manager)
Real-time monitoring (ARMS Wrapper)
Token and cost tracking
LLM metrics tracking
OCR metrics tracking
RAG metrics tracking
Embedding metrics tracking
Agent metrics tracking
Custom logs handling
Managing runs (Run Data)
Exporting to structured JSON
Key Features
- Automatically handles telemetry, token logging, and cost tracking
- Simplifies data export and session lifecycle management
- Comprehensive monitoring across all AI/ML operations
Project Manager
Maintains metadata and persistent state for each LLM project, such as project name, creation time, and session records. The Project Manager receives the log for each run from Run Data in elsaiARMS and stores the details in a database. It handles:
Project creation and loading
Data dictionary updates
Serialization-ready format for export
Key Features
- Create Project: Creates a new project
- Load Project: Loads an existing project
- Save Project: Saves project details in MongoDB
ARMS Wrapper
ARMS Wrapper is responsible for real-time observability of LLM-based operations. It acts as the system's telemetry layer, tracking and recording essential performance metrics that can later be analyzed to evaluate model responsiveness, usage patterns, and service reliability. The ARMS wrapper is connected to Token Metrics, Cost Metrics, LLM Monitor, OCR Monitor, RAG Monitor, Embedding Monitor, and Agent Monitor to record the metrics.
Key Features
- Histogram-based latency tracking
- Counter for metrics
- Integrates easily with external observability platforms
- Unified monitoring across all AI/ML operations
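As an illustration of histogram-based latency tracking and counters, the sketch below uses only the Python standard library. It is not the ARMS Wrapper implementation: `LatencyHistogram`, `monitored`, the bucket bounds, and the counter keys are all hypothetical names chosen for this example.

```python
import bisect
import time
from collections import Counter


class LatencyHistogram:
    """Fixed-bucket histogram; bounds are upper bucket edges in seconds (assumed values)."""

    def __init__(self, bounds=(0.1, 0.5, 1.0, 2.0, 5.0)):
        self.bounds = list(bounds)
        # One extra bucket catches observations above the last bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, seconds):
        # bisect_left returns the first bucket whose upper edge is >= seconds.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds


latency = LatencyHistogram()
calls = Counter()


def monitored(operation, func, *args, **kwargs):
    """Run func, record its latency, and count successes and errors."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        calls[(operation, "success")] += 1
        return result
    except Exception:
        calls[(operation, "error")] += 1
        raise
    finally:
        # Latency is recorded for failed calls as well as successful ones.
        latency.observe(time.perf_counter() - start)
```

Bucketed counts keep memory use constant regardless of request volume, which is why histograms are a common choice when latency data is later shipped to an external observability platform.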
Token Metrics
Tracks the input and output tokens used across all LLM interactions within a project session and provides an overview of token usage for each project. Token details come from the LLM response metadata captured by the LLM monitoring function.
Key Features
- Computes total token usage
- Useful for token management and budget planning
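A minimal sketch of how per-session token totals could be accumulated from response metadata. The `TokenTotals` class and the `input_tokens` / `output_tokens` key names are assumptions made for illustration; the actual ARMS schema may differ.

```python
from dataclasses import dataclass


@dataclass
class TokenTotals:
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self):
        return self.input_tokens + self.output_tokens

    def record(self, usage):
        # `usage` stands in for the usage metadata of one LLM response;
        # the key names are illustrative, not the ARMS schema.
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)


session = TokenTotals()
for usage in [
    {"input_tokens": 120, "output_tokens": 45},
    {"input_tokens": 80, "output_tokens": 200},
]:
    session.record(usage)

print(session.total_tokens)  # 445
```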
Cost Metrics
Estimates the monetary cost of LLM usage from the recorded token consumption and the configured price rates, and provides a cost overview for each project. Token usage is read from the LLM response metadata captured by the LLM monitoring function.
Key Features
- Tracks real-time cost as tokens are recorded
- Supports model-specific pricing schemes
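The calculation described here, token counts multiplied by model-specific rates, can be sketched as follows. The model names and prices are hypothetical placeholders, and the per-1,000-token pricing unit is an assumption, not a documented ARMS setting.

```python
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens; these are not real vendor rates.
PRICES = {
    "model-a": {"input": Decimal("0.0010"), "output": Decimal("0.0020")},
    "model-b": {"input": Decimal("0.0100"), "output": Decimal("0.0300")},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost of one request, using separate input and output rates for the model."""
    rates = PRICES[model]
    return (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    ) / Decimal(1000)


# Update the running total as each request's token usage is recorded.
running_total = Decimal("0")
for model, tokens_in, tokens_out in [("model-a", 1000, 500), ("model-b", 2000, 1000)]:
    running_total += request_cost(model, tokens_in, tokens_out)

print(running_total)
```

`Decimal` is used instead of `float` so that many small per-request amounts add up without binary rounding drift.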
LLM Monitoring
Captures performance and usage analytics of LLM interactions, providing comprehensive insights into language model operations and performance. LLM monitoring tracks the following key metrics:
Model Information
Model name, version, provider, and configuration details
Token Metrics
Input tokens, output tokens, total tokens, and token efficiency
Performance Metrics
Latency, throughput, response time, and processing speed
Quality Metrics
Response relevance, completion rates, and error handling
Cost Tracking
Per-request cost based on token consumption and model pricing
Operational Metrics
Request success rates, failure patterns, and system status
Content Analysis
Prompt complexity, response length, and content quality
Throughput Metrics
Tokens per second, output throughput, and total throughput
Governance Metrics
Content safety, prompt injection detection, and response quality assessment
Relevance Scoring
Content relevance and accuracy metrics
Key Features
- Real-time LLM performance monitoring and alerting
- Model-specific performance comparison and benchmarking
- Token usage optimization and cost analysis
- Quality assessment and response validation
- Performance degradation detection and alerting
- Comprehensive cost tracking across different models and providers
- Error analysis and failure pattern detection
- Request/response content analysis and quality scoring
- Throughput performance analysis and optimization
- Governance and compliance monitoring
- Content safety and quality assurance
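The throughput metrics listed above (tokens per second, output throughput, total throughput) can be derived from a request's token counts and latency. The sketch below assumes whole-request latency as the denominator; the exact definitions ARMS uses are not stated here, and `throughput_metrics` is a hypothetical helper.

```python
def throughput_metrics(input_tokens, output_tokens, latency_seconds):
    """Derive tokens-per-second figures from one request's measurements."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return {
        # Output throughput: generated tokens divided by elapsed time.
        "output_tokens_per_second": output_tokens / latency_seconds,
        # Total throughput: prompt plus generated tokens divided by elapsed time.
        "total_tokens_per_second": (input_tokens + output_tokens) / latency_seconds,
    }


metrics = throughput_metrics(input_tokens=300, output_tokens=150, latency_seconds=2.5)
print(metrics)
```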
OCR Monitoring
Captures performance and quality metrics of Optical Character Recognition operations, providing comprehensive insights into text extraction processes. OCR monitoring tracks the following key metrics:
Model Information
OCR engine used (EasyOCR, Textract, Azure Document Intelligence, Azure Computer Vision, Google Vision AI, PaddleOCR, Tesseract, Vision LLM)
Text Processing Metrics
Extracted text length, confidence scores, processing duration
Performance Metrics
Latency, throughput, error rates
Cost Tracking
Per-operation cost based on OCR service pricing
Quality Assessment
Confidence scores and accuracy metrics
Key Features
- Real-time OCR performance monitoring
- Model-specific performance comparison
- Quality metrics for text extraction accuracy
- Cost optimization insights across different OCR providers
- Error tracking and failure analysis
RAG Monitoring
Captures comprehensive metrics for Retrieval-Augmented Generation operations, monitoring the entire pipeline from document retrieval to response generation. RAG monitoring tracks:
Retrieval Metrics
Query processing, document count, relevance scoring
Performance Metrics
Latency, throughput, success rates
Quality Metrics
Relevance scores, result accuracy, user satisfaction
Operational Metrics
Function calls, error handling, system status
Cost Analysis
Per-query cost and resource utilization
Key Features
- End-to-end RAG pipeline monitoring
- Relevance scoring and quality assessment
- Performance optimization insights
- Cost tracking across different RAG operations
- Error analysis and failure pattern detection
Embedding Monitoring
Tracks performance and quality metrics for vector embedding generation operations, essential for semantic search and AI applications. Embedding monitoring captures:
Input Metrics
Text length, complexity, preprocessing time
Output Metrics
Vector dimensions, quality scores, generation time
Performance Metrics
Latency, throughput, resource utilization
Quality Assessment
Embedding quality scores and consistency
Cost Tracking
Per-embedding operation costs
Key Features
- Real-time embedding performance monitoring
- Quality assessment and consistency tracking
- Performance optimization recommendations
- Cost analysis across different embedding models
- Resource utilization monitoring
Agent Monitoring
Distributed tracing for AI agent workflows. On-prem deployments support two integration paths — pick one per application; do not mix them in the same run.
LangChain / LangGraph
Use `arms.langchain_callback` and pass it in the graph or agent config callbacks.
Integration
`arms.langchain_callback` in `config={"callbacks": [...]}`
Span types
Generation, tool, and chain spans from LangChain runnables
LangGraph
Node and step metadata (`langgraph_node`, `langgraph_step`)
Framework tags
`langchain`, `langgraph`
elsai Agents (on-prem)
Use `arms.elsai_agents_hook` on the top-level manager, GraphBuilder, or Swarm, not on every leaf agent.
Integration
`arms.elsai_agents_hook` via `AgentConfig`, `GraphBuilder.set_hook_providers`, or `Swarm(hooks=[...])`
Span types
Agent, generation, tool, orchestrator, node, and handoff spans
Orchestrators
elsai Graph and Swarm with node chains and Swarm handoff spans
Nested as_tool
Reject, queue, and spawn modes with child agent tracing
Framework tags (span metadata)
`elsai_agents` on most spans; `elsai_graph` or `elsai_swarm` on orchestrator root span only
Shared run metrics
Both paths write distributed trace trees and populate the same run-level analytics:
Key Features
- Parent-child span hierarchy with token and cost per generation span
- `trace_completeness_score` and `overall_tool_success_rate`
- `nodes_executed` (from LangGraph node metadata or elsai Graph/Swarm node spans)
- Token and cost rollup at `end_run()`
- Dashboard drill-down via `agent_metrics` trace IDs
- elsai Graph/Swarm: orchestrator root span tagged `elsai_graph` or `elsai_swarm`; trace document may still show `elsai_agents` (see elsai Agents monitoring)
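To make the parent-child span hierarchy and the run-level rollup concrete, the sketch below walks a small span tree and sums tokens and cost over generation spans. The `Span` class and `rollup` function are hypothetical, and the tool success rate shown is an assumed definition (successful tool spans divided by all tool spans), not necessarily how ARMS computes `overall_tool_success_rate`.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    kind: str  # for example "agent", "generation", or "tool"
    tokens: int = 0
    cost: float = 0.0
    ok: bool = True
    children: list = field(default_factory=list)


def walk(span):
    """Yield a span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def rollup(root):
    spans = list(walk(root))
    generations = [s for s in spans if s.kind == "generation"]
    tools = [s for s in spans if s.kind == "tool"]
    return {
        "total_tokens": sum(s.tokens for s in generations),
        "total_cost": sum(s.cost for s in generations),
        # Assumed definition: share of tool spans that finished without error.
        "tool_success_rate": sum(s.ok for s in tools) / len(tools) if tools else None,
    }


root = Span("manager", "agent", children=[
    Span("plan", "generation", tokens=100, cost=0.002),
    Span("search", "tool"),
    Span("worker", "agent", children=[
        Span("answer", "generation", tokens=50, cost=0.001),
        Span("lookup", "tool", ok=False),
    ]),
])
summary = rollup(root)
print(summary["total_tokens"], summary["tool_success_rate"])  # 150 0.5
```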
Exporter
Formats all accumulated project data into a structured, exportable JSON document.
Key Features
- Modular formatting logic
- Merges multiple sources (e.g., tokens + cost + latency + OCR + RAG + Embedding + Agent metrics)
- Structured output with computed summaries
- Comprehensive coverage of all monitoring domains
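A rough sketch of merging several metric sources into one JSON document with a computed summary. The field names (`tokens`, `cost`, `latency_seconds`, `summary`) and the `export_project` function are assumptions for illustration and do not reflect the actual ARMS export schema.

```python
import json


def export_project(project_name, runs):
    """Merge per-run metric dicts into one JSON document with a summary."""
    summary = {
        "run_count": len(runs),
        "total_tokens": sum(r.get("tokens", {}).get("total", 0) for r in runs),
        "total_cost": sum(r.get("cost", 0.0) for r in runs),
    }
    document = {"project": project_name, "summary": summary, "runs": runs}
    return json.dumps(document, indent=2)


runs = [
    {"run_id": "run-1", "tokens": {"total": 400}, "cost": 0.5, "latency_seconds": 1.2},
    {"run_id": "run-2", "tokens": {"total": 600}, "cost": 0.25, "latency_seconds": 0.8},
]
exported = export_project("demo-project", runs)
print(exported)
```

Keeping the raw runs alongside the summary lets a visualization tool show both the aggregate and the per-run detail from a single file.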
Functional Flow
System Architecture Flow
Understanding how ARMS components interact and process data
Monitoring Integration
Unified Monitoring Framework
The ARMS framework provides unified monitoring across all AI/ML operations, so that every operation type is observable through the same metrics system.
Unified Metrics Collection
All monitoring modules feed into a centralized metrics system
Cross-Component Analysis
Correlation between different operation types for holistic insights
Performance Optimization
Data-driven recommendations for improving system efficiency
Cost Management
Comprehensive cost tracking across all AI/ML operations
Quality Assurance
Quality metrics and performance benchmarks for all operations