Introduction
elsai ARMS (Agent Resource Management System) is a backend framework designed to monitor and manage the usage, cost, and performance of AI-powered projects using large language models. It provides centralized tracking for token consumption, execution metrics, and cost analysis, making it ideal for teams running multiple AI agents or pipelines.
Overview
elsai ARMS is a lightweight monitoring and cost-tracking system for LLM-based agents and applications. Built for observability, it tracks and reports usage per project. It integrates with OpenTelemetry through the TelemetryWrapper to log token usage (Token Metrics), latency (LLM Monitor), and estimated costs (Cost Metrics). A Project Manager handles projects and persists their data to MongoDB, DynamoDB, or ClickHouse, and an Exporter module serializes the logged data to JSON for reporting.
Key Features
Token usage tracking
Track input and output tokens across every LLM interaction in a project.
Cost tracking
Monitor monetary cost per model, including separate web search costs.
LLM monitoring
Latency, throughput, quality, and governance metrics for language models.
RAG metrics
End-to-end retrieval-augmented generation pipeline observability.
Embedding tracking
Vector embedding generation performance and quality metrics.
OCR processing
Text extraction performance across OCR engines and providers.
Agent monitoring
Distributed tracing for multi-component AI agent workflows.
Web search metrics
Sources, citations, search queries, and search-specific costs.
Execution logs
Structured run logs with success/error analysis.
Performance metrics
System and process-level resource utilization tracking.
JSON export
Exportable structured reports for analytics and observability tools.
Custom metrics
Log user-defined values like accuracy, relevance, and classification labels.
Metrics Reference
Project Metrics
| Field | Description |
|---|---|
| Time of Creation | Timestamp when the project or a processing run was initiated. |
| Total Tokens | Cumulative sum of tokens processed (input + output) across all runs. |
| Total Cost per Model | Aggregated cost grouped by model (e.g., GPT-4, Claude). |
| Average Frequency | Average number of runs per hour, day, or week. |
| Success Rate | Percentage of successful runs out of all runs. |
| Number of Successful Runs | Total number of runs that ended successfully (no crash). |
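
The project-level fields above are aggregates over individual runs. As a minimal sketch of how they could be derived, assuming a hypothetical `Run` record (the class, its field names, and the sample costs are illustrative and not part of the ARMS API):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical run record; field names are illustrative, not the ARMS schema."""
    model: str
    input_tokens: int
    output_tokens: int
    cost: float
    succeeded: bool


def summarize_project(runs: list[Run]) -> dict:
    """Aggregate run records into project-level figures."""
    cost_per_model: dict[str, float] = defaultdict(float)
    for run in runs:
        cost_per_model[run.model] += run.cost
    successful = sum(1 for run in runs if run.succeeded)
    return {
        "total_tokens": sum(r.input_tokens + r.output_tokens for r in runs),
        "total_cost_per_model": dict(cost_per_model),
        "successful_runs": successful,
        # Guard against division by zero for a project with no runs yet.
        "success_rate": 100.0 * successful / len(runs) if runs else 0.0,
    }


# Made-up sample data for illustration only.
runs = [
    Run("gpt-4", 1200, 300, 0.054, True),
    Run("gpt-4", 800, 200, 0.036, True),
    Run("claude", 500, 100, 0.010, False),
    Run("claude", 400, 100, 0.008, True),
]
summary = summarize_project(runs)
print(summary)
```

Grouping cost by model before summing keeps the per-model breakdown available for reporting, and the success rate is expressed as a percentage to match the table.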
System Metrics
| Field | Description |
|---|---|
| Metrics Source | Source of system metrics (e.g., "otel + psutil"). |
| Process Status | Current status of the running process. |
| CPU Threads | Number of CPU threads currently in use. |
| Memory Usage | Current memory consumption in bytes. |
| CPU Utilization | Current CPU usage percentage. |
| Disk Space | Total, used, and free disk space in bytes. |
| Network Stats | Network packets, errors, connections, and I/O statistics. |
| OpenTelemetry Metrics | Detailed OTEL metrics including LLM latency, system resources, and process information. |
| Collection Duration | Time taken to collect all system metrics. |
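
To make a few of these fields concrete, the sketch below gathers a small snapshot and times the collection itself. It is an assumption-laden illustration: the table names "otel + psutil" as the source, whereas this example substitutes the Python standard library so it runs without extra dependencies, and `collect_system_snapshot` is a hypothetical helper, not an ARMS function.

```python
import os
import shutil
import threading
import time


def collect_system_snapshot(path: str = ".") -> dict:
    """Collect a small subset of system metrics using only the standard library."""
    started = time.perf_counter()
    disk = shutil.disk_usage(path)  # total, used, and free space in bytes
    snapshot = {
        "metrics_source": "stdlib",  # stand-in for "otel + psutil"
        "process_id": os.getpid(),
        # Live Python threads in this process, a rough proxy for thread usage.
        "python_threads": threading.active_count(),
        "disk_space": {"total": disk.total, "used": disk.used, "free": disk.free},
    }
    # Collection Duration: time spent gathering the values above, in seconds.
    snapshot["collection_duration_s"] = time.perf_counter() - started
    return snapshot


snapshot = collect_system_snapshot()
print(snapshot)
```

Memory, CPU utilization, and network statistics are omitted here because the standard library has no portable call for them; a library such as psutil would normally supply those values.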
Run Metrics
| Field | Description |
|---|---|
| LLM Details | Summary of all LLM calls made in this run (may include model, tokens, and cost). |
| Custom Metrics | User-defined values such as accuracy, classification labels, and relevance score. |
| Logs | Event logs or trace logs emitted during the run (info, warning, error). May include stages, retries, failures, etc. |
| Start Time | The exact timestamp when the run or process began execution. |
| End Time | The exact timestamp when the run or process completed execution. |
| Execution Duration | The total time taken from the start to the end of the run, typically measured in seconds. |
| Success Status | SUCCESS/FAILED indicating whether the run met its goal. |
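
One way to picture how a run record acquires its start time, end time, duration, and status is a context manager wrapped around the run. This is a sketch under stated assumptions: `track_run` and the dictionary keys are hypothetical names chosen to mirror the table, not the ARMS interface.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def track_run(record: dict):
    """Fill `record` with timing, status, and error logs for the wrapped block."""
    record["start_time"] = datetime.now(timezone.utc).isoformat()
    record["logs"] = []
    started = time.perf_counter()
    try:
        yield record
        record["success_status"] = "SUCCESS"
    except Exception as exc:
        record["success_status"] = "FAILED"
        record["logs"].append({"level": "error", "message": str(exc)})
        raise  # let the caller decide how to handle the failure
    finally:
        record["end_time"] = datetime.now(timezone.utc).isoformat()
        record["execution_duration_s"] = time.perf_counter() - started


ok_run: dict = {}
with track_run(ok_run) as run:
    run["custom_metrics"] = {"accuracy": 0.9}  # user-defined value

failed_run: dict = {}
try:
    with track_run(failed_run):
        raise ValueError("upstream timeout")
except ValueError:
    pass

print(ok_run["success_status"], failed_run["success_status"])
```

The duration uses a monotonic clock rather than the difference of the two wall-clock timestamps, so it stays correct if the system clock is adjusted mid-run.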
LLM Metrics
| Field | Description |
|---|---|
| Model | Name of the model used (e.g., gpt-4o-mini-2024-07-18). |
| LLM Provider | Provider of the LLM service (e.g., OpenAI, Anthropic, Google). |
| Input Tokens | Number of tokens in the input prompt. |
| Output Tokens | Number of tokens in the generated response. |
| Total Tokens | Sum of input and output tokens. |
| Latency (ms) | Response time in milliseconds. |
| Prompt | The input text sent to the LLM. |
| Response | The generated response from the LLM. |
| Relevance Score | Score indicating how relevant the response is to the query. |
| Cost | Total monetary cost of the LLM call (includes token cost + web search cost if applicable). |
| Token Cost | Monetary cost of tokens only (separate from web search costs). |
| Tokens per Second | Processing speed in tokens per second. |
| Output Throughput | Output generation speed in tokens per second. |
| Total Throughput | Overall processing speed including input and output. |
| Governance Metrics | Content safety, prompt injection detection, and response quality assessment. |
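
The derived fields in this table follow from the raw token counts, latency, and pricing. The sketch below shows the arithmetic under explicit assumptions: prices are quoted per 1,000 tokens, throughput is computed against the call's total latency, and all names and prices are hypothetical rather than taken from ARMS.

```python
from dataclasses import dataclass


@dataclass
class LLMCall:
    """Hypothetical raw measurements for a single LLM call."""
    input_tokens: int
    output_tokens: int
    latency_ms: float
    web_search_cost: float = 0.0


def derive_metrics(call: LLMCall, input_price_per_1k: float,
                   output_price_per_1k: float) -> dict:
    """Compute totals, cost, and throughput from raw call measurements."""
    total_tokens = call.input_tokens + call.output_tokens
    token_cost = (call.input_tokens / 1000) * input_price_per_1k \
        + (call.output_tokens / 1000) * output_price_per_1k
    seconds = call.latency_ms / 1000
    return {
        "total_tokens": total_tokens,
        "token_cost": token_cost,
        # Cost = token cost plus web search cost, when a search was used.
        "cost": token_cost + call.web_search_cost,
        "output_throughput": call.output_tokens / seconds if seconds > 0 else 0.0,
        "total_throughput": total_tokens / seconds if seconds > 0 else 0.0,
    }


# Illustrative numbers only; real prices vary by provider and model.
call = LLMCall(input_tokens=1000, output_tokens=500, latency_ms=2000,
               web_search_cost=0.01)
metrics = derive_metrics(call, input_price_per_1k=0.002, output_price_per_1k=0.004)
print(metrics)
```

Keeping `token_cost` separate from `cost` mirrors the table's distinction between the two fields, so web search charges can be reported on their own.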
Web Search Metrics
| Field | Description |
|---|---|
| Provider | Type of web search implementation: 'native' (OpenAI/Perplexity), 'grounding' (Gemini), or 'tool_based' (Claude). |
| Enabled | Boolean indicating whether web search was used in this LLM call. |
| Search Count | Number of web searches performed. |
| Sources Found | Total number of sources found during web search. |
| Citations Used | Number of citations actually used in the response text. |
| Search Cost | Monetary cost of web search operations, separate from token costs. |
| Citations | List of citations with URLs and titles (limited to 20 citations). |
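
As a rough illustration of how these fields relate, the sketch below builds a web search summary from a list of sources. It rests on several assumptions not confirmed by the table: a flat price per search, a caller-supplied set of URLs that the response actually cited, and a citations list that holds only the cited sources. The function and parameter names are hypothetical.

```python
MAX_CITATIONS = 20  # cap stated in the Citations row above


def summarize_web_search(provider: str, search_count: int, sources: list[dict],
                         cited_urls: set[str], cost_per_search: float) -> dict:
    """Build a web search summary for one LLM call."""
    used = [s for s in sources if s["url"] in cited_urls]
    return {
        "provider": provider,
        "enabled": search_count > 0,
        "search_count": search_count,
        "sources_found": len(sources),
        "citations_used": len(used),
        # Assumed flat per-search pricing, kept apart from token costs.
        "search_cost": search_count * cost_per_search,
        "citations": used[:MAX_CITATIONS],
    }


# Made-up sources for illustration.
sources = [{"url": f"https://example.com/doc/{i}", "title": f"Doc {i}"}
           for i in range(25)]
cited = {s["url"] for s in sources}
web_metrics = summarize_web_search("native", 2, sources, cited, cost_per_search=0.01)
print(web_metrics["citations_used"], len(web_metrics["citations"]))
```

Note that the count of citations used is not capped, only the stored list, so the two values can differ when a response cites more than 20 sources.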
RAG Metrics
| Field | Description |
|---|---|
| Function Name | Name of the RAG function called (e.g., retrieve_documents). |
| Query | The search query used for document retrieval. |
| Query Length | Number of characters in the query. |
| Docs Count | Total number of documents available for retrieval. |
| Result Count | Number of documents returned in results. |
| Timestamp | When the RAG operation was performed. |
| Relevance Score | Quality metric for retrieval relevance (0.0 to 1.0). |
| Status | Success/failure status of the operation. |
| Latency | Time taken for the RAG operation (in ms). |
| Cost | Cost associated with the RAG operation. |
| Error | Error details if the operation failed. |
| Operation Type | Type of RAG operation performed. |
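
Most of these fields can be captured by wrapping the retrieval function. The following is a minimal sketch of that idea, assuming a hypothetical `track_rag` decorator and an in-memory `rag_events` list; neither is an ARMS API, and the retriever is a toy keyword matcher used only to exercise the wrapper.

```python
import functools
import time
from datetime import datetime, timezone

rag_events: list[dict] = []  # hypothetical in-memory sink for recorded events


def track_rag(func):
    """Record RAG-style metrics for a function taking (query, docs, ...)."""
    @functools.wraps(func)
    def wrapper(query: str, docs: list[str], *args, **kwargs):
        event = {
            "function_name": func.__name__,
            "query": query,
            "query_length": len(query),  # characters, as in the table
            "docs_count": len(docs),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "operation_type": "retrieval",
            "result_count": 0,
            "error": None,
        }
        started = time.perf_counter()
        try:
            results = func(query, docs, *args, **kwargs)
            event["result_count"] = len(results)
            event["status"] = "success"
            return results
        except Exception as exc:
            event["status"] = "failure"
            event["error"] = repr(exc)
            raise
        finally:
            event["latency_ms"] = (time.perf_counter() - started) * 1000
            rag_events.append(event)
    return wrapper


@track_rag
def retrieve_documents(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Toy keyword retriever used only to demonstrate the decorator."""
    terms = query.lower().split()
    matches = [d for d in docs if any(t in d.lower() for t in terms)]
    return matches[:top_k]


docs = ["Token usage tracking", "Cost tracking per model", "OCR processing"]
hits = retrieve_documents("cost tracking", docs)
print(hits, rag_events[-1]["latency_ms"])
```

Recording the event in a `finally` block means a failed retrieval still produces a record with its latency and error details.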
Embedding Metrics
| Field | Description |
|---|---|
| Function Name | Name of the embedding function called (e.g., get_embedding). |
| Input Length | Number of characters in the input text. |
| Dimensions | Number of dimensions in the generated embedding vector. |
| Timestamp | When the embedding operation was performed. |
| Quality Score | Quality metric for the generated embedding (0.0 to 1.0). |
| Status | Success/failure status of the operation. |
| Latency | Time taken for the embedding operation (in ms). |
| Cost | Cost associated with the embedding operation. |
| Error | Error details if the operation failed. |
| Operation Type | Type of embedding operation performed. |
OCR Metrics
| Field | Description |
|---|---|
| Document Type | Type of document processed (e.g., PDF, image, scanned document). |
| OCR Engine | OCR service or library used for text extraction. |
| Extracted Text | The text content extracted from the document. |
| Confidence Score | OCR confidence level for the extracted text. |
| Processing Time | Time taken to process the document. |
| Page Count | Number of pages in the document. |
| Text Quality | Quality assessment of the extracted text. |
Agent Metrics
| Field | Description |
|---|---|
| Trace Metrics | See Trace Metrics section below for detailed trace-level metrics. |
| Span Metrics | See Span Metrics section below for detailed span-level metrics. |
Trace Metrics
| Field | Description |
|---|---|
| Trace ID | Unique identifier for the entire agent execution trace. |
| Created At | Timestamp when the trace was initiated. |
| Framework | The agent framework for the trace. Values include langchain, langgraph, elsai_agents, elsai_graph, and elsai_swarm. For elsai Agents hook runs, this field reflects the first span that opened the trace, so Graph and Swarm runs often show elsai_agents here even though the orchestrator root span carries elsai_graph or elsai_swarm. |
| Input Summary | Summary of the initial input provided to the agent (typically JSON-serialized messages or prompts). |
| Output Summary | Summary of the final output produced by the agent (typically JSON-serialized messages or responses). |
| Total Duration (seconds) | Total time taken for the entire agent execution from start to completion. |
| Total LLM Calls | Cumulative count of all LLM invocations across the entire trace, including calls made within tool functions. |
| Total Tokens | Sum of all tokens (input + output) consumed across all LLM calls in the trace, including tokens from LLM calls inside tools. |
| Total Tool Calls | Number of tool/function calls executed during the trace. |
| Total Spans | Total number of spans (operations) in the distributed trace. |
| Models Used | List of all LLM models used during the trace (e.g., ["gpt-3.5-turbo-0125", "gpt-4"]). |
| Nodes Executed | List of agent graph nodes that were executed (e.g., ["assistant", "tools", "router"]). |
| Root Span | Information about the root span of the trace, including span_id, name, and observation_type. |
| Parent Spans | Hierarchical mapping of parent-child relationships between spans, showing the execution flow structure. |
| Project ID | Identifier of the project this trace belongs to. |
Span Metrics
| Field | Description |
|---|---|
| Span ID | Unique identifier for this specific span/operation. |
| Trace ID | The trace this span belongs to. |
| Parent Span ID | ID of the parent span (null for root spans). |
| Name | Name of the operation (e.g., "ChatOpenAI", "add", "assistant", "tools", "tools_condition"). |
| Observation Type | Type of operation: "generation" (LLM call), "chain" (composite operation), "tool" (tool execution). |
| Start Time | Nanosecond-precision timestamp when the span started. |
| End Time | Nanosecond-precision timestamp when the span completed. |
| Duration (seconds) | Time taken for this specific span to execute. |
| Status Code | Status of the operation (e.g., "UNSET", "OK", "ERROR"). |
| Created At | Timestamp when the span was created. |
| Input | The input data for this span (format depends on observation_type). |
| Output | The output data from this span (format depends on observation_type). |
| Input Size (bytes) | Size of the input data in bytes. |
| Output Size (bytes) | Size of the output data in bytes. |
| Model | LLM model name (only for generation-type spans). |
| Input Tokens | Number of input tokens (only for generation-type spans). |
| Output Tokens | Number of output tokens (only for generation-type spans). |
| Total Tokens | Sum of input and output tokens (only for generation-type spans). |
| Metadata | Additional metadata including framework-specific information, tags, and configuration. |
| Expires At | Timestamp when this span data expires (for data retention policies). |
| Project ID | Identifier of the project this span belongs to. |
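
The trace-level totals in the Trace Metrics table can be understood as a roll-up of the span records described here. The sketch below shows one plausible aggregation, assuming spans arrive as dictionaries with the hypothetical keys used in the sample data; it is not the ARMS implementation.

```python
from collections import defaultdict


def summarize_trace(spans: list[dict]) -> dict:
    """Roll span records up into trace-level totals."""
    if not spans:
        raise ValueError("a trace needs at least one span")
    generations = [s for s in spans if s["observation_type"] == "generation"]
    children: dict[str, list[str]] = defaultdict(list)
    root = None
    for span in spans:
        if span["parent_span_id"] is None:
            root = span  # root spans have a null parent
        else:
            children[span["parent_span_id"]].append(span["span_id"])
    models_used: list[str] = []
    for span in generations:
        if span["model"] not in models_used:
            models_used.append(span["model"])
    start_ns = min(s["start_time_ns"] for s in spans)
    end_ns = max(s["end_time_ns"] for s in spans)
    return {
        "total_spans": len(spans),
        # Generation spans nested inside tools are counted as well.
        "total_llm_calls": len(generations),
        "total_tool_calls": sum(1 for s in spans if s["observation_type"] == "tool"),
        "total_tokens": sum(s["input_tokens"] + s["output_tokens"] for s in generations),
        "models_used": models_used,
        "total_duration_s": (end_ns - start_ns) / 1e9,
        "root_span": None if root is None else {
            "span_id": root["span_id"],
            "name": root["name"],
            "observation_type": root["observation_type"],
        },
        "parent_spans": dict(children),
    }


# Made-up spans: a chain root, an LLM call, a tool, and an LLM call inside the tool.
spans = [
    {"span_id": "s1", "parent_span_id": None, "name": "assistant",
     "observation_type": "chain", "start_time_ns": 0, "end_time_ns": 3_000_000_000},
    {"span_id": "s2", "parent_span_id": "s1", "name": "ChatOpenAI",
     "observation_type": "generation", "model": "gpt-4", "input_tokens": 100,
     "output_tokens": 20, "start_time_ns": 100_000_000, "end_time_ns": 1_100_000_000},
    {"span_id": "s3", "parent_span_id": "s1", "name": "add",
     "observation_type": "tool", "start_time_ns": 1_200_000_000,
     "end_time_ns": 2_500_000_000},
    {"span_id": "s4", "parent_span_id": "s3", "name": "ChatOpenAI",
     "observation_type": "generation", "model": "gpt-3.5-turbo-0125",
     "input_tokens": 50, "output_tokens": 10, "start_time_ns": 1_300_000_000,
     "end_time_ns": 2_300_000_000},
]
trace = summarize_trace(spans)
print(trace)
```

In this sample the LLM call nested under the `add` tool contributes to both the call count and the token total, matching the table's note that calls made inside tools are included.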