Introduction

elsai ARMS (Agent Resource Management System) is a backend framework designed to monitor and manage the usage, cost, and performance of AI-powered projects using large language models. It provides centralized tracking for token consumption, execution metrics, and cost analysis, making it ideal for teams running multiple AI agents or pipelines.

Overview

elsai ARMS is a lightweight monitoring and cost-tracking system for LLM-based agents and applications, with tracking and reporting organized by project. It integrates with OpenTelemetry through the TelemetryWrapper to log token usage (Token Metrics), latency (LLM Monitor), and estimated costs (Cost Metrics). A Project Manager stores project data in MongoDB, DynamoDB, or ClickHouse, and an Exporter module serializes the logged data to JSON for reporting.

Key Features

Token usage tracking

Track input and output tokens across every LLM interaction in a project.

Cost tracking

Monitor monetary cost per model, including separate web search costs.

LLM Monitoring

Latency, throughput, quality, and governance metrics for language models.

RAG metrics

End-to-end retrieval-augmented generation pipeline observability.

Embedding tracking

Vector embedding generation performance and quality metrics.

OCR processing

Text extraction performance across OCR engines and providers.

Agent monitoring

Distributed tracing for multi-component AI agent workflows.

Web search metrics

Sources, citations, search queries, and search-specific costs.

Execution logs

Structured run logs with success/error analysis.

Performance metrics

System and process-level resource utilization tracking.

JSON export

Exportable structured reports for analytics and observability tools.

Custom metrics

Log user-defined values like accuracy, relevance, and classification labels.

Metrics Reference

Project Metrics

Field	Description
Time of Creation	Timestamp when the project or a processing run was initiated.
Total Tokens	Cumulative sum of tokens processed (input + output) across all runs.
Total Cost per Model	Aggregated cost grouped by model (e.g., GPT-4, Claude, etc.).
Average Frequency	Average number of runs per hour, day, or week.
Success Rate	Percentage of successful runs out of all runs.
Number of Successful Runs	Total number of runs that ended successfully (no crash).
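
The project-level fields above are aggregates over individual runs. The following is a minimal sketch of that roll-up, not the ARMS implementation: the `RunRecord` type and the `summarize_project` function are hypothetical names introduced only for illustration, and the success rate is assumed to be successful runs divided by all runs, expressed as a percentage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-run summary; not the actual ARMS schema."""
    model: str
    input_tokens: int
    output_tokens: int
    cost: float
    succeeded: bool


def summarize_project(runs: list[RunRecord]) -> dict:
    """Aggregate run records into project-level metrics."""
    cost_per_model: dict[str, float] = defaultdict(float)
    total_tokens = 0
    successes = 0
    for run in runs:
        # Total Tokens counts input and output tokens across all runs.
        total_tokens += run.input_tokens + run.output_tokens
        # Total Cost per Model groups cost by model name.
        cost_per_model[run.model] += run.cost
        successes += int(run.succeeded)
    # Guard against an empty project to avoid division by zero.
    success_rate = 100.0 * successes / len(runs) if runs else 0.0
    return {
        "total_tokens": total_tokens,
        "total_cost_per_model": dict(cost_per_model),
        "successful_runs": successes,
        "success_rate": success_rate,
    }


runs = [
    RunRecord("gpt-4", 100, 50, 0.5, True),
    RunRecord("gpt-4", 200, 100, 0.25, False),
    RunRecord("claude", 300, 50, 0.125, True),
    RunRecord("claude", 10, 10, 0.125, True),
]
summary = summarize_project(runs)
print(summary)
```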

System Metrics

Field	Description
Metrics Source	Source of system metrics (e.g., "otel + psutil").
Process Status	Current status of the running process.
CPU Threads	Number of CPU threads currently in use.
Memory Usage	Current memory consumption in bytes.
CPU Utilization	Current CPU usage percentage.
Disk Space	Total, used, and free disk space in bytes.
Network Stats	Network packets, errors, connections, and I/O statistics.
OpenTelemetry Metrics	Detailed OTEL metrics including LLM latency, system resources, and process information.
Collection Duration	Time taken to collect all system metrics.
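
To make a few of these fields concrete, here is a rough sketch that gathers disk space, a thread count, and the collection duration using only the Python standard library. It is a stand-in and not how ARMS collects metrics: the table names "otel + psutil" as an example source, whereas this sketch uses `shutil`, `os`, and `threading`, and the `collect_system_snapshot` function and its field names are assumptions made for illustration.

```python
import os
import shutil
import threading
import time


def collect_system_snapshot(path: str = "/") -> dict:
    """Collect a small system snapshot and time how long collection takes."""
    started = time.perf_counter()
    disk = shutil.disk_usage(path)  # total, used, and free space in bytes
    snapshot = {
        "metrics_source": "stdlib",  # assumption: not the otel + psutil source
        "process_id": os.getpid(),
        "cpu_count": os.cpu_count(),
        # Python threads alive in this process, used here as a simple proxy.
        "active_threads": threading.active_count(),
        "disk_space": {"total": disk.total, "used": disk.used, "free": disk.free},
    }
    # Collection Duration: time spent gathering the values above.
    snapshot["collection_duration_s"] = time.perf_counter() - started
    return snapshot


snapshot = collect_system_snapshot()
print(snapshot)
```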

Run Metrics

Field	Description
LLM Details	Summary of all LLM calls made in this run (can include model, tokens, cost).
Custom Metrics	User-defined values like accuracy, classification labels, relevance score, etc.
Logs	Event logs or trace logs emitted during the run (info, warning, error). May include stages, retries, failures, etc.
Start Time	The exact timestamp when the run or process began execution.
End Time	The exact timestamp when the run or process completed execution.
Execution Duration	The total time taken from the start to the end of the run, typically measured in seconds.
Success Status	SUCCESS/FAILED indicating whether the run met its goal.
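
The timing and status fields above can be captured by wrapping a run in a context manager. This is a sketch under stated assumptions, not the ARMS API: `track_run` and the dictionary keys are hypothetical, and a run is treated as FAILED whenever its body raises an exception.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def track_run(record: dict):
    """Fill `record` with start/end time, duration, logs, and status."""
    record["start_time"] = datetime.now(timezone.utc).isoformat()
    record["logs"] = []
    started = time.perf_counter()
    try:
        yield record
        record["success_status"] = "SUCCESS"
    except Exception as exc:
        record["success_status"] = "FAILED"
        record["logs"].append({"level": "error", "message": str(exc)})
        raise  # the caller still sees the original exception
    finally:
        # Runs on both paths, so every record gets an end time and duration.
        record["end_time"] = datetime.now(timezone.utc).isoformat()
        record["execution_duration_s"] = time.perf_counter() - started


ok, bad = {}, {}

with track_run(ok) as run:
    run["logs"].append({"level": "info", "message": "stage 1 done"})

try:
    with track_run(bad):
        raise ValueError("model timeout")
except ValueError:
    pass  # the failure is already recorded in `bad`

print(ok["success_status"], bad["success_status"])
```

Re-raising inside the `except` branch keeps the tracker transparent: the run record captures the failure, and the surrounding application still handles the error itself.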

LLM Metrics

Field	Description
Model	Name of the model used (e.g., gpt-4o-mini-2024-07-18).
LLM Provider	Provider of the LLM service (e.g., OpenAI, Anthropic, Google).
Input Tokens	Number of tokens in the input prompt.
Output Tokens	Number of tokens in the generated response.
Total Tokens	Sum of input and output tokens.
Latency (ms)	Response time in milliseconds.
Prompt	The input text sent to the LLM.
Response	The generated response from the LLM.
Relevance Score	Score indicating how relevant the response is to the query.
Cost	Total monetary cost of the LLM call (includes token cost + web search cost if applicable).
Token Cost	Monetary cost of tokens only (separate from web search costs).
Tokens per Second	Processing speed in tokens per second.
Output Throughput	Output generation speed in tokens per second.
Total Throughput	Overall processing speed including input and output.
Governance Metrics	Content safety, prompt injection detection, and response quality assessment.
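
Several of the LLM fields above are derived from the raw token counts, latency, and cost components. The sketch below shows one plausible way to compute them. The formulas are assumptions, since the table does not define them: throughput is taken as tokens divided by latency in seconds, and the total cost as token cost plus web search cost. The `derive_llm_metrics` function is a hypothetical name.

```python
def derive_llm_metrics(
    input_tokens: int,
    output_tokens: int,
    latency_ms: float,
    token_cost: float,
    search_cost: float = 0.0,
) -> dict:
    """Compute derived LLM metrics from raw per-call values."""
    total_tokens = input_tokens + output_tokens
    seconds = latency_ms / 1000.0
    return {
        "total_tokens": total_tokens,
        # Output Throughput: generated tokens per second (assumed formula).
        "output_throughput": output_tokens / seconds if seconds > 0 else 0.0,
        # Total Throughput: input plus output tokens per second (assumed formula).
        "total_throughput": total_tokens / seconds if seconds > 0 else 0.0,
        "token_cost": token_cost,
        # Cost: token cost plus web search cost, if any.
        "cost": token_cost + search_cost,
    }


metrics = derive_llm_metrics(
    input_tokens=300,
    output_tokens=200,
    latency_ms=2000,
    token_cost=0.25,
    search_cost=0.5,
)
print(metrics)
```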

Web Search Metrics

Field	Description
Provider	Type of web search implementation: 'native' (OpenAI/Perplexity), 'grounding' (Gemini), or 'tool_based' (Claude).
Enabled	Boolean indicating whether web search was used in this LLM call.
Search Count	Number of web searches performed.
Sources Found	Total number of sources found during web search.
Citations Used	Number of citations actually used in the response text.
Search Cost	Monetary cost of web search operations, separate from token costs.
Citations	List of citations with URLs and titles (limited to 20 citations).
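
As an illustration of how these fields relate, the sketch below builds a web search record from a list of sources and the URLs cited in a response. It is not the ARMS implementation: `build_web_search_metrics` is a hypothetical helper, and it assumes that Citations Used is counted before the stored citation list is truncated to 20 entries.

```python
MAX_CITATIONS = 20  # the stored citation list is limited to 20 entries
VALID_PROVIDERS = {"native", "grounding", "tool_based"}


def build_web_search_metrics(
    provider: str,
    search_count: int,
    sources: list[dict],
    cited_urls: set[str],
    search_cost: float,
) -> dict:
    """Assemble a web search metrics record for one LLM call."""
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown web search provider: {provider!r}")
    # Keep only the sources that the response text actually cites.
    used = [source for source in sources if source["url"] in cited_urls]
    return {
        "provider": provider,
        "enabled": search_count > 0,
        "search_count": search_count,
        "sources_found": len(sources),
        "citations_used": len(used),  # assumption: counted before truncation
        "search_cost": search_cost,   # kept separate from token cost
        "citations": used[:MAX_CITATIONS],
    }


sources = [
    {"url": f"https://example.com/{i}", "title": f"Source {i}"} for i in range(30)
]
cited = {f"https://example.com/{i}" for i in range(25)}
record = build_web_search_metrics("native", 2, sources, cited, search_cost=0.5)
print(record["sources_found"], record["citations_used"], len(record["citations"]))
```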

RAG Metrics

Field	Description
Function Name	Name of the RAG function called (e.g., retrieve_documents).
Query	The search query used for document retrieval.
Query Length	Number of characters in the query.
Docs Count	Total number of documents available for retrieval.
Result Count	Number of documents returned in results.
Timestamp	When the RAG operation was performed.
Relevance Score	Quality metric for retrieval relevance (0.0 to 1.0).
Status	Success/failure status of the operation.
Latency	Time taken for the RAG operation (in ms).
Cost	Cost associated with the RAG operation.
Error	Error details if the operation failed.
Operation Type	Type of RAG operation performed.
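
One way to collect fields like these is to wrap the retrieval function in a decorator. The sketch below is a hedged illustration only: `track_rag`, the in-memory `rag_log` list, and the toy substring retriever are all hypothetical, and ARMS may record these values differently.

```python
import time
from functools import wraps


def track_rag(records: list, docs_count: int):
    """Decorator factory that appends one metrics entry per retrieval call."""
    def decorator(func):
        @wraps(func)
        def wrapper(query: str, *args, **kwargs):
            entry = {
                "function_name": func.__name__,
                "query": query,
                "query_length": len(query),  # characters, not tokens
                "docs_count": docs_count,
                "result_count": 0,
                "status": "success",
                "error": None,
            }
            started = time.perf_counter()
            try:
                results = func(query, *args, **kwargs)
                entry["result_count"] = len(results)
                return results
            except Exception as exc:
                entry["status"] = "failure"
                entry["error"] = str(exc)
                raise
            finally:
                # Latency in milliseconds, recorded on success and failure.
                entry["latency_ms"] = (time.perf_counter() - started) * 1000.0
                records.append(entry)
        return wrapper
    return decorator


CORPUS = ["token pricing guide", "latency tuning notes", "token limits overview"]
rag_log: list = []


@track_rag(rag_log, docs_count=len(CORPUS))
def retrieve_documents(query: str) -> list:
    """Toy retriever: substring match over a tiny in-memory corpus."""
    return [doc for doc in CORPUS if query.lower() in doc]


hits = retrieve_documents("token")
print(hits, rag_log[0]["result_count"])
```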

Embedding Metrics

Field	Description
Function Name	Name of the embedding function called (e.g., get_embedding).
Input Length	Number of characters in the input text.
Dimensions	Number of dimensions in the generated embedding vector.
Timestamp	When the embedding operation was performed.
Quality Score	Quality metric for the generated embedding (0.0 to 1.0).
Status	Success/failure status of the operation.
Latency	Time taken for the embedding operation (in ms).
Cost	Cost associated with the embedding operation.
Error	Error details if the operation failed.
Operation Type	Type of embedding operation performed.

OCR Metrics

Field	Description
Document Type	Type of document processed (e.g., PDF, image, scanned document).
OCR Engine	OCR service or library used for text extraction.
Extracted Text	The text content extracted from the document.
Confidence Score	OCR confidence level for the extracted text.
Processing Time	Time taken to process the document.
Page Count	Number of pages in the document.
Text Quality	Quality assessment of the extracted text.

Agent Metrics

Field	Description
Trace Metrics	See the Trace Metrics section below for trace-level metrics.
Span Metrics	See the Span Metrics section below for span-level metrics.

Trace Metrics

Field	Description
Trace ID	Unique identifier for the entire agent execution trace.
Created At	Timestamp when the trace was initiated.
Framework	The agent framework for the trace. Values include `langchain`, `langgraph`, `elsai_agents`, `elsai_graph`, and `elsai_swarm`. For elsai Agents hook runs, this field reflects the first span that opened the trace, so Graph and Swarm runs often show `elsai_agents` here even though the orchestrator root span carries `elsai_graph` or `elsai_swarm`.
Input Summary	Summary of the initial input provided to the agent (typically JSON-serialized messages or prompts).
Output Summary	Summary of the final output produced by the agent (typically JSON-serialized messages or responses).
Total Duration (seconds)	Total time taken for the entire agent execution from start to completion.
Total LLM Calls	Cumulative count of all LLM invocations across the entire trace, including calls made within tool functions.
Total Tokens	Sum of all tokens (input + output) consumed across all LLM calls in the trace, including tokens from LLM calls inside tools.
Total Tool Calls	Number of tool/function calls executed during the trace.
Total Spans	Total number of spans (operations) in the distributed trace.
Models Used	List of all LLM models used during the trace (e.g., ["gpt-3.5-turbo-0125", "gpt-4"]).
Nodes Executed	List of agent graph nodes that were executed (e.g., ["assistant", "tools", "router"]).
Root Span	Information about the root span of the trace, including span_id, name, and observation_type.
Parent Spans	Hierarchical mapping of parent-child relationships between spans, showing the execution flow structure.
Project ID	Identifier of the project this trace belongs to.

Span Metrics

Field	Description
Span ID	Unique identifier for this specific span/operation.
Trace ID	The trace this span belongs to.
Parent Span ID	ID of the parent span (null for root spans).
Name	Name of the operation (e.g., "ChatOpenAI", "add", "assistant", "tools", "tools_condition").
Observation Type	Type of operation: "generation" (LLM call), "chain" (composite operation), "tool" (tool execution).
Start Time	Nanosecond-precision timestamp when the span started.
End Time	Nanosecond-precision timestamp when the span completed.
Duration (seconds)	Time taken for this specific span to execute.
Status Code	Status of the operation (e.g., "UNSET", "OK", "ERROR").
Created At	Timestamp when the span was created.
Input	The input data for this span (format depends on observation_type).
Output	The output data from this span (format depends on observation_type).
Input Size (bytes)	Size of the input data in bytes.
Output Size (bytes)	Size of the output data in bytes.
Model	LLM model name (only for generation-type spans).
Input Tokens	Number of input tokens (only for generation-type spans).
Output Tokens	Number of output tokens (only for generation-type spans).
Total Tokens	Sum of input and output tokens (only for generation-type spans).
Metadata	Additional metadata including framework-specific information, tags, and configuration.
Expires At	Timestamp when this span data expires (for data retention policies).
Project ID	Identifier of the project this span belongs to.
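
The trace-level totals described earlier can be derived from the spans of a trace. The sketch below shows one way to do that roll-up, assuming spans are plain dictionaries keyed by the snake_case field names used here. The `summarize_trace` function and the sample spans are hypothetical and are not taken from ARMS.

```python
from collections import defaultdict


def summarize_trace(spans: list[dict]) -> dict:
    """Roll span-level records up into trace-level totals."""
    children: dict[str, list[str]] = defaultdict(list)
    root = None
    models: list[str] = []
    llm_calls = tool_calls = total_tokens = 0
    for span in spans:
        parent = span.get("parent_span_id")
        if parent is None:
            root = span  # root spans have a null parent
        else:
            children[parent].append(span["span_id"])
        kind = span["observation_type"]
        if kind == "generation":
            # Counts every LLM call, including calls nested inside tools.
            llm_calls += 1
            total_tokens += span.get("total_tokens", 0)
            model = span.get("model")
            if model and model not in models:
                models.append(model)
        elif kind == "tool":
            tool_calls += 1
    return {
        "total_llm_calls": llm_calls,
        "total_tokens": total_tokens,
        "total_tool_calls": tool_calls,
        "total_spans": len(spans),
        "models_used": models,
        "root_span": root and {
            "span_id": root["span_id"],
            "name": root["name"],
            "observation_type": root["observation_type"],
        },
        "parent_spans": dict(children),
    }


spans = [
    {"span_id": "s1", "parent_span_id": None, "name": "assistant",
     "observation_type": "chain"},
    {"span_id": "s2", "parent_span_id": "s1", "name": "ChatOpenAI",
     "observation_type": "generation", "model": "gpt-4", "total_tokens": 120},
    {"span_id": "s3", "parent_span_id": "s1", "name": "add",
     "observation_type": "tool"},
    {"span_id": "s4", "parent_span_id": "s3", "name": "ChatOpenAI",
     "observation_type": "generation", "model": "gpt-3.5-turbo-0125",
     "total_tokens": 80},
]
trace = summarize_trace(spans)
print(trace)
```

In this example the second generation span sits under a tool span, so it still contributes to Total LLM Calls and Total Tokens, matching the trace-level definitions.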
