Introduction

elsai ARMS (Agent Resource Management System) is a backend framework designed to monitor and manage the usage, cost, and performance of AI-powered projects using large language models. It provides centralized tracking for token consumption, execution metrics, and cost analysis, making it ideal for teams running multiple AI agents or pipelines.

Overview

elsai ARMS is a lightweight monitoring and cost-tracking system for LLM-based agents and applications. Designed for observability, it enables project-based tracking and reporting. It integrates with OpenTelemetry via the TelemetryWrapper to log token usage (Token Metrics), latency (LLM Monitor), and estimated costs (Cost Metrics). Projects are managed using a Project Manager that interfaces with MongoDB, DynamoDB, or ClickHouse for persistent storage, while an Exporter module serializes the logged data into JSON for reporting.
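The overview describes a data flow: a wrapper records token usage, latency, and cost for each call, and an exporter serializes the log to JSON. Here is a minimal stdlib-only Python sketch of that flow. It is not the elsai ARMS API. `tracked`, `ProjectLog`, `LLMCallRecord`, and the `PRICES` table are hypothetical names. The sketch also leaves out OpenTelemetry and the MongoDB, DynamoDB, and ClickHouse backends.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical prices in USD per 1M tokens; placeholders, not real pricing data.
PRICES: Dict[str, Dict[str, float]] = {"example-model": {"input": 0.15, "output": 0.60}}


@dataclass
class LLMCallRecord:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost: float


@dataclass
class ProjectLog:
    project_id: str
    calls: List[LLMCallRecord] = field(default_factory=list)

    def export_json(self) -> str:
        # Serialize everything logged so far (the role an exporter plays).
        return json.dumps(asdict(self), indent=2)


def tracked(log: ProjectLog, model: str) -> Callable:
    """Wrap a function returning (text, input_tokens, output_tokens) and log each call."""
    def decorator(fn: Callable[..., Tuple[str, int, int]]) -> Callable[..., str]:
        def wrapper(*args, **kwargs) -> str:
            start = time.perf_counter()
            text, input_tokens, output_tokens = fn(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000
            price = PRICES[model]
            cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
            log.calls.append(LLMCallRecord(model, input_tokens, output_tokens, latency_ms, cost))
            return text
        return wrapper
    return decorator


# Usage: fake_llm stands in for a real LLM client call.
log = ProjectLog("demo-project")


@tracked(log, "example-model")
def fake_llm(prompt: str) -> Tuple[str, int, int]:
    return f"echo: {prompt}", 12, 3


fake_llm("hello")
print(log.export_json())
```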

Key Features

Token usage tracking

Track input and output tokens across every LLM interaction in a project.

Cost tracking

Monitor monetary cost per model, including separate web search costs.

LLM monitoring

Latency, throughput, quality, and governance metrics for language models.

RAG metrics

End-to-end retrieval-augmented generation pipeline observability.

Embedding tracking

Vector embedding generation performance and quality metrics.

OCR processing

Text extraction performance across OCR engines and providers.

Agent monitoring

Distributed tracing for multi-component AI agent workflows.

Web search metrics

Sources, citations, search queries, and search-specific costs.

Execution logs

Structured run logs with success/error analysis.

Performance metrics

System and process-level resource utilization tracking.

JSON export

Exportable structured reports for analytics and observability tools.

Custom metrics

Log user-defined values like accuracy, relevance, and classification labels.

Metrics Reference

Project Metrics

| Field | Description |
|---|---|
| Time of Creation | Timestamp when the project or a processing run was initiated. |
| Total Tokens | Cumulative sum of tokens processed (input + output) across all runs. |
| Total Cost per Model | Aggregated cost grouped by model (e.g., GPT-4, Claude). |
| Average Frequency | Average number of runs per day, hour, or week. |
| Success Rate | Percentage of successful runs out of all runs. |
| Number of Successful Runs | Total number of runs that ended successfully (no crash). |
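The project-level fields are simple aggregations over run records. The following sketch assumes a hypothetical `Run` record, which is not the ARMS storage schema. It shows one way to compute Total Tokens, Total Cost per Model, Average Frequency (as runs per day), the number of successful runs, and Success Rate.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Run:
    # Hypothetical run record; field names are illustrative, not the ARMS schema.
    model: str
    input_tokens: int
    output_tokens: int
    cost: float
    succeeded: bool
    started_at: datetime


def project_summary(runs: List[Run]) -> Dict[str, object]:
    cost_per_model: Dict[str, float] = defaultdict(float)
    for run in runs:
        cost_per_model[run.model] += run.cost
    successful = sum(1 for run in runs if run.succeeded)
    runs_per_day = 0.0
    if runs:
        starts = [run.started_at for run in runs]
        # Assumption: an observation window shorter than one day counts as one day.
        span_days = max((max(starts) - min(starts)).total_seconds() / 86400, 1.0)
        runs_per_day = len(runs) / span_days
    return {
        "total_tokens": sum(run.input_tokens + run.output_tokens for run in runs),
        "total_cost_per_model": dict(cost_per_model),
        "average_runs_per_day": runs_per_day,
        "number_of_successful_runs": successful,
        "success_rate_pct": 100.0 * successful / len(runs) if runs else 0.0,
    }


runs = [
    Run("gpt-4", 100, 50, 0.006, True, datetime(2025, 1, 1)),
    Run("gpt-4", 200, 80, 0.011, False, datetime(2025, 1, 2)),
    Run("claude", 120, 60, 0.004, True, datetime(2025, 1, 3)),
]
summary = project_summary(runs)
print(summary)
```

Counting a window shorter than a day as one full day is a design choice of this sketch. It keeps a short burst of runs from producing an inflated per-day rate.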
Performance Metrics

| Field | Description |
|---|---|
| Metrics Source | Source of system metrics (e.g., "otel + psutil"). |
| Process Status | Current status of the running process. |
| CPU Threads | Number of CPU threads currently in use. |
| Memory Usage | Current memory consumption in bytes. |
| CPU Utilization | Current CPU usage percentage. |
| Disk Space | Total, used, and free disk space in bytes. |
| Network Stats | Network packets, errors, connections, and I/O statistics. |
| OpenTelemetry Metrics | Detailed OTEL metrics, including LLM latency, system resources, and process information. |
| Collection Duration | Time taken to collect all system metrics. |
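ARMS lists its metrics source as "otel + psutil". The sketch below gives a rough, dependency-free illustration of a few of these fields: disk space, thread count, and collection duration. It uses only the Python standard library. `system_snapshot` is a hypothetical helper. It reports Python-level threads rather than the OS-level count a psutil-based collector would return.

```python
import os
import shutil
import threading
import time
from typing import Dict


def system_snapshot(path: str = ".") -> Dict[str, object]:
    """Collect a few performance fields using only the standard library.

    Memory usage, CPU utilization, and network stats are omitted because the
    standard library lacks a simple cross-platform way to read them.
    """
    start = time.perf_counter()
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    snapshot: Dict[str, object] = {
        "metrics_source": "stdlib (illustrative)",
        "process_id": os.getpid(),
        # Python-level threads in this process, not an OS-level thread count.
        "python_threads": threading.active_count(),
        "logical_cpus": os.cpu_count(),
        "disk_space": {"total": disk.total, "used": disk.used, "free": disk.free},
    }
    snapshot["collection_duration_ms"] = (time.perf_counter() - start) * 1000
    return snapshot


print(system_snapshot())
```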
Run Metrics

| Field | Description |
|---|---|
| LLM Details | Summary of all LLM calls made in this run (may include model, tokens, and cost). |
| Custom Metrics | User-defined values such as accuracy, classification labels, or relevance score. |
| Logs | Event or trace logs emitted during the run (info, warning, error). May include stages, retries, and failures. |
| Start Time | The exact timestamp when the run or process began execution. |
| End Time | The exact timestamp when the run or process completed execution. |
| Execution Duration | The total time from the start to the end of the run, typically measured in seconds. |
| Success Status | SUCCESS or FAILED, indicating whether the run met its goal. |
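Start Time, End Time, Execution Duration, Logs, and Success Status can all be captured around a single block of work. The sketch below uses a hypothetical `tracked_run` context manager, which is not part of ARMS. It assumes a run counts as FAILED whenever an exception escapes the block.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Dict, Iterator, List


@contextmanager
def tracked_run(runs: List[Dict[str, object]]) -> Iterator[List[str]]:
    """Record start/end time, duration, logs, and SUCCESS/FAILED for one run."""
    logs: List[str] = []
    record: Dict[str, object] = {
        "start_time": datetime.now(timezone.utc).isoformat(),
        "logs": logs,
    }
    start = time.perf_counter()
    try:
        yield logs
        record["success_status"] = "SUCCESS"
    except Exception as exc:
        logs.append(f"error: {exc!r}")
        record["success_status"] = "FAILED"
        raise  # re-raise so the caller still sees the failure
    finally:
        record["end_time"] = datetime.now(timezone.utc).isoformat()
        record["execution_duration_s"] = time.perf_counter() - start
        runs.append(record)


run_log: List[Dict[str, object]] = []
with tracked_run(run_log) as logs:
    logs.append("info: stage 1 complete")

try:
    with tracked_run(run_log) as logs:
        raise ValueError("model timeout")
except ValueError:
    pass

for run in run_log:
    print(run["success_status"], run["logs"])
```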
LLM Metrics

| Field | Description |
|---|---|
| Model | Name of the model used (e.g., gpt-4o-mini-2024-07-18). |
| LLM Provider | Provider of the LLM service (e.g., OpenAI, Anthropic, Google). |
| Input Tokens | Number of tokens in the input prompt. |
| Output Tokens | Number of tokens in the generated response. |
| Total Tokens | Sum of input and output tokens. |
| Latency (ms) | Response time in milliseconds. |
| Prompt | The input text sent to the LLM. |
| Response | The generated response from the LLM. |
| Relevance Score | Score indicating how relevant the response is to the query. |
| Cost | Total monetary cost of the LLM call (token cost plus web search cost, if applicable). |
| Token Cost | Monetary cost of tokens only (separate from web search costs). |
| Tokens per Second | Processing speed in tokens per second. |
| Output Throughput | Output generation speed in tokens per second. |
| Total Throughput | Overall processing speed, including input and output. |
| Governance Metrics | Content safety, prompt injection detection, and response quality assessment. |
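Several LLM fields are derived from raw counts. This sketch makes two assumptions. Throughput is taken as tokens divided by latency. Prices are supplied per one million tokens. `llm_call_metrics` and its parameters are hypothetical. The reference does not spell out how Tokens per Second differs from Total Throughput, so the sketch computes only output and total throughput.

```python
from typing import Dict


def llm_call_metrics(
    input_tokens: int,
    output_tokens: int,
    latency_ms: float,
    input_price_per_m: float,
    output_price_per_m: float,
    web_search_cost: float = 0.0,
) -> Dict[str, float]:
    """Derive per-call cost and throughput fields from raw counts.

    Prices are caller-supplied USD per 1M tokens (an assumption of this sketch).
    """
    total_tokens = input_tokens + output_tokens
    token_cost = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000
    latency_s = latency_ms / 1000
    return {
        "total_tokens": total_tokens,
        "token_cost": token_cost,
        # Cost = token cost + web search cost.
        "cost": token_cost + web_search_cost,
        "output_throughput_tps": output_tokens / latency_s if latency_s > 0 else 0.0,
        "total_throughput_tps": total_tokens / latency_s if latency_s > 0 else 0.0,
    }


print(llm_call_metrics(1000, 500, latency_ms=2000, input_price_per_m=0.15,
                       output_price_per_m=0.60, web_search_cost=0.01))
```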
Web Search Metrics

| Field | Description |
|---|---|
| Provider | Type of web search implementation: 'native' (OpenAI/Perplexity), 'grounding' (Gemini), or 'tool_based' (Claude). |
| Enabled | Boolean indicating whether web search was used in this LLM call. |
| Search Count | Number of web searches performed. |
| Sources Found | Total number of sources found during web search. |
| Citations Used | Number of citations actually used in the response text. |
| Search Cost | Monetary cost of web search operations, separate from token costs. |
| Citations | List of citations with URLs and titles (limited to 20 citations). |
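The sketch below assembles a web search record using a hypothetical `web_search_record` helper. The 20-citation cap and the three provider types come from the table. The rule for Citations Used is an assumption: a citation counts as "used" when its URL appears verbatim in the response text. ARMS does not necessarily count it this way.

```python
from typing import Dict, List

MAX_CITATIONS = 20  # the reference table caps the stored citation list at 20
PROVIDER_TYPES = {"native", "grounding", "tool_based"}


def web_search_record(
    provider: str,
    search_count: int,
    sources_found: int,
    citations: List[Dict[str, str]],
    response_text: str,
    search_cost: float,
) -> Dict[str, object]:
    if provider not in PROVIDER_TYPES:
        raise ValueError(f"unknown provider type: {provider!r}")
    # Heuristic (assumption): a citation is "used" if its URL appears in the response.
    citations_used = sum(1 for c in citations if c["url"] in response_text)
    return {
        "provider": provider,
        "enabled": search_count > 0,
        "search_count": search_count,
        "sources_found": sources_found,
        "citations_used": citations_used,
        "search_cost": search_cost,
        "citations": citations[:MAX_CITATIONS],
    }


cites = [{"url": f"https://example.com/page-{i}.html", "title": f"Page {i}"} for i in range(25)]
answer = "See https://example.com/page-3.html and https://example.com/page-21.html."
record = web_search_record("native", 2, 25, cites, answer, search_cost=0.02)
print(record["citations_used"], len(record["citations"]))
```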
RAG Metrics

| Field | Description |
|---|---|
| Function Name | Name of the RAG function called (e.g., retrieve_documents). |
| Query | The search query used for document retrieval. |
| Query Length | Number of characters in the query. |
| Docs Count | Total number of documents available for retrieval. |
| Result Count | Number of documents returned in results. |
| Timestamp | When the RAG operation was performed. |
| Relevance Score | Quality metric for retrieval relevance (0.0 to 1.0). |
| Status | Success or failure status of the operation. |
| Latency | Time taken for the RAG operation (in ms). |
| Cost | Cost associated with the RAG operation. |
| Error | Error details if the operation failed. |
| Operation Type | Type of RAG operation performed. |
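One way to capture the RAG fields is a decorator around the retrieval function. In this sketch, `rag_tracked` and the toy `retrieve_documents` retriever are hypothetical. Relevance Score and Cost are left out because they depend on the scoring model and pricing in use.

```python
import functools
import time
from datetime import datetime, timezone
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]


def rag_tracked(records: List[Dict[str, object]], docs_count: int,
                operation_type: str = "retrieval") -> Callable[[Retriever], Retriever]:
    """Decorator that appends one metrics record per call of a retrieval function."""
    def decorator(fn: Retriever) -> Retriever:
        @functools.wraps(fn)
        def wrapper(query: str) -> List[str]:
            record: Dict[str, object] = {
                "function_name": fn.__name__,
                "query": query,
                "query_length": len(query),
                "docs_count": docs_count,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "operation_type": operation_type,
                "error": None,
            }
            start = time.perf_counter()
            try:
                results = fn(query)
                record.update(status="success", result_count=len(results))
                return results
            except Exception as exc:
                record.update(status="failure", result_count=0, error=repr(exc))
                raise
            finally:
                record["latency_ms"] = (time.perf_counter() - start) * 1000
                records.append(record)
        return wrapper
    return decorator


DOCS = ["ARMS tracks tokens", "ARMS tracks cost", "Unrelated note"]
rag_records: List[Dict[str, object]] = []


@rag_tracked(rag_records, docs_count=len(DOCS))
def retrieve_documents(query: str) -> List[str]:
    # Toy keyword match standing in for a real vector store.
    return [d for d in DOCS if query.lower() in d.lower()]


print(retrieve_documents("tracks"))
print(rag_records[0])
```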
Embedding Metrics

| Field | Description |
|---|---|
| Function Name | Name of the embedding function called (e.g., get_embedding). |
| Input Length | Number of characters in the input text. |
| Dimensions | Number of dimensions in the generated embedding vector. |
| Timestamp | When the embedding operation was performed. |
| Quality Score | Quality metric for the generated embedding (0.0 to 1.0). |
| Status | Success or failure status of the operation. |
| Latency | Time taken for the embedding operation (in ms). |
| Cost | Cost associated with the embedding operation. |
| Error | Error details if the operation failed. |
| Operation Type | Type of embedding operation performed. |
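The embedding fields can be captured by wrapping the embedding call. The sketch below uses a hypothetical `embed_and_record` helper and a toy 26-dimensional `get_embedding`. It derives Dimensions from the length of the returned vector and Input Length from the text. Quality Score and Cost are omitted because their computation is not specified here.

```python
import time
from typing import Callable, Dict, List, Tuple


def embed_and_record(embed_fn: Callable[[str], List[float]],
                     text: str) -> Tuple[List[float], Dict[str, object]]:
    """Call an embedding function and return (vector, metrics record).

    Failures are recorded and an empty vector is returned instead of raising,
    so a batch job can continue past a bad input.
    """
    record: Dict[str, object] = {
        "function_name": getattr(embed_fn, "__name__", "unknown"),
        "input_length": len(text),
        "error": None,
    }
    start = time.perf_counter()
    try:
        vector = embed_fn(text)
    except Exception as exc:
        vector = []
        record.update(status="failure", dimensions=0, error=repr(exc))
    else:
        record.update(status="success", dimensions=len(vector))
    record["latency_ms"] = (time.perf_counter() - start) * 1000
    return vector, record


def get_embedding(text: str) -> List[float]:
    # Toy 26-dimensional embedding (letter counts a-z), for illustration only.
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    return counts


def flaky_embedding(text: str) -> List[float]:
    raise RuntimeError("embedding service unavailable")


vec, ok = embed_and_record(get_embedding, "hello arms")
_, failed = embed_and_record(flaky_embedding, "hello arms")
print(ok["dimensions"], failed["status"])
```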
OCR Metrics

| Field | Description |
|---|---|
| Document Type | Type of document processed (e.g., PDF, image, scanned document). |
| OCR Engine | OCR service or library used for text extraction. |
| Extracted Text | The text content extracted from the document. |
| Confidence Score | OCR confidence level for the extracted text. |
| Processing Time | Time taken to process the document. |
| Page Count | Number of pages in the document. |
| Text Quality | Quality assessment of the extracted text. |
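The sketch below illustrates the OCR fields. It assumes the engine reports a confidence for each page and takes the document-level Confidence Score to be the mean across pages. `PageResult`, `ocr_record`, and `fake_engine` are hypothetical. Text Quality is not computed.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class PageResult:
    text: str
    confidence: float  # assumed per-page confidence in [0, 1] reported by the engine


def ocr_record(document_type: str, ocr_engine: str,
               run_ocr: Callable[[], List[PageResult]]) -> Dict[str, object]:
    """Run an OCR callable and summarize it; document confidence = mean page confidence."""
    start = time.perf_counter()
    pages = run_ocr()
    processing_time_s = time.perf_counter() - start
    return {
        "document_type": document_type,
        "ocr_engine": ocr_engine,
        "extracted_text": "\n".join(page.text for page in pages),
        "confidence_score": mean(page.confidence for page in pages) if pages else 0.0,
        "processing_time_s": processing_time_s,
        "page_count": len(pages),
    }


def fake_engine() -> List[PageResult]:
    # Stand-in for a real OCR library call.
    return [PageResult("Invoice #123", 0.95), PageResult("Total: $40", 0.85)]


print(ocr_record("PDF", "example-engine", fake_engine))
```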
Agent Metrics

| Field | Description |
|---|---|
| Trace Metrics | See the Trace Metrics section below for detailed trace-level metrics. |
| Span Metrics | See the Span Metrics section below for detailed span-level metrics. |
Trace Metrics

| Field | Description |
|---|---|
| Trace ID | Unique identifier for the entire agent execution trace. |
| Created At | Timestamp when the trace was initiated. |
| Framework | The agent framework for the trace. Values include langchain, langgraph, elsai_agents, elsai_graph, and elsai_swarm. For elsai Agents hook runs, this field reflects the first span that opened the trace, so Graph and Swarm runs often show elsai_agents here even though the orchestrator root span carries elsai_graph or elsai_swarm. |
| Input Summary | Summary of the initial input provided to the agent (typically JSON-serialized messages or prompts). |
| Output Summary | Summary of the final output produced by the agent (typically JSON-serialized messages or responses). |
| Total Duration (seconds) | Total time taken for the entire agent execution, from start to completion. |
| Total LLM Calls | Cumulative count of all LLM invocations across the entire trace, including calls made within tool functions. |
| Total Tokens | Sum of all tokens (input + output) consumed across all LLM calls in the trace, including tokens from LLM calls inside tools. |
| Total Tool Calls | Number of tool/function calls executed during the trace. |
| Total Spans | Total number of spans (operations) in the distributed trace. |
| Models Used | List of all LLM models used during the trace (e.g., ["gpt-3.5-turbo-0125", "gpt-4"]). |
| Nodes Executed | List of agent graph nodes that were executed (e.g., ["assistant", "tools", "router"]). |
| Root Span | Information about the root span of the trace, including span_id, name, and observation_type. |
| Parent Spans | Hierarchical mapping of parent-child relationships between spans, showing the execution flow structure. |
| Project ID | Identifier of the project this trace belongs to. |
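Most trace-level fields are roll-ups of span data. The following sketch assumes hypothetical `Span` records with exactly one root span. It derives these fields:

- Total Duration
- Total LLM Calls (generation spans, including those nested under a tool)
- Total Tokens
- Total Tool Calls
- Total Spans
- Models Used
- Root Span
- Parent Spans

Span names such as `ChatOpenAI` and `add` echo the examples in this reference, but the data itself is made up. Nodes Executed is not derived.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_span_id: Optional[str]
    name: str
    observation_type: str  # "generation", "chain", or "tool"
    start_ns: int
    end_ns: int
    model: Optional[str] = None
    total_tokens: int = 0


def summarize_trace(spans: List[Span]) -> Dict[str, object]:
    """Roll span-level data up into trace-level fields (assumes exactly one root span)."""
    generations = [s for s in spans if s.observation_type == "generation"]
    root = next(s for s in spans if s.parent_span_id is None)
    parent_spans: Dict[str, List[str]] = {}
    for s in spans:
        if s.parent_span_id is not None:
            parent_spans.setdefault(s.parent_span_id, []).append(s.span_id)
    return {
        "total_duration_s": (max(s.end_ns for s in spans) - min(s.start_ns for s in spans)) / 1e9,
        # Every generation span counts, including LLM calls nested inside tools.
        "total_llm_calls": len(generations),
        "total_tokens": sum(s.total_tokens for s in generations),
        "total_tool_calls": sum(1 for s in spans if s.observation_type == "tool"),
        "total_spans": len(spans),
        "models_used": sorted({s.model for s in generations if s.model}),
        "root_span": {"span_id": root.span_id, "name": root.name,
                      "observation_type": root.observation_type},
        "parent_spans": parent_spans,
    }


S = 1_000_000_000  # nanoseconds per second
spans = [
    Span("root", None, "LangGraph", "chain", 0, 3 * S),
    Span("a1", "root", "assistant", "chain", 0, 1 * S),
    Span("g1", "a1", "ChatOpenAI", "generation", 0, 1 * S, "gpt-4", 150),
    Span("t1", "root", "tools", "chain", 1 * S, 3 * S),
    Span("tool1", "t1", "add", "tool", 1 * S, 3 * S),
    Span("g2", "tool1", "ChatOpenAI", "generation", 2 * S, 3 * S, "gpt-3.5-turbo-0125", 50),
]
trace_summary = summarize_trace(spans)
print(trace_summary)
```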
Span Metrics

| Field | Description |
|---|---|
| Span ID | Unique identifier for this specific span/operation. |
| Trace ID | The trace this span belongs to. |
| Parent Span ID | ID of the parent span (null for root spans). |
| Name | Name of the operation (e.g., "ChatOpenAI", "add", "assistant", "tools", "tools_condition"). |
| Observation Type | Type of operation: "generation" (LLM call), "chain" (composite operation), or "tool" (tool execution). |
| Start Time | Nanosecond-precision timestamp when the span started. |
| End Time | Nanosecond-precision timestamp when the span completed. |
| Duration (seconds) | Time taken for this specific span to execute. |
| Status Code | Status of the operation (e.g., "UNSET", "OK", "ERROR"). |
| Created At | Timestamp when the span was created. |
| Input | The input data for this span (format depends on observation_type). |
| Output | The output data from this span (format depends on observation_type). |
| Input Size (bytes) | Size of the input data in bytes. |
| Output Size (bytes) | Size of the output data in bytes. |
| Model | LLM model name (only for generation-type spans). |
| Input Tokens | Number of input tokens (only for generation-type spans). |
| Output Tokens | Number of output tokens (only for generation-type spans). |
| Total Tokens | Sum of input and output tokens (only for generation-type spans). |
| Metadata | Additional metadata, including framework-specific information, tags, and configuration. |
| Expires At | Timestamp when this span data expires (for data retention policies). |
| Project ID | Identifier of the project this span belongs to. |
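The sketch below builds a single span record. It assumes nanosecond timestamps; in Python, `time.time_ns()` provides these. It also assumes Input Size and Output Size are measured on the UTF-8 JSON serialization of the payload. `span_record` and `payload_size` are hypothetical helpers, not ARMS functions. Model and token fields are added only for generation spans, matching the table.

```python
import json
from typing import Any, Dict, Optional


def payload_size(value: Any) -> int:
    # Assumption: size is measured on the UTF-8 encoded JSON serialization.
    return len(json.dumps(value, default=str).encode("utf-8"))


def span_record(span_id: str, trace_id: str, parent_span_id: Optional[str], name: str,
                observation_type: str, start_ns: int, end_ns: int,
                input_data: Any, output_data: Any, status_code: str = "OK",
                model: Optional[str] = None, input_tokens: int = 0,
                output_tokens: int = 0) -> Dict[str, Any]:
    record: Dict[str, Any] = {
        "span_id": span_id,
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,  # None for the root span
        "name": name,
        "observation_type": observation_type,
        "start_time": start_ns,
        "end_time": end_ns,
        "duration_s": (end_ns - start_ns) / 1e9,
        "status_code": status_code,
        "input": input_data,
        "output": output_data,
        "input_size_bytes": payload_size(input_data),
        "output_size_bytes": payload_size(output_data),
    }
    if observation_type == "generation":
        # Model and token fields apply only to LLM-call spans.
        record.update(model=model, input_tokens=input_tokens, output_tokens=output_tokens,
                      total_tokens=input_tokens + output_tokens)
    return record


tool_span = span_record("s2", "t1", "s1", "add", "tool",
                        1_000_000_000, 1_250_000_000, {"a": 2, "b": 3}, 5)
llm_span = span_record("s3", "t1", "s1", "ChatOpenAI", "generation",
                       0, 800_000_000, "What is 2 + 3?", "5",
                       model="gpt-4", input_tokens=14, output_tokens=1)
print(tool_span["duration_s"], tool_span["input_size_bytes"], llm_span["total_tokens"])
```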

Copyright © 2026 elsai foundry.