Observability

Where the system shows its work. Three panels: the live agent event bus log + topology, the OTel trace explorer (NAT semantic spans), and the LLM utilization summary (hops / tokens / fallback chain).

What it is · how it works · why it matters

[ what ]

A live view of the agent system: bus event log, OTel trace explorer (NAT spans), LLM utilization (hops / tokens / fallback chain), audit decisions.

[ how ]

Every meaningful op emits an OTel span via NAT to data/traces/spans.jsonl. The "▶ Trigger end-to-end" button fires Scheduler.tick.eod and shows ~30 events across 25 agents in 5 seconds. Optional Phoenix export via docker-compose --profile phoenix.

[ why ]

Auditable decision traces are a common requirement for regulated and fiduciary workflows. Every decision is tappable, and every order traces back to the spans that produced it.

Overview

NVTrader instruments every meaningful operation with OTel spans through NeMo Agent Toolkit (NAT). Spans land in data/traces/spans.jsonl (10 MB rotation). The Observability page renders three views over that data.
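Because the spans are plain JSONL, they can be inspected outside the UI. The following is a minimal sketch, not NVTrader's own reader: it assumes each line is one JSON object and that the file is append-only, so the last lines are the newest. The helper names `iter_spans` and `tail_spans` are hypothetical.

```python
import json
from collections import deque
from pathlib import Path
from typing import Iterator

SPANS_PATH = "data/traces/spans.jsonl"  # path taken from the docs above


def iter_spans(path: str = SPANS_PATH) -> Iterator[dict]:
    """Yield one dict per well-formed JSON line; skip blank or partial lines."""
    p = Path(path)
    if not p.exists():
        return
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # The final line can be half-written while the file is being appended to.
                continue
            if isinstance(record, dict):
                yield record


def tail_spans(path: str = SPANS_PATH, limit: int = 100) -> list[dict]:
    """Return the last `limit` spans, newest first (assumes append order)."""
    return list(reversed(deque(iter_spans(path), maxlen=limit)))
```

Skipping undecodable lines rather than raising is a deliberate choice here, since a reader can race with the writer or with file rotation.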

Agent event bus

Top of the page. Click ▶ Trigger end-to-end to fire Scheduler.tick.eod onto the A2A bus. The event log streams as the cascade plays out — roughly 30 events across 25 agents in 5 seconds.

The topology card to the right shows the agent registry. Each agent lists its subscribes and emits. Click any agent to filter the event log to events it touched.
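One way to picture the registry and the click-to-filter behaviour is sketched below. This is an illustration under assumptions, not the page's implementation: the `AgentEntry` shape, the `topic` key on events, and the rule "an agent touched an event if the topic is in its subscribes or emits" are all hypothetical. The real filter may match on sender or recipient instead.

```python
from dataclasses import dataclass, field


@dataclass
class AgentEntry:
    """Hypothetical registry entry: the topics an agent listens to and publishes."""
    name: str
    subscribes: set[str] = field(default_factory=set)
    emits: set[str] = field(default_factory=set)


def events_touched_by(agent: AgentEntry, events: list[dict]) -> list[dict]:
    """Keep events whose topic the agent subscribes to or emits (assumed rule)."""
    topics = agent.subscribes | agent.emits
    return [event for event in events if event.get("topic") in topics]


if __name__ == "__main__":
    # Agent and topic names other than Scheduler.tick.eod are made up for the example.
    agent = AgentEntry("example_agent", subscribes={"Scheduler.tick.eod"}, emits={"example.done"})
    log = [{"topic": "Scheduler.tick.eod"}, {"topic": "other.topic"}, {"topic": "example.done"}]
    for event in events_touched_by(agent, log):
        print(event["topic"])
```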

See A2A event bus engine docs for the deep dive.

Trace explorer

Middle of the page. Lists OTel spans newest-first. Each row shows trace_id, span_id, span name (e.g. cufolio.solve, broker.alpaca.place_order, bus.publish), elapsed_ms, status, and the agent that emitted it.

Click a span to open the detail drawer — full attribute dict, parent trace, child spans. Aggregations at the top: p50 / p95 / max latency per span name; throughput per minute.
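The per-span-name aggregations can be reproduced offline from span records. The sketch below assumes each span is a dict with `name` and `elapsed_ms` keys (the key names are an assumption based on the columns listed above) and uses the nearest-rank percentile method. The page itself may use a different percentile definition, so small differences are possible.

```python
import math
from collections import defaultdict


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty ascending list, 0 < pct <= 100."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_stats(spans: list[dict]) -> dict[str, dict[str, float]]:
    """Group elapsed_ms by span name and compute p50 / p95 / max per group."""
    by_name: dict[str, list[float]] = defaultdict(list)
    for span in spans:
        # Spans missing either assumed key are ignored rather than guessed at.
        if "name" in span and "elapsed_ms" in span:
            by_name[span["name"]].append(float(span["elapsed_ms"]))

    stats: dict[str, dict[str, float]] = {}
    for name, values in by_name.items():
        values.sort()
        stats[name] = {
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "max": values[-1],
        }
    return stats
```

Nearest-rank always returns an observed latency instead of an interpolated one, which keeps the numbers easy to cross-check against individual rows in the explorer.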

LLM utilization

Bottom of the page. One row per recent chat turn, showing hops, tokens, and the fallback chain.

What to look for

| signal | likely cause | action |
| --- | --- | --- |
| cuFOLIO span > 1.5 s | n_scenarios too high or CPU fallback | check device attribute; lower scenarios. |
| broker span errors spike | venue rate-limit or PDT | see Orders page rationale. |
| Bus event count drops to 0 | scheduler skipped or process crashed | check scheduler page + last server restart. |
| LLM hops hit cap on every turn | Kimi confused by tool result format | simplify the tool's JSON return; or raise CHAT_MAX_HOPS. |
| Fallback model fires often | primary throwing 429 or degenerating | check NVIDIA Build rate-limit dashboard; lower frequency_penalty. |
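The first signal in the list (a cuFOLIO span slower than 1.5 s) is easy to check mechanically. The sketch below is one possible approach under assumptions: span records are dicts with `name`, `elapsed_ms`, `span_id`, and an `attributes` dict that carries a `device` entry. Those key names and the `cufolio.` prefix match are guesses based on the text, not a documented schema.

```python
SLOW_CUFOLIO_MS = 1500.0  # the 1.5 s threshold from the signals listed above


def slow_cufolio_spans(spans: list[dict], threshold_ms: float = SLOW_CUFOLIO_MS) -> list[tuple]:
    """Return (span_id, elapsed_ms, device) for cuFOLIO spans over the threshold."""
    findings = []
    for span in spans:
        # Assumed convention: cuFOLIO span names start with "cufolio.".
        if not str(span.get("name", "")).startswith("cufolio."):
            continue
        elapsed = float(span.get("elapsed_ms", 0.0))
        if elapsed > threshold_ms:
            # "device" is the attribute the list says to check; the key name is assumed.
            device = (span.get("attributes") or {}).get("device", "unknown")
            findings.append((span.get("span_id"), elapsed, device))
    return findings
```

If the reported device turns out to be a CPU, the slow span points at the CPU fallback cause rather than at the scenario count.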

Sending spans to your own collector

NAT writes JSONL by default. To ship to Phoenix / Tempo / Jaeger / DataDog, set OTEL_EXPORTER_OTLP_ENDPOINT in .env and bring up docker-compose --profile phoenix. The collector mirrors spans to both the JSONL file and the OTLP endpoint.

REST surface

| Verb | Path | Purpose |
| --- | --- | --- |
| GET | /api/observability/traces?limit=100 | Tail spans. |
| GET | /api/observability/stats | Latency aggs per span name. |
| GET | /api/observability/llm | LLM utilization rows. |
| GET | /api/bus/events?limit=200 | Bus event ring buffer. |
| GET | /api/bus/agents | Registry topology. |
| POST | /api/bus/trigger | Fire an event onto the bus. |
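A small standard-library client for these endpoints might look like the sketch below. Several things in it are assumptions: the base URL `http://localhost:8000`, that every endpoint returns JSON, and the POST body shape `{"event": ...}` for the trigger, which the docs above do not specify. Check the actual server before relying on any of them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the app is served


def build_url(path: str, base_url: str = BASE_URL, **params) -> str:
    """Join base URL and path, appending query parameters if any are given."""
    query = urlencode(params)
    return f"{base_url.rstrip('/')}{path}" + (f"?{query}" if query else "")


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0, **params):
    """GET an endpoint and decode the response body as JSON (assumed format)."""
    with urlopen(build_url(path, base_url, **params), timeout=timeout) as resp:
        return json.load(resp)


def trigger_request(event: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) the POST that fires an event onto the bus."""
    body = json.dumps({"event": event}).encode("utf-8")  # payload shape is a guess
    return Request(
        build_url("/api/bus/trigger", base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires a running server; nothing is sent when this module is only imported.
    with urlopen(trigger_request("Scheduler.tick.eod"), timeout=10.0) as resp:
        print(resp.status)
    print(len(get_json("/api/bus/events", limit=200)))
```

Building the trigger request separately from sending it keeps the write operation explicit, which matters for an endpoint that starts a cascade across the agents.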
NVTrader v0.1.18 · docs · ⚠ Not financial advice