
NVTrader reference architecture

NVTrader runs 36 specialized AI agents on the A2A event bus, end to end on a single NVIDIA DGX Spark (GB10). The pipeline runs: trading research → cuFOLIO GPU optimization → 10-check validation report (narrated by Nemotron) → compliance-gated execution → realized-compare (backtest ↔ paper ↔ live, joined on one strategy_version_id spine) → post-trade drift analysis → fills feed NeMo-RL DPO retrains → auto-promotion to vLLM on localhost:8024 (Phase 8 closure) → the next narration call hits the trained policy. Phase 9, LLM-driven Signal Discovery, closes a second flywheel. Four Nemotron-driven agents (Generator · CodeGenerator · Advisor · Orchestrator) propose JSON-AST formulas over a 66-operator vocabulary. They evaluate Mean IC and p-value on real yfinance bars and gate on |IC| ≥ 0.02 AND p ≤ 0.05. Accepted formulas are promoted to sleeves with a Grinold-Kahn α-tilt on cuFOLIO scenarios. The Phase 9 architecture is adapted from the NVIDIA-AI-Blueprints quantitative-signal-discovery-agent. Every step carries a full OTel audit trail, including LLM chain-of-thought captured in llm.reasoning.* spans.


Layer diagram

The seven-and-a-half-layer stack: each layer's responsibility, the NVIDIA-owned components, and how the pieces talk to each other.

Layers at a glance

Layer	What lives here	NVIDIA bits
1 · UI	FastAPI + Tailwind multi-tenant white-label; 16 operator pages (login, dashboard, chat, orders, research, data, models, thesis, backtesting, benchmark, continuous-rl, autotrader, observability, reports, deployments, account)	—
2 · Agent orchestration	36 agents · typed subscribes/emits · compliance gate at every order AND at the strategy-definition layer (Phase 7) · Phase 9 Signal Discovery loop (4 agents) co-resident on the same bus	NAT (observability) · AIQ Deep Research · Quant Signal Discovery blueprint
2.5 · A2A event bus	`src/traderspace/bus/` · asyncio fan-out · 36 agents · forward cascade + closed feedback loops + Phase 8 DPO closure + Phase 9 Signal-Discovery closed loop	—
3a · LLM inference (cloud)	Multi-model routing with ordered fallback chain	NVIDIA Build API · Nemotron 3 Super 120B · Nemotron 3 Nano Omni 30B · Kimi K2.6
3a.5 · LLM inference (local, Phase 8)	vLLM 0.21.0 on port 8024 hosting promoted DPO checkpoints. `policy_router(decision_type)` switches between local and cloud per decision_type (rebalance / validation / thesis / post-trade narration). During cold start, calls fall back to the cloud automatically.	vLLM · mamba-ssm 2.3.2 · causal-conv1d 1.6.2 · dedicated Python 3.13 env
3b · GPU optimization	Mean-CVaR portfolio LP + KDE scenario generation + NeMo-RL DPO/SFT/GRPO, all on CUDA. cuOpt is exposed as Plan / Modify / Optimize / Explain skills, with an in-process or cuOpt NIM backend.	cuFOLIO · cuOpt PDLP / cuOpt NIM · NeMo-RL 0.6.0 · PyTorch 2.11+cu130
3c · Strategy lifecycle (Phases 0-9)	A StrategyVersion plus a mode meta tag threads through every artifact. Phase 1: backtest completeness (10 metrics + trade_log). Phase 2: 10-check ValidationReport. Phase 3: thesis → spec via Nemotron. Phase 4: realized-compare. Phase 5: live-deploy gate (checklist with 6 required items). Phase 6: post-trade drift. Phase 7: cascade wiring. Phase 8: DPO closure via PolicyPromotionAgent + InferenceServerAgent. Phase 9 (LLM-driven Signal Discovery): Generator → CodeGenerator → Evaluator (Mean IC + p-value on real yfinance OHLCV) → acceptance gate → Advisor critique → loop. Accepted formulas are promoted to sleeves with a Grinold-Kahn α-tilt on cuFOLIO scenarios.	—
4 · Data + research sources	Pluggable adapters with unified Bar / NewsItem / Filing schemas	—
5 · Compliance + execution	Hard / soft / warn vetoes · 4-layer gate (strategy-definition → cascade → chat → REST) · pluggable BrokerAdapter · Approve / Override / Reject audit · live-deploy modal with frozen params + checklist with 6 required items	—
6 · Persistence + audit	Fernet-encrypted credentials · immutable decision audit · 51-event AuditAgent · OTel trace ledger · per-tenant isolation · forgot-password / reset flow (15-min single-use tokens)	—
7 · Hardware + deployment	Docker (CUDA 13.0 runtime) · Tailscale · postgres-swappable auth · multi-arch (x86_64 / aarch64)	DGX Spark · GB10 (compute capability 12.1)
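
Layer 2.5 describes the bus as asyncio fan-out with typed subscribes/emits. The sketch below shows that pattern in its simplest form, assuming each handler is an async callable keyed by event name. `EventBus`, `Event`, `subscribe`, `publish` and the two demo handlers are hypothetical names. They are not the API of `src/traderspace/bus/`. The demo handlers stand in for PortfolioOptimizationAgent and PortfolioConstructionAgent.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class Event:
    name: str
    payload: dict[str, Any] = field(default_factory=dict)


Handler = Callable[[Event], Awaitable[None]]


class EventBus:
    """Minimal in-process asyncio fan-out bus (hypothetical API)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._subscribers[event_name].append(handler)

    async def publish(self, event: Event) -> None:
        handlers = list(self._subscribers.get(event.name, ()))
        # Fan out: all subscribers run concurrently. return_exceptions=True means
        # one failing handler does not cancel the others; errors are dropped here.
        await asyncio.gather(*(h(event) for h in handlers), return_exceptions=True)


async def demo() -> list[str]:
    bus = EventBus()
    seen: list[str] = []

    async def optimizer(ev: Event) -> None:
        seen.append(f"optimizer:{ev.name}")
        # An agent reacts by emitting the next event in the cascade.
        await bus.publish(Event("RebalanceProposed", {"source": ev.name}))

    async def construction(ev: Event) -> None:
        seen.append(f"construction:{ev.name}:{ev.payload['source']}")

    bus.subscribe("SignalProposed", optimizer)
    bus.subscribe("RebalanceProposed", construction)
    await bus.publish(Event("SignalProposed"))
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With `return_exceptions=True`, a crashing agent cannot cancel its siblings' handlers. A production bus would log those exceptions to the audit trail rather than discard them.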

End-to-end data flow

One trigger fires the full pipeline: ~40 events across 36 agents in ~5 seconds on GB10 (excluding training runs). The Phase 8 DPO closure adds an asynchronous outer loop that auto-promotes trained checkpoints back into inference. The Phase 9 Signal-Discovery loop runs on demand from a chat intent (e.g. "momentum signals on US tech"). It typically completes 3 iterations in 60-120 seconds against Nemotron 3 Super on build.nvidia.com.

OPTIONAL OUTER LOOP — multi-strategy capital allocator
  ─────────────────────────────────────────────────────
  MultiSleeveRebalanceRequested
  → CapitalAllocationAgent        (cuFOLIO Mean-CVaR on sleeve NAVs)
  → CapitalAllocationProposed     → PM Agent gates approval
  → CapitalAllocationApproved
  → ComplianceBusAgent (allocator gate: sleeve caps + cross-sleeve concentration + sanity)
  → AllocationCleared / AllocationBlocked
  → CapitalAllocationApprovalAgent → writes data/allocations/active.json
  → CapitalAllocated (terminal; fans out to per-sleeve flow below)

  PER-SLEEVE INNER LOOP
  ─────────────────────
  Scheduler.tick.eod
  → DataAgent              → DataReady
  → FeatureEngineeringAgent → FeaturesReady
  → Technical · Fundamental · Sentiment (parallel) → ResearchComplete × 3
  → SignalAgent            → SignalProposed
  → PortfolioOptimizationAgent (cuFOLIO solve, ~520ms) → RebalanceProposed
  → PortfolioConstructionAgent  ← reads active.json: capital = equity × pct[sleeve]
                               → RebalanceConstructed
  → ComplianceBusAgent     → RebalanceCleared / RebalanceBlocked
                             (cross-sleeve concentration check fires here
                              when active.json has > 1 sleeve)
  → PortfolioManagerBusAgent → RebalanceApproved
  → ExecutionBusAgent      → OrderPlaced (real broker call)
  → LiveMonitorAgent       → OrderFilled

  ⇢ STRESS-REGIME BRANCH (off DataReady)
  DataReady → RegimeDetectorAgent → RegimeMatchProposed / RegimeDraftProposed
            → RegimesChanged → PortfolioOptimizationAgent (synthetic stress
              scenarios injected into cuFOLIO solve · same pipeline)

  ⇢ PHASE 0-6 LIFECYCLE (per strategy_version_id)
  BacktestReport → BenchmarkingAgent (train/val/test sweep, one-shot test)
                 → BenchmarkReport
  BenchmarkReport → ValidationNarrationAgent (Nemotron narrates 10 checks)
                  → ValidationReportReady
                  → ComplianceAgent (strategy-definition gate: restricted list, universe>50)
                  → ComplianceAdvisory
  ValidationReportReady (gate ≥ 7/10) → eligible for paper deploy
  Deploy modal (live broker only): frozen params + checklist with 6 required items
    → POST /api/lifecycle/deploy → immutable Deployment record
  Paper/live activity → RealizedCompareAgent (backtest ↔ paper ↔ live, joined by strategy_version_id)
                      → RealizedCompareReady (drift signals: slippage_delta_bps, behavior_match)
                      → PostTradeAnalystAgent (5 drift dims, 7-action recommendation, Nemotron narrates)
                      → PostTradeAnalysisReady

  ↺ FEEDBACK · PREFERENCE FLYWHEEL
  RebalanceApproved/Rejected/Override
    → PreferenceLearningAgent → preferences/extract.py (T+5 yfinance backfill)
                              → preferences/trainer.py build_dpo_dataset
                              → outcome-weighted DPO with label inversion
                                 (weight = f(decision, vs_spy_pct sign);
                                  boost when decision & market agree,
                                  invert chosen↔rejected when they contradict)
    → PreferenceRecorded → NeMoRLFeedbackAgent (counts pairs)
                         → TrainNemoRLRequested (algo='dpo')
                         → NeMoRLTrainingAgent (subprocess → Py3.13 env)
  TRL DPO loop (parallel, in-process):
    → LoRA on TinyLlama-1.1B → data/rlhf/adapters/<ts>/
    → preferences/style_adapter.py hot-reloads → Style-match % + AUC + drift badge

  ↺ PHASE 8 — DPO CLOSURE (the flywheel finally spins)
  NeMoRLTrainingComplete (from runner pump thread)
    → PolicyPromotionAgent
       · find_latest_checkpoint(run_id, algo)         results/<algo>/<run_id>/step_N
       · ensure_hf_checkpoint(step_dir)               DCP→HF via NeMo-RL converter subprocess
       · parse_final_metrics(log_path)                val_loss · val_reward · val_accuracy
       · baseline_metrics_for(decision_type)          prior active OR fallback floor
       · gate_passes(eval, baseline)                  val_loss↓ OR val_reward↑
       · register_candidate + promote                 candidate → active
    → PolicyPromoted → InferenceServerAgent.reload(checkpoint_path)
                     → vLLM hot-reload on localhost:8024 (--served-model-name nvtrader-local-<run_id>)
    → LocalInferenceReloadStarted
  Next narration call: policy_router.call_llm(decision_type, ...)
       · active policy exists + vLLM healthy → route to local
       · else → build.nvidia.com Nemotron 3 Super (cloud fallback)
       · RoutedReply.policy_id stamped on response → audit captures who narrated

  ↺ PHASE 9 — LLM-DRIVEN SIGNAL DISCOVERY
  (Adapted from NVIDIA-AI-Blueprints quantitative-signal-discovery-agent)
  SignalDiscoveryRequested(intent="momentum on US tech mega-caps")
    → SignalDiscoveryOrchestratorAgent
       · (if universe omitted) universe_resolution LLM picks sector slice
         from a 13-slice catalog → UniverseResolved
       · for iter in 1..max_iterations:
            SignalGeneratorAgent   (Nemotron 3 Super, temp=0.8)
              · sees operator vocabulary (66 ops) + intent + best-so-far + last_feedback
              · emits SignalCandidatesGenerated { signals: [JSON AST, ...] }
            SignalCodeGeneratorAgent
              · validate AST against strict whitelist (no exec)
              · compile to vectorized callable via OPERATOR_REGISTRY
              · emits SignalCodeCompiled
            evaluator (deterministic)
              · pull real OHLCV via yfinance (window matches user dates exactly)
              · Spearman rank-IC + p-value + IR + decay + quintile spread
              · emits SignalEvaluated × N
            acceptance gate (|IC|≥0.02 AND p≤0.05)
              · pass → SignalAccepted → save data/discovery/signals/<id>.json
              · fail → SignalRejected
            (if all rejected and not last iter)
              OptimizationAdvisorAgent (Nemotron 3 Super, temp=0.5)
                · operator-grade critique: "Try TS_Rank instead of CS_Rank,
                  add TS_Zscore vol normalization, gate when..."
                · emits OptimizationAdviceGenerated → binds next iter's prompt
       · terminal: SignalDiscoveryComplete { status: accepted|best_effort|failed }
  Promote-to-sleeve: POST /api/discovery/signal/accepted/<id>/promote-to-sleeve
    → configs/sleeves/discovered_*.yaml with signal_kind=discovered_formula
    → run_fold loads sleeve → discovery/alpha_overlay.compute_alpha_tilt
       (Grinold-Kahn α = z · σ · IC_target)
    → tilt_prices(window_prices, α_daily)   # Black-Litterman view at data layer
    → cuFOLIO.generate_scenarios on tilted prices  → real signal drives weights
    → FoldResult.signal_overlay records formula + n_rebalances_tilted

  Reasoning trace: every Nemotron call in Phases 8/9 emits llm.reasoning.<role>
    spans into data/traces/spans.jsonl with chain-of-thought (reasoning_content),
    token usage, policy_id provenance. Surfaced on /observability.html and
    inline on the Signal-Discovery iteration timeline.

  every meaningful event (60+ subscribed) → AuditAgent → data/audit/bus_events.jsonl
                                          (tagged with strategy_version_id, strategy_id, mode)

The outer loop is operator-triggered. When no allocation has been approved, the inner loop runs in single-sleeve mode (each sleeve sizes against full account equity). This preserves the behavior from before the allocator existed. See the multi-strategy capital allocation skill card for full operational detail.
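
The sizing rule from the inner loop (`capital = equity × pct[sleeve]`) and the single-sleeve fallback can be sketched as follows. The sketch rests on three assumptions:

- `data/allocations/active.json` holds a `{"pct": {sleeve: fraction}}` mapping.
- Fractions lie in [0, 1].
- A sleeve absent from an approved allocation receives no capital.

The schema, that rule and the helper name `sleeve_capital` are all hypothetical.

```python
import json
from pathlib import Path


def sleeve_capital(equity: float, sleeve: str, active_path: Path) -> float:
    """Capital a sleeve may size against (hypothetical helper).

    No approved allocation on disk -> single-sleeve mode: full account equity.
    Approved allocation on disk    -> equity * pct[sleeve].
    """
    if not active_path.exists():
        return equity
    allocation = json.loads(active_path.read_text())
    # Assumed schema: {"pct": {"<sleeve>": <fraction of equity>}}
    pct = allocation.get("pct", {})
    # Assumption: a sleeve left out of the approved allocation gets nothing.
    return equity * float(pct.get(sleeve, 0.0))
```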

The full event vocabulary is on the Event vocabulary page; the full agent roster is on the Agent roster page.
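
The label-inversion rule in the preference flywheel above can be made concrete with a small pair builder. The text only says the weight is a function of the decision and the sign of `vs_spy_pct`. Everything else here is hypothetical: the `boost=1.5` multiplier, the neutral weight of 1.0, the handling of a flat outcome, and the names `DPOPair` / `build_weighted_pair`. Override decisions are left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DPOPair:
    prompt: str
    chosen: str
    rejected: str
    weight: float


def build_weighted_pair(
    prompt: str,
    proposal: str,
    alternative: str,
    decision: str,
    vs_spy_pct: float,
    boost: float = 1.5,  # hypothetical up-weight when decision and market agree
) -> DPOPair:
    """Turn one operator decision plus its T+5 outcome into a weighted DPO pair.

    decision:   "approved" (operator preferred `proposal`) or "rejected"
                (operator preferred `alternative`).
    vs_spy_pct: the proposal's T+5 return minus SPY's, in percent.
    """
    if decision not in ("approved", "rejected"):
        raise ValueError(f"unsupported decision: {decision!r}")
    operator_liked_proposal = decision == "approved"
    preferred, other = (proposal, alternative) if operator_liked_proposal else (alternative, proposal)

    if vs_spy_pct == 0:
        # Flat outcome: keep the operator's label at neutral weight.
        return DPOPair(prompt, preferred, other, 1.0)

    market_liked_proposal = vs_spy_pct > 0
    if operator_liked_proposal == market_liked_proposal:
        # Decision and market agree: keep the label and up-weight it.
        return DPOPair(prompt, preferred, other, boost)
    # Decision and market contradict: invert chosen <-> rejected.
    return DPOPair(prompt, other, preferred, 1.0)
```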
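
The Phase 8 promotion gate reduces to a few comparisons: `val_loss↓ OR val_reward↑` against the prior active policy, or against a fallback floor. `EvalMetrics`, `baseline_for` and the floor values below are assumptions for illustration. The real `gate_passes` may treat missing metrics or ties differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalMetrics:
    val_loss: Optional[float] = None
    val_reward: Optional[float] = None


# Hypothetical floor used when no policy is active yet for a decision_type.
FALLBACK_FLOOR = EvalMetrics(val_loss=2.0, val_reward=0.0)


def baseline_for(active: Optional[EvalMetrics]) -> EvalMetrics:
    """Prior active policy's metrics, else the fallback floor."""
    return active if active is not None else FALLBACK_FLOOR


def gate_passes(candidate: EvalMetrics, baseline: EvalMetrics) -> bool:
    """Promote if validation loss went down OR validation reward went up.

    A metric missing on either side cannot count as an improvement.
    """
    loss_down = (
        candidate.val_loss is not None
        and baseline.val_loss is not None
        and candidate.val_loss < baseline.val_loss
    )
    reward_up = (
        candidate.val_reward is not None
        and baseline.val_reward is not None
        and candidate.val_reward > baseline.val_reward
    )
    return loss_down or reward_up
```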
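
The routing rule at the end of the Phase 8 block can be sketched with the LLM calls injected as plain callables. This keeps the decision logic testable without a running vLLM server. Only `call_llm`, `decision_type` and `RoutedReply.policy_id` come from the flow above. The other parameters, the `CLOUD_POLICY_ID` label and the health-check callable are hypothetical. In a real deployment, `call_local` would target vLLM's OpenAI-compatible server on localhost:8024.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class RoutedReply:
    text: str
    policy_id: str  # stamped on every reply so the audit log records who narrated


CLOUD_POLICY_ID = "cloud:nemotron-3-super"  # hypothetical provenance label


def call_llm(
    decision_type: str,
    prompt: str,
    active_policies: Mapping[str, str],     # decision_type -> promoted policy_id
    local_healthy: Callable[[], bool],      # e.g. a vLLM health probe
    call_local: Callable[[str, str], str],  # (policy_id, prompt) -> text
    call_cloud: Callable[[str], str],       # prompt -> text
) -> RoutedReply:
    policy_id = active_policies.get(decision_type)
    # Route locally only when a promoted policy exists for this decision_type AND
    # the server is healthy; no promotion yet or cold start -> cloud fallback.
    if policy_id is not None and local_healthy():
        return RoutedReply(call_local(policy_id, prompt), policy_id)
    return RoutedReply(call_cloud(prompt), CLOUD_POLICY_ID)
```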
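
The Phase 9 CodeGenerator step is the safety boundary between LLM output and executable code. It validates the JSON AST against a whitelist, then compiles it through `OPERATOR_REGISTRY` without `exec`. The sketch below makes several assumptions:

- It uses a hypothetical node schema: `{"field": ...}` leaves and `{"op": ..., "args": [...]}` interior nodes.
- It registers five illustrative operators (`Add`, `Sub`, `Div`, `TS_Mean`, `Delay`), not the actual 66.
- It works on plain Python lists for a single series, rather than the vectorized callables described above.

```python
from typing import Any, Callable

Series = list[float]
Data = dict[str, Series]

NAN = float("nan")
ALLOWED_FIELDS = {"open", "high", "low", "close", "volume"}
MAX_WINDOW = 252  # hypothetical cap on look-back windows (~1 trading year)


def _pairwise(fn: Callable[[float, float], float]) -> Callable[[Series, Series], Series]:
    return lambda a, b: [fn(x, y) for x, y in zip(a, b)]


def _ts_mean(x: Series, window: int) -> Series:
    """Trailing mean; NaN until a full window is available."""
    return [
        sum(x[i + 1 - window : i + 1]) / window if i + 1 >= window else NAN
        for i in range(len(x))
    ]


def _delay(x: Series, n: int) -> Series:
    """Shift the series n steps into the past, padding the front with NaN."""
    n = min(n, len(x))
    return [NAN] * n + x[: len(x) - n]


# Whitelist: operator name -> (argument kinds, implementation). Illustrative subset.
OPERATOR_REGISTRY: dict[str, tuple[tuple[str, ...], Callable[..., Series]]] = {
    "Add": (("series", "series"), _pairwise(lambda x, y: x + y)),
    "Sub": (("series", "series"), _pairwise(lambda x, y: x - y)),
    "Div": (("series", "series"), _pairwise(lambda x, y: x / y if y != 0 else NAN)),
    "TS_Mean": (("series", "int"), _ts_mean),
    "Delay": (("series", "int"), _delay),
}


def compile_ast(node: Any) -> Callable[[Data], Any]:
    """Validate one JSON-AST node and return a closure. Never calls exec/eval."""
    if not isinstance(node, dict):
        raise ValueError(f"node must be an object, got {type(node).__name__}")
    if set(node) == {"field"}:
        name = node["field"]
        if not isinstance(name, str) or name not in ALLOWED_FIELDS:
            raise ValueError(f"field {name!r} not allowed")
        return lambda data: list(data[name])
    if set(node) == {"op", "args"}:
        op, args = node["op"], node["args"]
        if not isinstance(op, str) or op not in OPERATOR_REGISTRY:
            raise ValueError(f"operator {op!r} not in whitelist")
        kinds, fn = OPERATOR_REGISTRY[op]
        if not isinstance(args, list) or len(args) != len(kinds):
            raise ValueError(f"{op} expects {len(kinds)} arguments")
        children: list[Callable[[Data], Any]] = []
        for kind, arg in zip(kinds, args):
            if kind == "int":
                # bool is a subclass of int in Python, so reject it explicitly.
                if isinstance(arg, bool) or not isinstance(arg, int) or not 1 <= arg <= MAX_WINDOW:
                    raise ValueError(f"{op}: window must be an int in [1, {MAX_WINDOW}]")
                children.append(lambda data, v=arg: v)
            else:
                children.append(compile_ast(arg))
        return lambda data: fn(*(child(data) for child in children))
    raise ValueError(f"unrecognized node keys: {sorted(node)}")
```

An LLM-proposed operator name that is not in the registry, such as `__import__`, is rejected at validation time. Nothing from the model's output is ever evaluated as Python source.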
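
The promote-to-sleeve step applies the Grinold-Kahn rule α = z · σ · IC_target, then tilts the price window before scenario generation. This minimal sketch makes three assumptions:

- σ is each asset's daily return volatility, so α comes out as a daily drift.
- z is the cross-sectional z-score of the signal.
- The tilt compounds α into the price path.

`compute_alpha_tilt` and `tilt_prices` reuse names from the flow above, but these signatures and bodies are illustrative, not the module's code.

```python
from statistics import mean, pstdev
from typing import Mapping


def compute_alpha_tilt(
    scores: Mapping[str, float],
    daily_vol: Mapping[str, float],
    ic_target: float,
) -> dict[str, float]:
    """Grinold-Kahn: alpha_i = z_i * sigma_i * IC, with z the cross-sectional z-score."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {ticker: 0.0 for ticker in scores}  # no cross-sectional dispersion, no view
    return {
        ticker: (score - mu) / sd * daily_vol[ticker] * ic_target
        for ticker, score in scores.items()
    }


def tilt_prices(prices: list[float], alpha_daily: float) -> list[float]:
    """Compound a constant daily drift into a price path: p'_t = p_t * (1 + alpha)^t."""
    return [p * (1.0 + alpha_daily) ** t for t, p in enumerate(prices)]
```

Multiplying p_t by (1 + α)^t shifts every daily log-return by log(1 + α). A scenario generator that works from the window's returns therefore sees the view as a drift, without any change to its own code.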

NVIDIA components map

Component	Role	Doc
Build API	OpenAI-compatible REST proxy to Nemotron + Kimi + 100+ catalog models	PM chat
Nemotron 3 Super 120B	Reasoning model with a separate `reasoning_content` channel; used for DeepResearch, PM narration, backtest Q&A	AIQ Deep Research
Nemotron 3 Nano Omni 30B	VLM; reads price charts in ~3-4 s on GB10	Research workbench
Kimi K2.6	Tool-calling chat model on build.nvidia.com	PM chat
cuFOLIO + cuOpt PDLP	Mean-CVaR portfolio optimization · ~520ms on GB10	cuFOLIO
NVIDIA NeMo-RL 0.6.0	LLM post-training (SFT/DPO/PPO/GRPO/DAPO/GDPO/RM/distillation) from `github.com/NVIDIA-NeMo/RL`. Subprocess-launched into a dedicated Python 3.13 env. Drives Nemotron post-training; trained checkpoints feed the Phase 8 closure.	NeMo-RL
vLLM 0.21.0 (Phase 8)	Local OpenAI-compatible inference server on port 8024. Hosts promoted DPO checkpoints and hot-reloads on `PolicyPromoted` events. Cold start takes ~30-60 s for an 8B model. During that window, `policy_router` falls back to cloud Nemotron automatically.	Continuous RL
NeMo Agent Toolkit (NAT)	OTel observability with agent-aware semantic conventions	Observability
AIQ Deep Research blueprint	Planner → researcher → synthesizer → citer pattern	AIQ Deep Research
Quant Signal Discovery blueprint (Phase 9)	Closed-loop alpha-formula discovery adapted from `NVIDIA-AI-Blueprints/quantitative-signal-discovery-agent`. 4 Nemotron-driven agents (Generator / CodeGenerator / Advisor / Orchestrator) + a 66-operator JSON-AST vocabulary + IC + p-value acceptance gate. Differs from upstream: JSON-AST whitelist (no `exec()` on LLM output), policy_router for DPO-replaceable LLM roles, Nemotron 3 Super 120B (vs upstream Nano-30B-A3B), Grinold-Kahn α-tilt on cuFOLIO scenarios for backtest integration.	Benchmark · Signal Discovery
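
For reference, the Mean-CVaR optimization listed for cuFOLIO is usually posed as a linear program via the Rockafellar-Uryasev reformulation. That reformulation is what makes an LP solver such as cuOpt PDLP applicable. The textbook long-only form is below. It uses S scenarios r_s with probabilities p_s, mean return vector μ̂ = Σ_s p_s r_s, confidence level β and risk aversion λ. cuFOLIO's actual formulation may add constraints (leverage, turnover, position bounds) not shown here. At the optimum, t plays the role of the β-VaR of the portfolio loss, and the bracketed term equals the β-CVaR.

```latex
\begin{aligned}
\max_{w \in \mathbb{R}^n,\; t \in \mathbb{R},\; u \in \mathbb{R}^S}\quad
  & \hat{\mu}^\top w \;-\; \lambda\Big(t + \frac{1}{1-\beta}\sum_{s=1}^{S} p_s\, u_s\Big) \\
\text{s.t.}\quad
  & u_s \ge -r_s^\top w - t,\quad u_s \ge 0, \qquad s = 1,\dots,S \\
  & \mathbf{1}^\top w = 1,\quad w \ge 0
\end{aligned}
```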


See also