NVTrader reference architecture
36 specialized AI agents on the A2A event bus, running end-to-end on a single NVIDIA DGX Spark / GB10. The main pipeline: trading research → cuFOLIO GPU optimization → 10-check validation report (Nemotron-narrated) → compliance-gated execution → realized-compare (backtest ↔ paper ↔ live, one strategy_version_id spine) → post-trade drift analysis → fills feed NeMo-RL DPO retrains → auto-promote to vLLM on localhost:8024 (Phase 8 closure) → next narration call hits the trained policy.

Phase 9 (LLM-driven Signal Discovery) closes a second flywheel: 4 Nemotron-driven agents (Generator · CodeGenerator · Advisor · Orchestrator) propose JSON-AST formulas over a 66-operator vocabulary, evaluate Mean IC + p-value on real yfinance bars, gate on |IC|≥0.02 AND p≤0.05, and promote accepted formulas to sleeves with Grinold-Kahn α-tilt on cuFOLIO scenarios. The architecture is adapted from NVIDIA-AI-Blueprints quantitative-signal-discovery-agent.

Every step carries a full OTel audit trail, including LLM chain-of-thought via llm.reasoning.* spans.
Layer diagram
The seven-and-a-half layer stack: each layer's responsibility, the NVIDIA-owned components, and how the pieces talk.
Layers at a glance
| Layer | What lives here | NVIDIA bits |
|---|---|---|
| 1 · UI | FastAPI + Tailwind multi-tenant whitelabel; 16 operator pages (login, dashboard, chat, orders, research, data, models, thesis, backtesting, benchmark, continuous-rl, autotrader, observability, reports, deployments, account) | — |
| 2 · Agent orchestration | 36 agents · typed subscribes/emits · compliance gate at every order AND at strategy-definition layer (Phase 7) · Phase 9 Signal Discovery loop (4 agents) co-resident on the same bus | NAT (observability) · AIQ Deep Research · Quant Signal Discovery blueprint |
| 2.5 · A2A event bus | src/traderspace/bus/ · asyncio fan-out · 36 agents · forward cascade + closed feedback loops + Phase 8 DPO closure + Phase 9 Signal-Discovery closed loop | — |
| 3a · LLM inference (cloud) | Multi-model routing with ordered fallback chain | NVIDIA Build API · Nemotron 3 Super 120B · Nemotron 3 Nano Omni 30B · Kimi K2.6 |
| 3a.5 · LLM inference (local, Phase 8) | vLLM 0.21.0 on port 8024 hosting promoted DPO checkpoints. policy_router(decision_type) swaps local↔cloud per decision_type (rebalance / validation / thesis / post-trade narration). Cold-start falls back to cloud automatically. | vLLM · mamba-ssm 2.3.2 · causal-conv1d 1.6.2 · Python 3.13 dedicated env |
| 3b · GPU optimization | Mean-CVaR portfolio LP + KDE scenario gen + NeMo-RL DPO/SFT/GRPO — all on cuda. cuOpt exposed as Plan / Modify / Optimize / Explain skills; in-process or cuOpt NIM backend. | cuFOLIO · cuOpt PDLP / cuOpt NIM · NeMo-RL 0.6.0 · PyTorch 2.11+cu130 |
| 3c · Strategy lifecycle (Phases 0-9) | StrategyVersion + mode meta tag threads through every artifact. Phase 1: backtest completeness (10 metrics + trade_log). Phase 2: 10-check ValidationReport. Phase 3: thesis → spec via Nemotron. Phase 4: realized-compare. Phase 5: live-deploy gate (6-required checklist). Phase 6: post-trade drift. Phase 7: cascade wiring. Phase 8: DPO closure via PolicyPromotionAgent + InferenceServerAgent. Phase 9: LLM-driven Signal Discovery — Generator → CodeGenerator → Evaluator (Mean IC + p-value on real yfinance OHLCV) → Acceptance gate → Advisor critique → loop; accepted formulas promote to sleeves with Grinold-Kahn α-tilt on cuFOLIO scenarios. | — |
| 4 · Data + research sources | Pluggable adapters with unified Bar / NewsItem / Filing schemas | — |
| 5 · Compliance + execution | Hard / soft / warn vetos · 4-layer gate (strategy-definition → cascade → chat → REST) · pluggable BrokerAdapter · Approve / Override / Reject audit · live-deploy modal with frozen params + 6-required checklist | — |
| 6 · Persistence + audit | Fernet-encrypted credentials · immutable decision audit · 51-event AuditAgent · OTel trace ledger · per-tenant isolation · forgot-password / reset flow (15-min single-use tokens) | — |
| 7 · Hardware + deployment | Docker (CUDA 13.0 runtime) · Tailscale · postgres-swappable auth · multi-arch (x86_64 / aarch64) | DGX Spark · GB10 (compute 12.1) |
End-to-end data flow
One trigger fires the full pipeline. ~40 events across 36 agents in ~5 seconds on GB10 (excluding training runs). The Phase 8 DPO closure adds an asynchronous outer loop that auto-promotes trained checkpoints back into inference. The Phase 9 Signal-Discovery loop runs on-demand from a chat intent ("momentum signals on US tech") and typically completes 3 iterations in 60-120 seconds against build.nvidia.com Nemotron 3 Super.
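The cascade described above (one trigger, subscribers reacting and emitting follow-on events) can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the code in src/traderspace/bus/: the `EventBus` class, its `subscribe` / `emit` methods, and the two toy handlers are hypothetical; only the event names are taken from this page.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class Event:
    name: str
    payload: dict = field(default_factory=dict)


Handler = Callable[[Event], Awaitable[None]]


class EventBus:
    """Hypothetical in-process bus: each subscriber of an event name receives every such event."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []  # ordered record of emitted event names

    def subscribe(self, name: str, handler: Handler) -> None:
        self._subs[name].append(handler)

    async def emit(self, event: Event) -> None:
        self.log.append(event.name)
        # Fan out: run all subscribers of this event concurrently.
        await asyncio.gather(*(h(event) for h in self._subs[event.name]))


async def run_demo() -> list[str]:
    bus = EventBus()

    async def data_agent(event: Event) -> None:
        # Toy stand-in for DataAgent: reacts to the tick and emits DataReady.
        await bus.emit(Event("DataReady", {"bars": 250}))

    async def feature_agent(event: Event) -> None:
        # Toy stand-in for FeatureEngineeringAgent.
        await bus.emit(Event("FeaturesReady", event.payload))

    bus.subscribe("Scheduler.tick.eod", data_agent)
    bus.subscribe("DataReady", feature_agent)
    await bus.emit(Event("Scheduler.tick.eod"))
    return bus.log


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Because each handler awaits its own `emit`, a single trigger drives the whole chain before the outer call returns, which keeps the event order easy to audit in a sketch like this.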
OPTIONAL OUTER LOOP — multi-strategy capital allocator
─────────────────────────────────────────────────────
MultiSleeveRebalanceRequested
  → CapitalAllocationAgent (cuFOLIO Mean-CVaR on sleeve NAVs)
  → CapitalAllocationProposed → PM Agent gates approval
  → CapitalAllocationApproved
  → ComplianceBusAgent (allocator gate: sleeve caps + cross-sleeve concentration + sanity)
  → AllocationCleared / AllocationBlocked
  → CapitalAllocationApprovalAgent → writes data/allocations/active.json
  → CapitalAllocated (terminal; fans out to per-sleeve flow below)
PER-SLEEVE INNER LOOP
─────────────────────
Scheduler.tick.eod
  → DataAgent → DataReady
  → FeatureEngineeringAgent → FeaturesReady
  → Technical · Fundamental · Sentiment (parallel) → ResearchComplete × 3
  → SignalAgent → SignalProposed
  → PortfolioOptimizationAgent (cuFOLIO solve, ~520ms) → RebalanceProposed
  → PortfolioConstructionAgent ← reads active.json: capital = equity × pct[sleeve]
      → RebalanceConstructed
  → ComplianceBusAgent → RebalanceCleared / RebalanceBlocked
      (cross-sleeve concentration check fires here when active.json has > 1 sleeve)
  → PortfolioManagerBusAgent → RebalanceApproved
  → ExecutionBusAgent → OrderPlaced (real broker call)
  → LiveMonitorAgent → OrderFilled
⇢ STRESS-REGIME BRANCH (off DataReady)
DataReady → RegimeDetectorAgent → RegimeMatchProposed / RegimeDraftProposed
  → RegimesChanged → PortfolioOptimizationAgent
      (synthetic stress scenarios injected into cuFOLIO solve · same pipeline)
⇢ PHASE 0-6 LIFECYCLE (per strategy_version_id)
BacktestReport → BenchmarkingAgent (train/val/test sweep, one-shot test)
  → BenchmarkReport
BenchmarkReport → ValidationNarrationAgent (Nemotron narrates 10 checks)
  → ValidationReportReady
  → ComplianceAgent (strategy-definition gate: restricted list, universe>50)
  → ComplianceAdvisory
ValidationReportReady (gate ≥ 7/10) → eligible for paper deploy
Deploy modal (live broker only): frozen params + 6-required checklist
  → POST /api/lifecycle/deploy → immutable Deployment record
Paper/live activity → RealizedCompareAgent (backtest ↔ paper ↔ live, joined by strategy_version_id)
  → RealizedCompareReady (drift signals: slippage_delta_bps, behavior_match)
  → PostTradeAnalystAgent (5 drift dims, 7-action recommendation, Nemotron narrates)
  → PostTradeAnalysisReady
↺ FEEDBACK · PREFERENCE FLYWHEEL
RebalanceApproved/Rejected/Override
  → PreferenceLearningAgent → preferences/extract.py (T+5 yfinance backfill)
  → preferences/trainer.py build_dpo_dataset
  → outcome-weighted DPO with label inversion
      (weight = f(decision, vs_spy_pct sign);
       boost when decision & market agree,
       invert chosen↔rejected when they contradict)
  → PreferenceRecorded → NeMoRLFeedbackAgent (counts pairs)
  → TrainNemoRLRequested (algo='dpo')
  → NeMoRLTrainingAgent (subprocess → Py3.13 env)
TRL DPO loop (parallel, in-process):
  → LoRA on TinyLlama-1.1B → data/rlhf/adapters/<ts>/
  → preferences/style_adapter.py hot-reloads → Style-match % + AUC + drift badge
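The label-inversion rule in the flywheel above can be sketched as follows. The source only states that the weight is a function of the decision and the sign of vs_spy_pct, so everything concrete here is an assumption: the `PreferencePair` shape, the `outcome_weighted_pair` name, the 1.5 boost, and the reading that "agree" means an approved trade that beat SPY or a rejected trade that lagged it. Override decisions are left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    weight: float


def outcome_weighted_pair(
    prompt: str,
    operator_chosen: str,
    operator_rejected: str,
    decision: str,
    vs_spy_pct: float,
    boost: float = 1.5,  # hypothetical boost factor; the real f(...) is not given in the source
) -> PreferencePair:
    """Build one DPO pair, using the realized outcome to weight or invert the operator's label."""
    if decision not in ("approved", "rejected"):
        raise ValueError(f"unsupported decision: {decision!r}")
    if vs_spy_pct == 0:
        # No market signal either way: keep the operator's label at neutral weight.
        return PreferencePair(prompt, operator_chosen, operator_rejected, 1.0)

    market_liked_trade = vs_spy_pct > 0
    operator_liked_trade = decision == "approved"
    if market_liked_trade == operator_liked_trade:
        # Decision and market agree: keep the label and boost its weight.
        return PreferencePair(prompt, operator_chosen, operator_rejected, boost)
    # Decision and market contradict: swap chosen and rejected.
    return PreferencePair(prompt, operator_rejected, operator_chosen, 1.0)
```

For example, an approved rebalance that later underperformed SPY comes back with its chosen and rejected texts swapped, so the trainer learns from the outcome rather than from the operator's original call.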
↺ PHASE 8 — DPO CLOSURE (the flywheel finally spins)
NeMoRLTrainingComplete (from runner pump thread)
  → PolicyPromotionAgent
      · find_latest_checkpoint(run_id, algo): results/<algo>/<run_id>/step_N
      · ensure_hf_checkpoint(step_dir): DCP→HF via NeMo-RL converter subprocess
      · parse_final_metrics(log_path): val_loss · val_reward · val_accuracy
      · baseline_metrics_for(decision_type): prior active OR fallback floor
      · gate_passes(eval, baseline): val_loss↓ OR val_reward↑
      · register_candidate + promote: candidate → active
  → PolicyPromoted → InferenceServerAgent.reload(checkpoint_path)
  → vLLM hot-reload on localhost:8024 (--served-model-name nvtrader-local-<run_id>)
  → LocalInferenceReloadStarted
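The promotion gate above (val_loss down OR val_reward up, against the prior active policy or a fallback floor) can be sketched like this. The function names mirror the ones listed, but the signatures, the `EvalMetrics` type, the in-memory `active` mapping, and the floor values are assumptions made for illustration. The floor's val_loss of ln 2 is the DPO loss at zero reward margin, chosen here only as a plausible default.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalMetrics:
    val_loss: float
    val_reward: float


# Hypothetical floor used when no policy is active yet for a decision type.
FALLBACK_FLOOR = EvalMetrics(val_loss=math.log(2), val_reward=0.0)


def baseline_metrics_for(decision_type: str, active: dict[str, EvalMetrics]) -> EvalMetrics:
    """Return the prior active policy's metrics, or the fallback floor if none exists."""
    return active.get(decision_type, FALLBACK_FLOOR)


def gate_passes(candidate: EvalMetrics, baseline: EvalMetrics) -> bool:
    """Pass when the candidate improves on either metric: lower loss OR higher reward."""
    return candidate.val_loss < baseline.val_loss or candidate.val_reward > baseline.val_reward
```

An OR gate is permissive: a checkpoint that improves reward while regressing on loss still promotes. A stricter deployment might require both conditions.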
Next narration call: policy_router.call_llm(decision_type, ...)
  · active policy exists + vLLM healthy → route to local
  · else → build.nvidia.com Nemotron 3 Super (cloud fallback)
  · RoutedReply.policy_id stamped on response → audit captures who narrated
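The local-versus-cloud routing rule above can be sketched as below. The source elides the real signature of policy_router.call_llm, so this version is an assumption throughout: the health check and the two model callables are injected as plain functions, and the `cloud_policy_id` label is hypothetical. Only the rule itself (local when an active policy exists and vLLM is healthy, otherwise cloud, with policy_id stamped on the reply) comes from this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RoutedReply:
    text: str
    policy_id: str  # records which policy produced the text, for the audit trail


def call_llm(
    decision_type: str,
    prompt: str,
    *,
    active_policies: dict[str, str],   # decision_type -> promoted policy id (assumed shape)
    local_healthy: Callable[[], bool],
    local_llm: Callable[[str], str],
    cloud_llm: Callable[[str], str],
    cloud_policy_id: str = "cloud-nemotron-3-super",  # hypothetical label
) -> RoutedReply:
    policy_id = active_policies.get(decision_type)
    if policy_id is not None and local_healthy():
        try:
            return RoutedReply(local_llm(prompt), policy_id)
        except ConnectionError:
            pass  # local server dropped mid-call; fall through to the cloud path
    return RoutedReply(cloud_llm(prompt), cloud_policy_id)
```

Injecting the callables keeps the router testable without a running vLLM server, and the cold-start case reduces to `local_healthy()` returning False.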
↺ PHASE 9 — LLM-DRIVEN SIGNAL DISCOVERY
(Adapted from NVIDIA-AI-Blueprints quantitative-signal-discovery-agent)
SignalDiscoveryRequested(intent="momentum on US tech mega-caps")
  → SignalDiscoveryOrchestratorAgent
      · (if universe omitted) universe_resolution LLM picks sector slice
        from a 13-slice catalog → UniverseResolved
      · for iter in 1..max_iterations:
          SignalGeneratorAgent (Nemotron 3 Super, temp=0.8)
            · sees operator vocabulary (66 ops) + intent + best-so-far + last_feedback
            · emits SignalCandidatesGenerated { signals: [JSON AST, ...] }
          SignalCodeGeneratorAgent
            · validate AST against strict whitelist (no exec)
            · compile to vectorized callable via OPERATOR_REGISTRY
            · emits SignalCodeCompiled
          evaluator (deterministic)
            · pull real OHLCV via yfinance (window matches user dates exactly)
            · Spearman rank-IC + p-value + IR + decay + quintile spread
            · emits SignalEvaluated × N
          acceptance gate (|IC|≥0.02 AND p≤0.05)
            · pass → SignalAccepted → save data/discovery/signals/<id>.json
            · fail → SignalRejected
          (if all rejected and not last iter)
          OptimizationAdvisorAgent (Nemotron 3 Super, temp=0.5)
            · operator-grade critique: "Try TS_Rank instead of CS_Rank,
              add TS_Zscore vol normalization, gate when..."
            · emits OptimizationAdviceGenerated → binds next iter's prompt
      · terminal: SignalDiscoveryComplete { status: accepted|best_effort|failed }
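The "validate against a strict whitelist, then compile via OPERATOR_REGISTRY" step above can be sketched without any use of exec or eval. This is an illustration under assumptions, not the project's code generator: the node schema (`{"op": ..., "args": [...]}` and `{"field": ...}`), the depth limit, and the three toy operators (Sub, Div, Lag1) are hypothetical and are not claimed to be part of the 66-operator vocabulary. Series are plain Python lists here rather than vectorized arrays.

```python
import math

ALLOWED_FIELDS = {"open", "high", "low", "close", "volume"}


def _safe_div(a: float, b: float) -> float:
    return a / b if b != 0 else math.nan


# name -> (arity, implementation). Toy operators for illustration only.
OPERATOR_REGISTRY = {
    "Sub": (2, lambda a, b: [x - y for x, y in zip(a, b)]),
    "Div": (2, lambda a, b: [_safe_div(x, y) for x, y in zip(a, b)]),
    "Lag1": (1, lambda a: [math.nan] + a[:-1]),  # shift by one bar, pad with NaN
}


def validate(node, depth: int = 0, max_depth: int = 8) -> None:
    """Raise ValueError unless the node tree uses only whitelisted fields and operators."""
    if depth > max_depth:
        raise ValueError("formula too deep")
    if not isinstance(node, dict):
        raise ValueError("node must be a JSON object")
    if set(node) == {"field"}:
        if not isinstance(node["field"], str) or node["field"] not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {node['field']!r}")
        return
    if set(node) != {"op", "args"}:
        raise ValueError(f"unexpected keys: {sorted(node)}")
    op, args = node["op"], node["args"]
    if not isinstance(op, str) or op not in OPERATOR_REGISTRY:
        raise ValueError(f"operator not whitelisted: {op!r}")
    arity, _ = OPERATOR_REGISTRY[op]
    if not isinstance(args, list) or len(args) != arity:
        raise ValueError(f"{op} expects {arity} argument(s)")
    for child in args:
        validate(child, depth + 1, max_depth)


def compile_ast(node):
    """Validate, then build a callable(data) by composing registry functions."""
    validate(node)

    def build(n):
        if "field" in n:
            name = n["field"]
            return lambda data: data[name]
        _, fn = OPERATOR_REGISTRY[n["op"]]
        children = [build(c) for c in n["args"]]
        return lambda data: fn(*(c(data) for c in children))

    return build(node)
```

A one-bar return, written as `Div(Sub(close, Lag1(close)), Lag1(close))` in this schema, compiles to a callable that takes `{"close": [...]}`. Any operator name outside the registry is rejected before anything runs.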
Promote-to-sleeve: POST /api/discovery/signal/accepted/<id>/promote-to-sleeve
  → configs/sleeves/discovered_*.yaml with signal_kind=discovered_formula
  → run_fold loads sleeve → discovery/alpha_overlay.compute_alpha_tilt
      (Grinold-Kahn α = z · σ · IC_target)
  → tilt_prices(window_prices, α_daily) # Black-Litterman view at data layer
  → cuFOLIO.generate_scenarios on tilted prices → real signal drives weights
  → FoldResult.signal_overlay records formula + n_rebalances_tilted
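The Grinold-Kahn rule quoted above, α = z · σ · IC_target, can be worked through in a few lines. This is a sketch under stated assumptions rather than the contents of discovery/alpha_overlay: the function names, the use of a cross-sectional population z-score, the default IC_target of 0.05, and the compounding form of the price tilt are all illustrative choices not confirmed by the source.

```python
import statistics


def grinold_kahn_alpha(
    signal: dict[str, float],   # raw signal value per asset on one date
    vol: dict[str, float],      # per-asset volatility (sigma), same horizon as the desired alpha
    ic_target: float = 0.05,    # hypothetical target information coefficient
) -> dict[str, float]:
    """alpha_i = z_i * sigma_i * IC_target, with z the cross-sectional z-score of the signal."""
    values = [signal[name] for name in signal]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A flat signal carries no cross-sectional information: no tilt.
        return {name: 0.0 for name in signal}
    return {name: (signal[name] - mean) / std * vol[name] * ic_target for name in signal}


def tilt_prices_sketch(prices: list[float], alpha_daily: float) -> list[float]:
    """Assumed tilt: compound a constant daily alpha into the price path."""
    return [p * (1.0 + alpha_daily) ** t for t, p in enumerate(prices)]
```

With equal volatilities, assets whose signal sits above the cross-sectional mean receive a positive alpha and those below a negative one, so scenarios generated from the tilted prices lean toward the higher-scored names.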
Reasoning trace: every Nemotron call in Phases 8/9 emits llm.reasoning.<role>
spans into data/traces/spans.jsonl with chain-of-thought (reasoning_content),
token usage, policy_id provenance. Surfaced on /observability.html and
inline on the Signal-Discovery iteration timeline.
every meaningful event (60+ subscribed) → AuditAgent → data/audit/bus_events.jsonl
(tagged with strategy_version_id, strategy_id, mode)
The outer loop is operator-triggered. When no allocation has been approved, the inner loop runs in single-sleeve mode (each sleeve sizes against full account equity), preserving today's behavior. See the multi-strategy capital allocation skill card for full operational detail.
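The sizing rule (capital = equity × pct[sleeve], falling back to full account equity when no allocation has been approved) can be sketched as follows. The layout of active.json is not documented on this page, so the `{"pct": {...}}` shape is an assumption, as are the function names and the sleeve names in the usage note.

```python
import json
from pathlib import Path
from typing import Optional


def load_allocation(path: str) -> Optional[dict]:
    """Return the approved allocation, or None when no allocation file exists."""
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text())


def sleeve_capital(equity: float, sleeve: str, allocation: Optional[dict]) -> float:
    """Size one sleeve against its approved share, or against full equity in single-sleeve mode."""
    if allocation is None:
        return equity  # no approved allocation: single-sleeve mode
    pct = allocation["pct"]  # assumed shape: {"pct": {"<sleeve>": fraction, ...}}
    if sleeve not in pct:
        raise KeyError(f"sleeve {sleeve!r} has no approved allocation")
    return equity * pct[sleeve]
```

For instance, with `{"pct": {"momentum": 0.25, "value": 0.75}}` and 100,000 in equity, the momentum sleeve sizes against 25,000. With no file on disk, every sleeve sizes against the full 100,000.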
The full event vocabulary is on the Event vocabulary page; the full agent roster is on the Agent roster page.
NVIDIA components map
| Component | Role | Doc |
|---|---|---|
| Build API | OpenAI-compatible REST proxy to Nemotron + Kimi + 100+ catalog models | PM chat |
| Nemotron 3 Super 120B | Reasoning model with separate reasoning_content channel — DeepResearch, PM narration, backtest Q&A | AIQ Deep Research |
| Nemotron 3 Nano Omni 30B | VLM — reads price charts ~3-4s on GB10 | Research workbench |
| Kimi K2.6 | Tool-calling chat model on build.nvidia.com | PM chat |
| cuFOLIO + cuOpt PDLP | Mean-CVaR portfolio optimization · ~520ms on GB10 | cuFOLIO |
| NVIDIA NeMo-RL 0.6.0 | LLM post-training (SFT/DPO/PPO/GRPO/DAPO/GDPO/RM/distillation) from github.com/NVIDIA-NeMo/RL. Subprocess-launched into a dedicated Python 3.13 env. Drives Nemotron post-training; trained checkpoints feed the Phase 8 closure. | NeMo-RL |
| vLLM 0.21.0 (Phase 8) | Local OpenAI-compatible inference server on port 8024. Hosts promoted DPO checkpoints. Hot-reloads on PolicyPromoted events. Cold-start ~30-60s for 8B; policy_router falls back to cloud Nemotron automatically. | Continuous RL |
| NeMo Agent Toolkit (NAT) | OTel observability with agent-aware semantic conventions | Observability |
| AIQ Deep Research blueprint | Planner → researcher → synthesizer → citer pattern | AIQ Deep Research |
| Quant Signal Discovery blueprint (Phase 9) | Closed-loop alpha-formula discovery adapted from NVIDIA-AI-Blueprints/quantitative-signal-discovery-agent. 4 Nemotron-driven agents (Generator / CodeGenerator / Advisor / Orchestrator) + a 66-operator JSON-AST vocabulary + IC + p-value acceptance gate. Differs from upstream: JSON-AST whitelist (no exec() on LLM output), policy_router for DPO-replaceable LLM roles, Nemotron 3 Super 120B (vs upstream Nano-30B-A3B), Grinold-Kahn α-tilt on cuFOLIO scenarios for backtest integration. | Benchmark · Signal Discovery |
See also
- Full technical brief — the CEO/partner-ready long form.
- A2A event bus engine docs — the substrate that ties it all together.
- Standalone diagram ↗ — full-screen, screenshot-ready for slides.