PAPER NVTrader v0.1.18 · 28 agents on the A2A event bus · 39 chat tools build.nvidia.com documentation
Guide & FAQ

How NVTrader works — and how to drive it.

The agent team
Agent Workflow Pipeline
28 specialized agents across 8 tiers · A2A async pub/sub bus (24 active by default + 4 example agents) · NAT OTel taps every emit
Pipeline tiers and their agents:
  1. Data: DataAgent
  1.5. Eng: FeatureEng
  2. Research: PredictiveModeling · FundamentalAgent · TechnicalAgent · SentimentAgent · AIFactorAgent · DeepResearchAgent
  3. Synthesis: SignalAgent · MetaAgent
  4–7. Downstream: BacktestAgent · PortfolioOpt · PortfolioConstr · ComplianceAgent · ExecutionAgent
  Live + Reporting: PortfolioManager
Stage status legend: complete · running · gate · pending · queued
First 60 seconds
Getting started
  1. Open Portfolio — your hub. Equity, returns, Sharpe, daily-ideas, embedded chat.
  2. Search any ticker in the top-nav box (e.g. NVDA) — deep-links to the Research workbench.
  3. Click "Generate fresh ideas →" — PM Agent pulls positions + analyst consensus + market color and writes 3 actionable trades.
  4. Click any strategy tile or "+ New run" on Backtesting to run a walk-forward cuFOLIO backtest.
Talking to the agents
Chatting with the PM Agent
The embedded chat on every page hits /api/chat with 14 live tools. Try these prompts verbatim:
"buy 10 NVDA"
"list my positions"
"top 10 of QQQ, 10% each, save and backtest"
"momentum signal on retail, long top 10"
"give me 3 trading ideas for today"
"NVDA fundamentals + consensus"
"recent SEC filings on UNH"
"analyze AAPL chart for 90 days"
Default model: moonshotai/kimi-k2.6 for tool routing; falls back to nvidia/nemotron-3-super-120b-a12b.
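The chat endpoint can also be driven from a script. The sketch below only builds the HTTP request; the JSON field name "message" is an assumption (the request schema is not documented here), and the host and port are taken from the restart FAQ further down.

```python
import json
import urllib.request

# Host/port from the restart FAQ; adjust to your deployment.
BASE_URL = "http://127.0.0.1:8015"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/chat.

    The "message" field name is an assumption, not a documented schema.
    """
    body = json.dumps({"message": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("list my positions")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server.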
Backtesting
Running a walk-forward backtest
Two paths:
  • Open Backtesting → "+ New run". Pick a strategy, benchmark, date range, and rebalance frequency. The run executes on the GB10, calling cuFOLIO at each rebalance step, and returns CAGR · Sharpe · Sortino · MaxDD plus an equity curve.
  • Or ask the chat: "backtest the retail momentum strategy I saved". The agent then calls the run_backtest tool.
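For readers who want to sanity-check the reported numbers, here is a minimal sketch of how the four headline metrics can be derived from an equity curve. It assumes one equity value per trading day, 252 trading days per year, and a zero risk-free rate; NVTrader's own conventions may differ.

```python
import math
from typing import Sequence


def backtest_metrics(equity: Sequence[float], periods_per_year: int = 252) -> dict:
    """Compute CAGR, Sharpe, Sortino and max drawdown from an equity curve."""
    if len(equity) < 3:
        raise ValueError("need at least three equity points")
    rets = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]
    n = len(rets)
    mean = sum(rets) / n
    # Sample standard deviation of per-period returns.
    std = math.sqrt(sum((r - mean) ** 2 for r in rets) / (n - 1))
    # Downside deviation: only negative returns contribute.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in rets) / n)
    years = n / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    ann = math.sqrt(periods_per_year)
    sharpe = mean / std * ann if std > 0 else float("nan")
    sortino = mean / downside * ann if downside > 0 else float("inf")
    # Max drawdown: worst decline from a running peak.
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = min(max_dd, value / peak - 1.0)
    return {"cagr": cagr, "sharpe": sharpe, "sortino": sortino, "max_dd": max_dd}


curve = [100.0, 110.0, 99.0, 108.9]  # toy data, not a real backtest
m = backtest_metrics(curve)
print(round(m["max_dd"], 4))
```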
Strategy persistence
Saving strategies & models
  • Strategies save to configs/sleeves/user_<name>_<ts>.yaml via the chat's save_strategy tool, AutoResearch winners (auto_*.yaml), or PM Modal's Save-as-Preset on Models page.
  • Predictive models (XGBoost, cuML, future NemoRL artifacts) land at data/models/<id>/<version>.pkl with metadata logged to the predictive_models Postgres table.
  • Any saved strategy immediately appears in the Backtesting "+ New run" dropdown and is loadable via python scripts/rebalance.py --sleeve <id>.
Order placement
Placing orders
  • Chat: "buy 10 NVDA at market", "sell 5 AAPL limit 224"
  • PM Modal on Portfolio: Approve buttons fire batch orders from cuFOLIO-optimized plans.
  • Live state: Orders page shows open orders, fills, audit log.
  • All orders are dry-run until you flip LIVE_TRADING=1 in .env. Then they hit Webull paper, which itself is paper-only.
Charts
Charts & VLM technical reads
  • Research page renders price + volume + MA21 / MA50 overlays at: 1D · 5D · 7D · 1M · 30D · 60D · 90D · 6M · 1Y · 2Y. Intraday (1D/5D/7D) uses yfinance 5m / 15m bars; daily for the rest.
  • Click Analyze with Omni VLM → to have nvidia/nemotron-3-nano-omni-30b-a3b-reasoning read the chart pixels: trend / support / resistance / MA state / volume / patterns / PM read.
  • API: GET /api/chart/<sym>.png?interval=90D for the image, POST /api/chart/analyze for the VLM read.
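A small helper for building the chart image URL might look like the following. It is a sketch: the base URL comes from the restart FAQ, and the assumption that the API accepts every interval shown on the Research page is untested.

```python
from urllib.parse import quote, urlencode

# Intervals listed for the Research page.
INTERVALS = {"1D", "5D", "7D", "1M", "30D", "60D", "90D", "6M", "1Y", "2Y"}


def chart_url(symbol: str, interval: str = "90D",
              base_url: str = "http://127.0.0.1:8015") -> str:
    """Return the GET URL for a chart PNG; rejects unknown intervals early."""
    if interval not in INTERVALS:
        raise ValueError(f"unsupported interval: {interval!r}")
    query = urlencode({"interval": interval})
    return f"{base_url}/api/chart/{quote(symbol)}.png?{query}"


print(chart_url("NVDA"))
```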
Trust the trace
NAT observability
Every bus event, every cuFOLIO call, every Webull order, every LLM call emits an OpenTelemetry span. Observability page shows the trace tree, per-agent latency p50/p95/p99, token spend breakdown, and BacktestAgent eval scores. NAT collector → Phoenix UI at :6006.
All-NVIDIA stack
Model routing
Role                           | Model                                         | Why
Tool routing (PM Chat default) | moonshotai/kimi-k2.6                          | Clean function-calling, no chain-of-thought noise
Reasoning / fallback           | nvidia/nemotron-3-super-120b-a12b             | Heavyweight analytical synthesis
Chart vision                   | nvidia/nemotron-3-nano-omni-30b-a3b-reasoning | VLM, reads chart pixels
Portfolio engine               | cuFOLIO + cuOpt PDLP                          | GPU CVaR · runs on GB10
Fast option (deferred)         | nvidia/nemotron-3-nano-30b-a3b                | Sub-second; swap target if chat latency bites
Agent-to-agent communication
Agent event bus (A2A) · 15 agents · live

The Pipeline above is the schema. The bus is the runtime — an asyncio pub/sub backbone where every agent registers handlers and `publish(event_type, payload)` fans out with typed contracts. Watch it run live on Observability → Agent event bus.

The forward cascade (single trigger → 25 events, ~5 seconds)
Scheduler.tick.eod     → DataAgent
DataReady              → FeatureEngineeringAgent  (real yfinance bars)
FeaturesReady          → TechnicalAgent · FundamentalAgent · SentimentAgent  (parallel)
ResearchComplete × 3   → SignalAgent fuses 3 views
SignalProposed         → PortfolioOptimizationAgent  (real cuFOLIO solve · ~520ms)
RebalanceProposed      → PortfolioConstructionAgent  (rounds to whole shares)
RebalanceConstructed   → ComplianceAgent             (hard/soft/warn vetos)
RebalanceCleared       → PortfolioManagerAgent       (auto-approve or wait)
RebalanceApproved      → ExecutionAgent              (real broker.place_order)
OrderPlaced            → LiveMonitorAgent            (republishes as OrderFilled on sim)
OrderFilled            → NemoRLFeedbackAgent · PreferenceLearningAgent · AuditAgent
The feedback loops (closing the cycle)
OrderFilled × 10       → NemoRLFeedbackAgent kicks off PPO retrain
                       → PolicyRetrained ← PortfolioManagerAgent subscribes

OrderFilled (every)    → PreferenceLearningAgent refreshes DPO dataset
                       → PreferenceModelUpdated ← PortfolioManagerAgent subscribes

(terminal sink)        → AuditAgent appends every event to data/audit/bus_events.jsonl
Why this matters
  • Real autonomy — agents react to events, not direct calls. Add a new research agent that subscribes to FeaturesReady and it instantly joins the chain. No central dispatcher to edit.
  • Loop closes: PreferenceModelUpdated and PolicyRetrained arrive back at the same PortfolioManagerAgent that started the chain. The agent's own consequences become its training signal.
  • Every event traced — each publish + each handler emits a NAT-style OTel span. The Observability page shows the full cascade with millisecond timing per agent.
  • Failure-isolated — one handler crashing doesn't break the rest. Errors land in the event history with the agent name and stack tail.
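The properties above (subscribe-to-join, fan-out, failure isolation) can be illustrated with a toy asyncio bus. This is a sketch, not NVTrader's implementation: the `EventBus` class, its `history` list, and the three demo handlers are hypothetical; only the event names and the `publish(event_type, payload)` shape come from this guide.

```python
import asyncio
import traceback
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[None]]


class EventBus:
    """Toy pub/sub bus: handlers run concurrently and errors are recorded."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.history: list[tuple[str, str, str]] = []  # (event, handler, status)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        handlers = list(self._handlers[event_type])
        # return_exceptions=True keeps one crash from cancelling the others.
        results = await asyncio.gather(
            *(h(payload) for h in handlers), return_exceptions=True
        )
        for handler, result in zip(handlers, results):
            if isinstance(result, Exception):
                tail = traceback.format_exception_only(type(result), result)[-1].strip()
                self.history.append((event_type, handler.__name__, f"error: {tail}"))
            else:
                self.history.append((event_type, handler.__name__, "ok"))


async def demo() -> EventBus:
    bus = EventBus()

    async def technical_agent(payload: dict[str, Any]) -> None:
        await bus.publish("ResearchComplete", {"view": "technical", **payload})

    async def broken_agent(payload: dict[str, Any]) -> None:
        raise RuntimeError("feed unavailable")

    async def signal_agent(payload: dict[str, Any]) -> None:
        return None  # a real agent would fuse the research views here

    bus.subscribe("FeaturesReady", technical_agent)
    bus.subscribe("FeaturesReady", broken_agent)
    bus.subscribe("ResearchComplete", signal_agent)
    await bus.publish("FeaturesReady", {"symbol": "NVDA"})
    return bus


bus = asyncio.run(demo())
for entry in bus.history:
    print(entry)
```

Adding another research agent is one more `subscribe("FeaturesReady", ...)` call; nothing else in the chain changes.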
How to demo it
  1. Set broker to sim on Portfolio (so orders fill instantly).
  2. Go to Observability.
  3. Click ▶ Trigger end-to-end at the top of the Agent event bus panel.
  4. Watch ~25 events cascade through 15 agents in ~5 seconds. The 10th fill triggers a NemoRL retrain — check the Models page for the new policy.

API surface: GET /api/bus/agents · GET /api/bus/events · POST /api/bus/trigger

Provider setup
API keys + Setup Wizard

18 providers supported · all keys encrypted at rest with a master key stored in data/.auth_secret (chmod 600).

  • Manual: Account → + Link a key. Pick a provider, paste the key (and secret if it is a broker), save. The form auto-adapts: secret/base URL fields hide or show per provider, and the help text rewrites with acquisition steps.
  • 🪄 Wizard: Account → 🪄 Wizard button. The PM Agent walks you through setup step by step in chat. It reads the matching skills/setup/….md, narrates 3-5 steps, pauses between each, then runs test_provider_connection to confirm the key actually works before you walk away.
  • Brokers: Webull · Alpaca · Interactive Brokers · Tradier. Market data: Finnhub · Polygon · Alpha Vantage · IEX · Databento. Research: Tavily · SerpAPI. LLM: NVIDIA Build · NIM (local) · OpenAI · Anthropic · OpenRouter · Hugging Face · OpenAI-compatible.
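The chmod 600 detail matters because the master key protects every stored provider key. Below is a minimal sketch of creating such a file with owner-only permissions from the first byte written. The 32-byte key size and the `ensure_master_key` helper are assumptions for illustration; the actual key format and encryption scheme are not described here. POSIX permissions are assumed.

```python
import os
import secrets
import stat
import tempfile
from pathlib import Path


def ensure_master_key(path: Path) -> bytes:
    """Create an owner-only key file, or load the one that already exists."""
    if path.exists():
        return path.read_bytes()
    path.parent.mkdir(parents=True, exist_ok=True)
    key = secrets.token_bytes(32)  # key size is an assumption
    # O_EXCL refuses to overwrite an existing key; mode 0o600 is applied
    # at creation, so the file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key)
    return key


with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "data" / ".auth_secret"
    first = ensure_master_key(key_path)
    second = ensure_master_key(key_path)  # second call reuses the same key
    mode = stat.S_IMODE(key_path.stat().st_mode)

print(oct(mode))
```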
Pluggable execution
Brokers

Switch with one BROKER= line in .env. All three speak the same BrokerAdapter wire shape — same agent code, same audit log, same Orders page.

Broker | When                                                        | Auto-submit
sim    | Demos · CI · no external API · fills at last yfinance close | always
alpaca | US paper · $100k starter · same-shape live mode             | when ALPACA_PAPER=1
webull | US paper (UAT) · matches Webull mobile app account          | when WEBULL_PAPER=1

Live (real money) requires LIVE_TRADING=1. Paper accounts always auto-submit — that's a safe default.
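The auto-submit rules in this section can be written down as a single predicate. This is a sketch of the rule as stated here, not the project's code: the `should_submit` function and the default of `sim` when BROKER is unset are assumptions.

```python
from typing import Mapping

# Paper-mode flag per broker, as listed in the broker table.
PAPER_FLAGS = {"alpaca": "ALPACA_PAPER", "webull": "WEBULL_PAPER"}


def should_submit(env: Mapping[str, str]) -> bool:
    """Return True when an order would be sent to the broker, not dry-run."""
    broker = env.get("BROKER", "sim")  # default is an assumption
    if broker == "sim":
        return True  # sim always auto-submits
    paper_flag = PAPER_FLAGS.get(broker)
    if paper_flag and env.get(paper_flag) == "1":
        return True  # paper accounts auto-submit
    # Anything else is real money and needs the explicit opt-in.
    return env.get("LIVE_TRADING", "0") == "1"


print(should_submit({"BROKER": "alpaca", "ALPACA_PAPER": "1"}))
```

Keeping the decision in one pure function makes it easy to unit-test every combination of flags before touching a real account.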

Self-improving loop
Continuous-learning scheduler

APScheduler runs three jobs in the background while the app is up. View + run them from Models → Continuous-learning scheduler.

  • preference_extract · daily 23:00 ET — re-reads data/audit/rebalance_decisions.jsonl, backfills any newly-matured T+5 market outcomes.
  • dpo_train_check · daily 23:15 ET — rebuilds DPO pair set; if ≥50 pairs + GPU + trl installed, fires the LoRA fine-tune. Otherwise logs the blockers.
  • nemorl_retrain · Sunday 02:00 ET — 20k-step PPO retrain. Skips with a clear reason when no GPU is available.

Override cadence with SCHED_EXTRACT_CRON, SCHED_DPO_CRON, SCHED_NEMORL_CRON. Disable entirely with SCHEDULER_DISABLE=1.
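As an illustration of how those overrides could be resolved, the sketch below maps each job to its environment variable and falls back to the default cadence. It assumes the variables hold standard 5-field crontab expressions, which this guide does not confirm; `resolve_schedule` is a hypothetical helper.

```python
import os
from typing import Mapping

# job name -> (override variable, default cadence as a 5-field crontab string)
DEFAULT_CRONS = {
    "preference_extract": ("SCHED_EXTRACT_CRON", "0 23 * * *"),
    "dpo_train_check": ("SCHED_DPO_CRON", "15 23 * * *"),
    "nemorl_retrain": ("SCHED_NEMORL_CRON", "0 2 * * sun"),
}


def resolve_schedule(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the effective cron expression per job; empty when disabled."""
    if env.get("SCHEDULER_DISABLE") == "1":
        return {}
    schedule = {}
    for job, (var, default) in DEFAULT_CRONS.items():
        expr = env.get(var, default).strip()
        if len(expr.split()) != 5:
            raise ValueError(f"{var} must be a 5-field cron expression, got {expr!r}")
        schedule[job] = expr
    return schedule


print(resolve_schedule({"SCHED_DPO_CRON": "30 22 * * *"}))
```

Each resulting string could then be handed to APScheduler's `CronTrigger.from_crontab` with an Eastern Time zone.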

Data flywheel
Preference learning

Every Approve / Override / Reject decision you make on a rebalance is labeled training data. See the live fingerprint on Models → Preference learning.

  • Approve → positive signal. The system records that you wanted that plan executed.
  • Override + reason → negative signal. The reason text becomes the "chosen" side of a DPO pair, the proposal becomes "rejected".
  • Reject → strong negative signal. Same DPO shape, weighted higher.
  • T+5 outcome backfill — every approved/rejected decision gets a realized-return label 5 trading days later, so the trainer can see which calls actually paid off.
  • Personal-style fingerprint — top rejected symbols, sector tilts (approved − rejected), avg turnover when approved vs rejected, realized alpha capture.
  • Training kicks off at ≥50 preference pairs. LoRA adapter persists to data/rlhf/adapters/; subsequent PM narration calls compose the adapter with the base Nemotron Super.
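The mapping from decisions to DPO pairs described above can be sketched as follows. The `Decision` fields and the numeric weights are hypothetical; only the chosen/rejected assignment, the heavier weighting for Reject, and the 50-pair threshold come from this section.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PAIRS = 50  # training threshold stated above


@dataclass
class Decision:
    context: str                  # what the PM saw (hypothetical field)
    proposal: str                 # the plan the system proposed
    action: str                   # "approve", "override" or "reject"
    reason: Optional[str] = None  # free-text reason, if given


def to_dpo_pair(d: Decision) -> Optional[dict]:
    """Turn an override/reject with a reason into a DPO pair; else None."""
    if d.action not in ("override", "reject") or not d.reason:
        return None
    return {
        "prompt": d.context,
        "chosen": d.reason,      # the human's stated preference
        "rejected": d.proposal,  # the plan that was turned down
        "weight": 2.0 if d.action == "reject" else 1.0,  # illustrative weights
    }


decisions = [
    Decision("EOD rebalance", "add 5% NVDA", "approve"),
    Decision("EOD rebalance", "add 8% TSLA", "override", "cap single names at 5%"),
    Decision("EOD rebalance", "short UNH", "reject", "no shorts in this sleeve"),
]
pairs = [p for p in map(to_dpo_pair, decisions) if p is not None]
ready_to_train = len(pairs) >= MIN_PAIRS
print(len(pairs), ready_to_train)
```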
Self-improvement loop
NemoRL AutoResearch · pattern from karpathy/autoresearch

The platform designs its own strategies. A Nemotron 3 Super 120B meta-agent reads recent PPO trial outcomes and proposes one structured config edit per iteration. Our orchestrator then trains a 3,000-step PPO policy on GB10 (~8s), scores Sharpe on a held-out window, and keeps or reverts the edit. Pattern from Karpathy's nanochat work; the safety-typed schema and Sharpe metric are NVTrader's adaptation. See Models → NemoRL AutoResearch.

  • Typed edit surface — 12 bounded knobs (env: lookback, episode_len, rebal_freq, turnover_cost_bps, vol_penalty; PPO: learning_rate, n_steps, batch_size, gae_lambda, gamma, ent_coef, n_epochs). Out-of-bounds proposals clip and log a violation.
  • Keep / revert decision by Sharpe improvement over a 0.05 noise floor. Best policy persists to data/autoresearch/policies/.
  • Closed loop — discover → execute: when a new best Sharpe lands, the policy emits PolicyRetrained on the bus. NemoRLFeedbackAgent hands it to the PM. The next Scheduler.tick.eod runs the discovered policy through cuFOLIO → compliance → Alpaca. Same audit ledger, same observability surface.
  • Append-only journal at data/autoresearch/sessions/<session_id>.jsonl — every AgentProposal, BoundsClip, TrialResult, AgentDecision. Replayable. The UI streams it live.
  • Throughput is the unlock — Karpathy's original used a 5-min training budget. NemoRL trains in 8s. We run 100+ trials in 15 minutes wall-time.
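The clip-then-keep-or-revert step can be sketched in a few lines. Everything numeric below is illustrative: the bounds for the two knobs are hypothetical, the trial Sharpe values are made up rather than measured, and treating the 0.05 noise floor as a strict inequality is an assumption.

```python
NOISE_FLOOR = 0.05  # minimum Sharpe improvement, as stated above

# Hypothetical bounds for two of the twelve knobs.
BOUNDS = {"learning_rate": (1e-5, 1e-2), "gamma": (0.9, 0.999)}


def clip_edit(knob: str, value: float) -> tuple[float, bool]:
    """Clamp a proposed value into its bounds; flag whether it was clipped."""
    lo, hi = BOUNDS[knob]
    clipped = min(max(value, lo), hi)
    return clipped, clipped != value


def keep_edit(best_sharpe: float, trial_sharpe: float,
              noise_floor: float = NOISE_FLOOR) -> bool:
    """Keep an edit only if it beats the best Sharpe by more than the floor."""
    return trial_sharpe - best_sharpe > noise_floor


best = {"learning_rate": 3e-4, "gamma": 0.99}
best_sharpe = 1.00
journal = []
# (knob, proposed value, Sharpe after training with the edit); made-up numbers
trials = [("gamma", 1.2, 1.03), ("learning_rate", 1e-3, 1.12)]
for knob, proposed, trial_sharpe in trials:
    value, was_clipped = clip_edit(knob, proposed)
    kept = keep_edit(best_sharpe, trial_sharpe)
    if kept:
        best[knob], best_sharpe = value, trial_sharpe
    journal.append({"knob": knob, "value": value, "clipped": was_clipped, "kept": kept})

print(best, best_sharpe)
```

The first trial is clipped and reverted because its gain sits inside the noise floor; the second is kept and becomes the new baseline.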
Reference architecture
Architecture

Full 7-layer reference architecture (NVIDIA-style SVG + components table) lives at how-it-works.html. Screenshot-ready for slides.

Layer-by-layer summary (the seven software layers, plus the hardware they run on):

  1. User interface — FastAPI + Tailwind, 14 dashboard pages, multi-tenant whitelabel
  2. Agent orchestration — A2A async event bus (28 agents, 24 active by default) · NeMo Agent Toolkit (NAT) observability · AIQ Deep Research blueprint
  3. LLM inference — NVIDIA Build API serving Nemotron 3 Super (reasoning), Nemotron Nano Omni (vision), Kimi K2.6 (tool calls). Fallback chain auto-routes on errors.
  4. GPU optimization — cuFOLIO Mean-CVaR + cuOpt PDLP solver (~520ms on GB10), NemoRL PPO on CUDA
  5. Data sources — yfinance · Webull · Finnhub · EDGAR · Tavily · Polygon · 6 more
  6. Compliance + execution — ComplianceAgent gate → BrokerAdapter (Sim · Alpaca · Webull · IBKR pluggable)
  7. Persistence + audit — encrypted user DB, JSONL audit + trace ledgers, RL policy zips, tenant-isolated
  8. Hardware — NVIDIA DGX Spark · GB10 (compute 12.1) · CUDA 12.3 · Docker
Hard gates
Safety & gates
  • LIVE_TRADING env flag — required for any non-paper broker call. Default 0. Paper brokers (Alpaca paper, SimBroker, Webull paper) ignore this flag and always submit safely.
  • ComplianceAgent — hard gate between Construction and Execution. Position caps · sector caps · turnover · PDT · wash-sale · restricted list. Hard veto on PDT/restricted, soft veto on caps (PM can override with audit reason).
  • Override never submits — clicking Override on a rebalance logs the reason to audit but never reaches the broker. Only Approve places orders.
  • Model fallback chain — when a model degenerates or rate-limits, the chat loop auto-rolls to the next in the chain. No silent failures, no token-loop hallucinations.
  • Audit logs are append-only and persist across restarts: data/audit/*.jsonl, data/traces/spans.jsonl.
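The hard/soft veto split in the ComplianceAgent bullet can be sketched as a small decision function. The rule identifiers and the three return values are hypothetical names; the guide only states that PDT and restricted-list hits are hard vetoes and that cap breaches are soft vetoes a PM can override with an audit reason. Turnover and wash-sale checks are left out because their severity is not stated.

```python
from typing import Optional

HARD_RULES = {"pdt", "restricted_list"}     # never overridable
SOFT_RULES = {"position_cap", "sector_cap"}  # overridable with a reason


def gate(violations: set[str], override_reason: Optional[str] = None) -> str:
    """Return "blocked", "soft_veto" or "cleared" for a proposed rebalance."""
    if violations & HARD_RULES:
        return "blocked"  # a hard veto wins even if a reason is supplied
    if violations & SOFT_RULES:
        # A soft veto clears only when the PM supplies a reason for the audit log.
        return "cleared" if override_reason else "soft_veto"
    return "cleared"


print(gate({"sector_cap"}), gate({"sector_cap"}, "index rebalance week"))
```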
FAQ
Frequently asked
What's the difference between a sleeve and a strategy?
Sleeve = strategy + capital allocation slot. Multi-strategy desks (Millennium, Citadel) carve their book into sleeves. In NVTrader the same 28 agents serve all sleeves; only the AlphaSource plugs differently. Configs live at configs/sleeves/*.yaml.
Is this real money?
No. WEBULL_PAPER=1 + LIVE_TRADING=0 means every order is dry-run against a paper account. Even with LIVE_TRADING=1, the broker is paper. You'd need to swap to a live Webull subscription + flip PAPER_TRADING=0 to trade real money.
How does cuFOLIO actually work?
Mean-CVaR portfolio optimization on the GB10 GPU. Generate 5,000-10,000 scenarios via KDE (~0.3s), feed into cuOpt PDLP solver (~0.3s). Output: portfolio weights that maximize expected return subject to a CVaR (tail-loss) constraint. See the NVIDIA blueprint.
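To make the tail-loss constraint concrete, here is a minimal sketch of the empirical CVaR of a fixed portfolio over a handful of return scenarios. It shows only the risk measure, not the KDE scenario generation or the PDLP solve; the scenario numbers are toy data and the `tail` parameter (the worst fraction of scenarios averaged) is this example's convention.

```python
from typing import Sequence


def portfolio_losses(scenarios: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> list[float]:
    """Loss per scenario: the negative of the weighted asset returns."""
    return [-sum(w * r for w, r in zip(weights, row)) for row in scenarios]


def cvar(losses: Sequence[float], tail: float = 0.05) -> float:
    """Empirical CVaR: the mean of the worst `tail` share of losses."""
    if not losses:
        raise ValueError("no scenarios")
    ordered = sorted(losses, reverse=True)  # worst losses first
    k = max(1, round(len(ordered) * tail))
    return sum(ordered[:k]) / k


# Toy scenarios: each row is one scenario's returns for two assets.
scenarios = [
    [0.02, 0.01],
    [-0.04, 0.00],
    [0.01, -0.02],
    [-0.10, -0.06],
    [0.03, 0.02],
]
weights = [0.5, 0.5]
risk = cvar(portfolio_losses(scenarios, weights), tail=0.4)
print(round(risk, 4))
```

With five scenarios and a 40% tail, the two worst losses (8% and 2%) are averaged, giving a CVaR of about 5%. An optimizer would search over `weights` to maximize expected return while keeping this number under a limit.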
Why is Kimi (Moonshot) the default chat model — not Nemotron?
Kimi is exceptionally clean at function-calling — it routes tool calls without reasoning-trace noise. Nemotron 3 Super 120B is a reasoning model — it's the right pick for analytical synthesis but slower. Both run via NVIDIA Build API (Moonshot is hosted in NVIDIA's catalog).
What's the AIQ DeepResearch engine?
NVIDIA's open Deep Researcher blueprint (planner → researcher → synthesizer → citer multi-phase pattern) wrapped to drive Nemotron 3 Super via LangChain. POST /api/chat with engine=aiq, or use scripts/deep_research.py.
Where's the actual NemoRL training loop?
M8 (Phase 2). Today's scripts/autoresearch.py handles the outer loop (candidate strategy search + save). The inner loop — RL fine-tuning of a Nemotron variant on agent traces + realized P&L — needs the NemoRL library install + Gym env build + hours of GPU training.
Can I add my own ETF / sector to the chat tools?
Yes — edit ETF_TOP_HOLDINGS or SECTOR_TICKERS in src/traderspace/api/chat_server.py. Restart with scripts/start_server.sh. The chat agent picks up the new options automatically.
How do I restart the server?
~/workspace/traderspace/scripts/start_server.sh — idempotent, kills the existing uvicorn, restarts on 127.0.0.1:8015, verifies port is bound.
NVTrader v0.1.18 ·⚠ Not financial advice ·Terms ·Privacy ·License