NVTrader — Guide & FAQ

The agent team

Agent Workflow Pipeline

28 specialized agents across 8 tiers · A2A async pub/sub bus (24 active by default + 4 example agents) · NAT OTel taps every emit

1. Data: DataAgent
1.5. Eng: FeatureEng
2. Research: PredictiveModeling · FundamentalAgent · TechnicalAgent · SentimentAgent · AIFactorAgent · DeepResearchAgent
3. Synthesis: SignalAgent · MetaAgent
4–7. Downstream: BacktestAgent · PortfolioOpt · PortfolioConstr · ComplianceAgent · ExecutionAgent
Live + Reporting: PortfolioManager

Status legend: complete · running · gate · pending · queued

First 60 seconds

Getting started

Open Portfolio — your hub. Equity, returns, Sharpe, daily-ideas, embedded chat.
Search any ticker in the top-nav box (e.g. NVDA) — deep-links to the Research workbench.
Click "Generate fresh ideas →" — PM Agent pulls positions + analyst consensus + market color and writes 3 actionable trades.
Click any strategy tile or "+ New run" on Backtesting to run a walk-forward cuFOLIO backtest.

Talking to the agents

Chatting with the PM Agent

The embedded chat on every page hits /api/chat with 14 live tools. Try these prompts verbatim:

"buy 10 NVDA"

"list my positions"

"top 10 of QQQ, 10% each, save and backtest"

"momentum signal on retail, long top 10"

"give me 3 trading ideas for today"

"NVDA fundamentals + consensus"

"recent SEC filings on UNH"

"analyze AAPL chart for 90 days"

Default model: moonshotai/kimi-k2.6 for tool routing; falls back to nvidia/nemotron-3-super-120b-a12b.
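
The default-then-fallback routing above can be sketched in a few lines. This is an illustration only, not NVTrader's chat loop: `call_with_fallback`, `call_model` and `flaky_client` are hypothetical names, and only the two model IDs come from this guide.

```python
from typing import Callable, Sequence

# Model IDs as listed in this guide; the order encodes the fallback chain.
MODEL_CHAIN = (
    "moonshotai/kimi-k2.6",
    "nvidia/nemotron-3-super-120b-a12b",
)


def call_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    chain: Sequence[str] = MODEL_CHAIN,
) -> tuple[str, str]:
    """Try each model in order and return (model_used, reply).

    `call_model(model, prompt)` is a hypothetical client function that
    raises when a model rate-limits or returns degenerate output.
    """
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # sketch: a real loop would catch narrower errors
            errors.append(f"{model}: {exc}")
    # Surface every failure instead of failing silently.
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky_client(model: str, prompt: str) -> str:
    """Stub client: the first model is rate-limited, the second answers."""
    if model == MODEL_CHAIN[0]:
        raise TimeoutError("rate limited")
    return f"[{model}] ok: {prompt}"
```

Calling `call_with_fallback("list my positions", flaky_client)` returns the Nemotron model ID together with its reply, because the stub makes the first model fail.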

Backtesting

Running a walk-forward backtest

Two paths:

Open Backtesting → "+ New run". Pick a strategy, benchmark, date range, and rebalance frequency. The run executes on the GB10, with cuFOLIO solving at each rebalance step, and returns CAGR · Sharpe · Sortino · MaxDD · equity curve.
Or ask the chat: "backtest the retail momentum strategy I saved". The agent then calls the run_backtest tool.
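
For reference, the four headline metrics can be computed from a daily equity curve roughly as follows. This is a textbook sketch, not the BacktestAgent's code; it assumes 252 trading days per year and a zero risk-free rate, and `backtest_metrics` is a hypothetical name.

```python
import math
from typing import Sequence

TRADING_DAYS = 252  # assumed annualisation factor for daily data


def backtest_metrics(equity: Sequence[float]) -> dict[str, float]:
    """Compute CAGR, Sharpe, Sortino and max drawdown from daily equity values."""
    returns = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]
    n = len(returns)
    mean = sum(returns) / n
    # Sample standard deviation of daily returns.
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))
    # Downside deviation: only negative returns contribute (target return 0).
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / n)

    years = n / TRADING_DAYS
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    scale = math.sqrt(TRADING_DAYS)
    sharpe = mean / std * scale if std > 0 else float("nan")
    sortino = mean / downside * scale if downside > 0 else float("nan")

    # Max drawdown: worst peak-to-trough decline, reported as a negative fraction.
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = min(max_dd, value / peak - 1.0)

    return {"cagr": cagr, "sharpe": sharpe, "sortino": sortino, "max_dd": max_dd}
```

For the curve `[100, 110, 99, 120]` the drawdown is the fall from 110 to 99, i.e. 10%.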

Strategy persistence

Saving strategies & models

Strategies save to configs/sleeves/user_<name>_<ts>.yaml via the chat's save_strategy tool, AutoResearch winners (auto_*.yaml), or the PM Modal's Save-as-Preset on the Models page.
Predictive models (XGBoost, cuML, future NemoRL artifacts) land at data/models/<id>/<version>.pkl with metadata logged to the predictive_models Postgres table.
Any saved strategy immediately appears in the Backtesting "+ New run" dropdown and is loadable via python scripts/rebalance.py --sleeve <id>.

Order placement

Placing orders

Chat: "buy 10 NVDA at market", "sell 5 AAPL limit 224"
PM Modal on Portfolio: Approve buttons fire batch orders from cuFOLIO-optimized plans.
Live state: Orders page shows open orders, fills, audit log.
All orders are dry-run until you set LIVE_TRADING=1 in .env. After that they go to the Webull paper account, so no real money moves.

Charts

Charts & VLM technical reads

Research page renders price + volume + MA21 / MA50 overlays at: 1D · 5D · 7D · 1M · 30D · 60D · 90D · 6M · 1Y · 2Y. Intraday (1D/5D/7D) uses yfinance 5m / 15m bars; daily for the rest.
Click Analyze with Omni VLM → to have nvidia/nemotron-3-nano-omni-30b-a3b-reasoning read the chart pixels: trend / support / resistance / MA state / volume / patterns / PM read.
API: GET /api/chart/<sym>.png?interval=90D for the image, POST /api/chart/analyze for the VLM read.
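
A small helper for building the chart image URL, as a sketch. The base URL is an assumption (the local port mentioned in the FAQ), and `chart_url` / `uses_intraday_bars` are hypothetical names; only the path, the `interval` parameter and the interval list come from this guide.

```python
from urllib.parse import quote, urlencode

# Intervals listed in this guide for the Research page.
INTERVALS = ("1D", "5D", "7D", "1M", "30D", "60D", "90D", "6M", "1Y", "2Y")
INTRADAY = frozenset({"1D", "5D", "7D"})  # served from 5m / 15m bars per the text

BASE_URL = "http://127.0.0.1:8015"  # assumption: local server from the FAQ


def chart_url(symbol: str, interval: str = "90D", base_url: str = BASE_URL) -> str:
    """Build the GET URL for a chart PNG, rejecting unknown intervals early."""
    if interval not in INTERVALS:
        raise ValueError(f"unsupported interval: {interval!r}")
    path = f"/api/chart/{quote(symbol, safe='')}.png"
    return f"{base_url}{path}?{urlencode({'interval': interval})}"


def uses_intraday_bars(interval: str) -> bool:
    """True for the ranges the guide says are drawn from intraday bars."""
    return interval in INTRADAY
```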

Trust the trace

NAT observability

Every bus event, every cuFOLIO call, every Webull order, every LLM call emits an OpenTelemetry span. Observability page shows the trace tree, per-agent latency p50/p95/p99, token spend breakdown, and BacktestAgent eval scores. NAT collector → Phoenix UI at :6006.
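
As a rough illustration of the latency figures on that page, p50/p95/p99 can be derived from a list of span durations with the standard library. How NVTrader actually aggregates spans is not stated here; `latency_percentiles` is a hypothetical helper.

```python
import statistics
from typing import Sequence


def latency_percentiles(samples_ms: Sequence[float]) -> dict[str, float]:
    """Return p50 / p95 / p99 for one agent's span durations in milliseconds."""
    # quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

The function needs at least two samples, which is the minimum `statistics.quantiles` accepts.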

All-NVIDIA stack

Model routing

Role	Model	Why
Tool routing (PM Chat default)	moonshotai/kimi-k2.6	Clean function-calling, no chain-of-thought noise
Reasoning / fallback	nvidia/nemotron-3-super-120b-a12b	Heavyweight analytical synthesis
Chart vision	nvidia/nemotron-3-nano-omni-30b-a3b-reasoning	VLM — reads chart pixels
Portfolio engine	cuFOLIO + cuOpt PDLP	GPU CVaR · runs on GB10
Fast option (deferred)	nvidia/nemotron-3-nano-30b-a3b	Sub-second; swap target if chat latency bites

Agent-to-agent communication

Agent event bus (A2A) 15 agents · live

The Pipeline above is the schema. The bus is the runtime — an asyncio pub/sub backbone where every agent registers handlers and `publish(event_type, payload)` fans out with typed contracts. Watch it run live on Observability → Agent event bus.
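
A minimal sketch of such a bus, assuming nothing about NVTrader's real implementation beyond the `publish(event_type, payload)` shape and the event names used in this guide. `EventBus`, `subscribe` and `demo` are illustrative names, and payloads are plain dicts rather than typed contracts.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[None]]


class EventBus:
    """Tiny in-process pub/sub bus: handlers register per event type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.history: list[tuple[str, dict[str, Any]]] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        self.history.append((event_type, payload))
        # Fan out to every subscriber of this event type concurrently.
        await asyncio.gather(*(h(payload) for h in self._handlers[event_type]))


async def demo() -> list[str]:
    """Wire two agents together and return the event types seen, in order."""
    bus = EventBus()

    async def data_agent(payload: dict[str, Any]) -> None:
        await bus.publish("DataReady", {"symbols": payload["symbols"]})

    async def feature_agent(payload: dict[str, Any]) -> None:
        await bus.publish("FeaturesReady", {"symbols": payload["symbols"]})

    bus.subscribe("Scheduler.tick.eod", data_agent)
    bus.subscribe("DataReady", feature_agent)
    await bus.publish("Scheduler.tick.eod", {"symbols": ["NVDA"]})
    return [event_type for event_type, _ in bus.history]
```

Neither agent calls the other directly; each only knows the event it listens for and the event it emits.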

The forward cascade (single trigger → 25 events, ~5 seconds)

Scheduler.tick.eod     → DataAgent
DataReady              → FeatureEngineeringAgent  (real yfinance bars)
FeaturesReady          → TechnicalAgent · FundamentalAgent · SentimentAgent  (parallel)
ResearchComplete × 3   → SignalAgent fuses 3 views
SignalProposed         → PortfolioOptimizationAgent  (real cuFOLIO solve · ~520ms)
RebalanceProposed      → PortfolioConstructionAgent  (rounds to whole shares)
RebalanceConstructed   → ComplianceAgent             (hard/soft/warn vetoes)
RebalanceCleared       → PortfolioManagerAgent       (auto-approve or wait)
RebalanceApproved      → ExecutionAgent              (real broker.place_order)
OrderPlaced            → LiveMonitorAgent            (republishes as OrderFilled on sim)
OrderFilled            → NemoRLFeedbackAgent · PreferenceLearningAgent · AuditAgent
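
The fan-in step in the cascade above (three ResearchComplete events before one SignalProposed) could look like this. The equal-weight average is an assumption made for the sketch; the guide does not say how SignalAgent fuses views, and `SignalFuser` is a hypothetical class.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three research agents named in the cascade.
REQUIRED_VIEWS = frozenset({"TechnicalAgent", "FundamentalAgent", "SentimentAgent"})


@dataclass
class SignalFuser:
    """Collects one ResearchComplete per research agent, then fuses them."""

    views: dict[str, float] = field(default_factory=dict)

    def on_research_complete(self, agent: str, score: float) -> Optional[dict]:
        self.views[agent] = score
        if not REQUIRED_VIEWS.issubset(self.views):
            return None  # still waiting on at least one view
        # Assumed fusion rule: plain average of the three scores.
        fused = sum(self.views[a] for a in REQUIRED_VIEWS) / len(REQUIRED_VIEWS)
        self.views.clear()  # reset for the next cascade
        return {"event": "SignalProposed", "score": fused}
```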

The feedback loops (closing the cycle)

OrderFilled × 10       → NemoRLFeedbackAgent kicks off PPO retrain
                       → PolicyRetrained ← PortfolioManagerAgent subscribes

OrderFilled (every)    → PreferenceLearningAgent refreshes DPO dataset
                       → PreferenceModelUpdated ← PortfolioManagerAgent subscribes

(terminal sink)        → AuditAgent appends every event to data/audit/bus_events.jsonl
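
Two pieces of the loops above, sketched with the standard library: a fill counter that signals a retrain, and an append-only JSONL sink. The sketch reads "OrderFilled × 10" as "every 10th fill", which is an interpretation; `FeedbackCounter` and `append_audit` are hypothetical names, and the record layout is invented for illustration.

```python
import json
from pathlib import Path

RETRAIN_EVERY = 10  # assumption: every 10th OrderFilled kicks off a retrain


class FeedbackCounter:
    """Counts fills and reports when a retrain should be triggered."""

    def __init__(self) -> None:
        self.fills = 0

    def on_order_filled(self) -> bool:
        self.fills += 1
        return self.fills % RETRAIN_EVERY == 0


def append_audit(path: Path, event_type: str, payload: dict) -> None:
    """Append one event as a JSON line; earlier lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"type": event_type, "payload": payload}) + "\n")
```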

Why this matters

Real autonomy — agents react to events, not direct calls. Add a new research agent that subscribes to FeaturesReady and it instantly joins the chain. No central dispatcher to edit.
Loop closes — PreferenceModelUpdated and PolicyRetrained arrive back at the same PortfolioManagerAgent that started the chain. The agent's own consequences become its training signal.
Every event traced — each publish + each handler emits a NAT-style OTel span. The Observability page shows the full cascade with millisecond timing per agent.
Failure-isolated — one handler crashing doesn't break the rest. Errors land in the event history with the agent name and stack tail.
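
Failure isolation of this kind is commonly built on `asyncio.gather(..., return_exceptions=True)`. The sketch below shows that general technique; it is not taken from NVTrader's bus, and `dispatch`, `good_agent` and `bad_agent` are made-up names.

```python
import asyncio
import traceback
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[None]]

handled: list[str] = []  # records which demo handlers completed


async def good_agent(payload: dict[str, Any]) -> None:
    handled.append("good_agent")


async def bad_agent(payload: dict[str, Any]) -> None:
    raise ValueError("boom")


async def dispatch(handlers: dict[str, Handler], payload: dict[str, Any]) -> list[dict]:
    """Run every handler; return error records instead of raising."""
    names = list(handlers)
    results = await asyncio.gather(
        *(handlers[name](payload) for name in names),
        return_exceptions=True,  # one failure must not cancel the others
    )
    errors = []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            # Keep only the last line of the formatted traceback (the "stack tail").
            tail = traceback.format_exception(
                type(result), result, result.__traceback__
            )[-1].strip()
            errors.append({"agent": name, "error": tail})
    return errors
```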

How to demo it

Set broker to sim on Portfolio (so orders fill instantly).
Go to Observability.
Click ▶ Trigger end-to-end at the top of the Agent event bus panel.
Watch ~25 events cascade through 15 agents in ~5 seconds. The 10th fill triggers a NemoRL retrain — check the Models page for the new policy.

API surface: GET /api/bus/agents · GET /api/bus/events · POST /api/bus/trigger

Provider setup

API keys + Setup Wizard

18 providers supported · all keys encrypted at rest with a master key stored in data/.auth_secret (chmod 600).

Manual — Account → + Link a key. Pick a provider, paste key (and secret if broker), save. Form auto-adapts: secret/base URL fields hide/show per provider; help text rewrites with acquisition steps.
🪄 Wizard — Account → 🪄 Wizard button. The PM Agent walks you through setup step by step in chat: it reads the matching skills/setup/….md, narrates 3-5 steps, pauses between each, then runs test_provider_connection to confirm the key actually works before you walk away.
Brokers: Webull · Alpaca · Interactive Brokers · Tradier. Market data: Finnhub · Polygon · Alpha Vantage · IEX · Databento. Research: Tavily · SerpAPI. LLM: NVIDIA Build · NIM (local) · OpenAI · Anthropic · OpenRouter · Hugging Face · OpenAI-compatible.
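
For the master key file mentioned above, one way to create a secret with owner-only permissions from the start is shown below. This covers only the key file, not the encryption scheme, which this guide does not describe. `load_or_create_master_key` is a hypothetical helper and the 32-byte key length is an assumption; the sketch targets POSIX systems.

```python
import os
import secrets
from pathlib import Path


def load_or_create_master_key(path: Path) -> bytes:
    """Return the key stored at `path`, creating it with mode 0600 if missing."""
    if path.exists():
        return path.read_bytes()
    path.parent.mkdir(parents=True, exist_ok=True)
    key = secrets.token_bytes(32)  # assumed key length
    # O_EXCL refuses to overwrite a key written by a concurrent process;
    # passing 0o600 at creation means the file is never group/world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key)
    return key
```

Creating the file with the right mode is safer than writing it and running chmod afterwards, since there is no window in which other users can read it.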

Pluggable execution

Brokers

Switch with one BROKER= line in .env. All three speak the same BrokerAdapter wire shape — same agent code, same audit log, same Orders page.

Broker	When	Auto-submit
sim	Demos · CI · no external API · fills at last yfinance close	always
alpaca	US paper · $100k starter · same-shape live mode	when ALPACA_PAPER=1
webull	US paper (UAT) · matches Webull mobile app account	when WEBULL_PAPER=1

Live (real money) requires LIVE_TRADING=1. Paper accounts always auto-submit — that's a safe default.
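
The adapter idea can be sketched with a `typing.Protocol`. Only the names `BrokerAdapter`, `place_order`, `BROKER` and `sim` appear in this guide; the `Order` fields, the return dict and `make_broker` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Protocol


@dataclass(frozen=True)
class Order:
    symbol: str
    qty: int
    side: str  # "buy" or "sell"


class BrokerAdapter(Protocol):
    """Shape every broker must satisfy so agent code stays broker-agnostic."""

    name: str

    def place_order(self, order: Order) -> dict: ...


class SimBroker:
    """Fills immediately at a supplied last-close price; no external API."""

    name = "sim"

    def __init__(self, last_close: Mapping[str, float]) -> None:
        self.last_close = last_close

    def place_order(self, order: Order) -> dict:
        return {
            "broker": self.name,
            "symbol": order.symbol,
            "qty": order.qty,
            "side": order.side,
            "status": "filled",
            "fill_price": self.last_close[order.symbol],
        }


def make_broker(env: Mapping[str, str], last_close: Mapping[str, float]) -> BrokerAdapter:
    """Pick an adapter from BROKER=; only `sim` is implemented in this sketch."""
    choice = env.get("BROKER", "sim")
    if choice == "sim":
        return SimBroker(last_close)
    raise NotImplementedError(f"adapter for {choice!r} is not sketched here")
```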

Self-improving loop

Continuous-learning scheduler

APScheduler runs three jobs in the background while the app is up. View + run them from Models → Continuous-learning scheduler.

preference_extract · daily 23:00 ET — re-reads data/audit/rebalance_decisions.jsonl, backfills any newly-matured T+5 market outcomes.
dpo_train_check · daily 23:15 ET — rebuilds DPO pair set; if ≥50 pairs + GPU + trl installed, fires the LoRA fine-tune. Otherwise logs the blockers.
nemorl_retrain · Sunday 02:00 ET — 20k-step PPO retrain. Skips with a clear reason when no GPU is available.

Override cadence with SCHED_EXTRACT_CRON, SCHED_DPO_CRON, SCHED_NEMORL_CRON. Disable entirely with SCHEDULER_DISABLE=1.
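
How those overrides might resolve, as a standard-library sketch. The environment variable names come from the text; the default cron strings are a reading of the schedule above (time zone handling omitted), and `resolve_schedule` is a hypothetical function rather than the scheduler's own code.

```python
import os
from typing import Mapping

# job name -> (override variable, assumed default cron: minute hour day month weekday)
DEFAULT_CRONS = {
    "preference_extract": ("SCHED_EXTRACT_CRON", "0 23 * * *"),
    "dpo_train_check": ("SCHED_DPO_CRON", "15 23 * * *"),
    "nemorl_retrain": ("SCHED_NEMORL_CRON", "0 2 * * sun"),
}


def resolve_schedule(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return job -> cron expression, honouring overrides and the kill switch."""
    if env.get("SCHEDULER_DISABLE") == "1":
        return {}
    schedule = {}
    for job, (var, default) in DEFAULT_CRONS.items():
        expr = env.get(var, default)
        if len(expr.split()) != 5:
            raise ValueError(f"{var} must have 5 cron fields, got {expr!r}")
        schedule[job] = expr
    return schedule
```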

Data flywheel

Preference learning

Every Approve / Override / Reject decision you make on a rebalance is labeled training data. See the live fingerprint on Models → Preference learning.

Approve → positive signal. The system records that you wanted that plan executed.
Override + reason → negative signal. The reason text becomes the "chosen" side of a DPO pair, and the proposal becomes the "rejected" side.
Reject → strong negative signal. Same DPO shape, weighted higher.
T+5 outcome backfill — every approved/rejected decision gets a realized-return label 5 trading days later, so the trainer can see which calls actually paid off.
Personal-style fingerprint — top rejected symbols, sector tilts (approved − rejected), avg turnover when approved vs rejected, realized alpha capture.
Training kicks off at ≥50 preference pairs. LoRA adapter persists to data/rlhf/adapters/; subsequent PM narration calls compose the adapter with the base Nemotron Super.
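
The decision-to-pair mapping described above might be sketched like this. The weights are placeholders (the text only says Reject is weighted higher), a reason is assumed to be present on Reject as well, and `Decision`, `to_dpo_pair`, `build_pairs` and `ready_to_train` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PAIRS = 50  # training threshold stated in the text
# Hypothetical weights: the guide only says Reject counts more than Override.
PAIR_WEIGHT = {"override": 1.0, "reject": 2.0}


@dataclass(frozen=True)
class Decision:
    action: str        # "approve", "override" or "reject"
    proposal: str      # the rebalance plan as proposed
    reason: str = ""   # PM's free-text reason (assumed present for reject too)


def to_dpo_pair(decision: Decision) -> Optional[dict]:
    """Map one decision to a DPO pair, or None if it yields no pair."""
    weight = PAIR_WEIGHT.get(decision.action)
    if weight is None or not decision.reason:
        return None  # approvals are a positive signal only in this sketch
    return {"chosen": decision.reason, "rejected": decision.proposal, "weight": weight}


def build_pairs(decisions: list[Decision]) -> list[dict]:
    return [pair for pair in map(to_dpo_pair, decisions) if pair is not None]


def ready_to_train(pairs: list[dict]) -> bool:
    return len(pairs) >= MIN_PAIRS
```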

Self-improvement loop

NemoRL AutoResearch pattern from karpathy/autoresearch

The platform designs its own strategies. A Nemotron 3 Super 120B meta-agent reads recent PPO trial outcomes and proposes one structured config edit per iteration. The orchestrator then trains a 3,000-step PPO policy on the GB10 (~8s), scores its Sharpe on a held-out window, and keeps or reverts the edit. The pattern comes from Karpathy's nanochat work; the safety-typed schema and the Sharpe metric are NVTrader's adaptation. See Models → NemoRL AutoResearch.

Typed edit surface — 12 bounded knobs (env: lookback, episode_len, rebal_freq, turnover_cost_bps, vol_penalty; PPO: learning_rate, n_steps, batch_size, gae_lambda, gamma, ent_coef, n_epochs). Out-of-bounds proposals clip and log a violation.
Keep / revert decision by Sharpe improvement over a 0.05 noise floor. Best policy persists to data/autoresearch/policies/.
Closed loop — discover → execute: when a new best Sharpe lands, the policy emits PolicyRetrained on the bus. NemoRLFeedbackAgent hands it to the PM. The next Scheduler.tick.eod runs the discovered policy through cuFOLIO → compliance → Alpaca. Same audit ledger, same observability surface.
Append-only journal at data/autoresearch/sessions/<session_id>.jsonl — every AgentProposal, BoundsClip, TrialResult, AgentDecision. Replayable. The UI streams it live.
Throughput is the unlock — Karpathy's original used a 5-min training budget. NemoRL trains in 8s. We run 100+ trials in 15 minutes wall-time.
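
The clip-and-log and keep-or-revert rules above reduce to two small functions. The bounds shown are invented placeholders for three of the twelve knobs; only the knob names and the 0.05 noise floor come from the text, and `clip_edit` / `keep_edit` are hypothetical names.

```python
NOISE_FLOOR = 0.05  # Sharpe improvement required to keep an edit (from the text)

# Hypothetical bounds for illustration; the real ranges are not given in this guide.
BOUNDS = {
    "learning_rate": (1e-5, 1e-2),
    "gamma": (0.90, 0.999),
    "n_epochs": (1, 20),
}


def clip_edit(knob: str, value: float) -> tuple[float, bool]:
    """Clamp a proposed value into its bounds; the flag marks a violation."""
    low, high = BOUNDS[knob]  # unknown knobs raise KeyError: the surface is closed
    clipped = min(max(value, low), high)
    return clipped, clipped != value


def keep_edit(best_sharpe: float, trial_sharpe: float) -> bool:
    """Keep the edit only if it beats the incumbent by more than the noise floor."""
    return trial_sharpe - best_sharpe > NOISE_FLOOR
```

A trial that moves Sharpe from 1.00 to 1.04 is reverted under this rule, while 1.00 to 1.10 is kept.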

Reference architecture

Architecture

Full 7-layer reference architecture (NVIDIA-style SVG + components table) lives at how-it-works.html. Screenshot-ready for slides.

Layer-by-layer summary:

User interface — FastAPI + Tailwind, 14 dashboard pages, multi-tenant whitelabel
Agent orchestration — A2A async event bus (28 agents, 24 active by default) · NeMo Agent Toolkit (NAT) observability · AIQ Deep Research blueprint
LLM inference — NVIDIA Build API serving Nemotron 3 Super (reasoning), Nemotron Nano Omni (vision), Kimi K2.6 (tool calls). Fallback chain auto-routes on errors.
GPU optimization — cuFOLIO Mean-CVaR + cuOpt PDLP solver (~520ms on GB10), NemoRL PPO on CUDA
Data sources — yfinance · Webull · Finnhub · EDGAR · Tavily · Polygon · 6 more
Compliance + execution — ComplianceAgent gate → BrokerAdapter (Sim · Alpaca · Webull · IBKR pluggable)
Persistence + audit — encrypted user DB, JSONL audit + trace ledgers, RL policy zips, tenant-isolated
Hardware — NVIDIA DGX Spark · GB10 (compute 12.1) · CUDA 12.3 · Docker

Hard gates

Safety & gates

LIVE_TRADING env flag — required for any non-paper broker call. Default 0. Paper brokers (Alpaca paper, SimBroker, Webull paper) ignore this flag and always submit safely.
ComplianceAgent — hard gate between Construction and Execution. Position caps · sector caps · turnover · PDT · wash-sale · restricted list. Hard veto on PDT/restricted, soft veto on caps (PM can override with audit reason).
Override never submits — clicking Override on a rebalance logs the reason to audit but never reaches the broker. Only Approve places orders.
Model fallback chain — when a model degenerates or rate-limits, the chat loop auto-rolls to the next in the chain. No silent failures, no token-loop hallucinations.
Audit logs are append-only and persist across restarts: data/audit/*.jsonl, data/traces/spans.jsonl.
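
Two of the gates above, expressed as plain functions. This is a reading of the rules as written in this section, not the ComplianceAgent's code: the rule identifiers, the "warn" fallback for rules the text does not classify, and the function names are all assumptions.

```python
from typing import Iterable, Mapping

HARD_RULES = frozenset({"pdt", "restricted"})           # hard veto per the text
SOFT_RULES = frozenset({"position_cap", "sector_cap"})  # soft veto on caps


def compliance_verdict(violations: Iterable[str]) -> str:
    """Classify a set of rule violations into the strongest applicable verdict."""
    found = set(violations)
    if found & HARD_RULES:
        return "hard_veto"
    if found & SOFT_RULES:
        return "soft_veto"  # PM may override with an audit reason
    if found:
        return "warn"  # assumed handling for rules the guide does not classify
    return "cleared"


def should_submit(decision: str, broker_is_paper: bool, env: Mapping[str, str]) -> bool:
    """Decide whether an order may reach the broker."""
    if decision != "approve":
        return False  # Override and Reject are logged but never submitted
    if broker_is_paper:
        return True  # paper brokers ignore LIVE_TRADING
    return env.get("LIVE_TRADING", "0") == "1"
```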

Compliance copy

Legal & license

⚠ No-advice disclaimer — NVTrader is a research tool, not a registered investment advisor. Past backtest performance does not predict future returns.
Terms of service · Privacy policy — placeholder copy partners should edit before deploying.
Apache 2.0 — fork, modify, redistribute (commercial use included). Retain attribution.
Third-party code in third_party/ retains its own license — do not relicense.

FAQ

Frequently asked

What's the difference between a sleeve and a strategy?

Sleeve = strategy + capital allocation slot. Multi-strategy desks (Millennium, Citadel) carve their book into sleeves. In NVTrader the same 28 agents serve all sleeves; only the AlphaSource plugs in differently. Configs live at configs/sleeves/*.yaml.

Is this real money?

No. WEBULL_PAPER=1 + LIVE_TRADING=0 means every order is dry-run against a paper account. Even with LIVE_TRADING=1, the broker is paper. To trade real money you'd need to swap to a live Webull subscription and flip WEBULL_PAPER=0.

How does cuFOLIO actually work?

Mean-CVaR portfolio optimization on the GB10 GPU. Generate 5,000-10,000 scenarios via KDE (~0.3s), feed into cuOpt PDLP solver (~0.3s). Output: portfolio weights that maximize expected return subject to a CVaR (tail-loss) constraint. See the NVIDIA blueprint.
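
To make the CVaR term concrete, here is a simple empirical estimator for a fixed set of weights. It skips both the KDE scenario generation and the PDLP solve and is not cuFOLIO's API; `portfolio_cvar` is a hypothetical helper, and rounding the tail size is a simplification.

```python
from typing import Sequence


def portfolio_cvar(
    weights: Sequence[float],
    scenarios: Sequence[Sequence[float]],
    alpha: float = 0.95,
) -> float:
    """Average loss over the worst (1 - alpha) share of return scenarios."""
    # Loss per scenario = negative portfolio return, sorted worst first.
    losses = sorted(
        (-sum(w * r for w, r in zip(weights, row)) for row in scenarios),
        reverse=True,
    )
    # Tail size, rounded to dodge float noise in (1 - alpha); at least one scenario.
    k = max(1, round(len(losses) * (1 - alpha)))
    return sum(losses[:k]) / k
```

With ten scenarios and `alpha=0.8`, the estimate is the mean of the two worst losses. An optimizer then searches for weights that keep this number under a limit while maximizing expected return.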

Why is Kimi (Moonshot) the default chat model — not Nemotron?

Kimi is exceptionally clean at function-calling — it routes tool calls without reasoning-trace noise. Nemotron 3 Super 120B is a reasoning model — it's the right pick for analytical synthesis but slower. Both run via NVIDIA Build API (Moonshot is hosted in NVIDIA's catalog).

What's the AIQ DeepResearch engine?

NVIDIA's open Deep Researcher blueprint (planner → researcher → synthesizer → citer multi-phase pattern) wrapped to drive Nemotron 3 Super via LangChain. POST /api/chat with engine=aiq, or use scripts/deep_research.py.

Where's the actual NemoRL training loop?

M8 (Phase 2). Today's scripts/autoresearch.py handles the outer loop (candidate strategy search + save). The inner loop — RL fine-tuning of a Nemotron variant on agent traces + realized P&L — needs the NemoRL library install + Gym env build + hours of GPU training.

Can I add my own ETF / sector to the chat tools?

Yes — edit ETF_TOP_HOLDINGS or SECTOR_TICKERS in src/traderspace/api/chat_server.py. Restart with scripts/start_server.sh. The chat agent picks up the new options automatically.

How do I restart the server?

~/workspace/traderspace/scripts/start_server.sh — idempotent, kills the existing uvicorn, restarts on 127.0.0.1:8015, verifies port is bound.

How NVTrader works — and how to drive it.