traderspace — Benchmarking

[ configure run ]

sleeve_id

start

end

split ratios (train / val / test)

universe limit

benchmark

score by
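The split ratios are applied along time rather than by random sampling, so each fold is a contiguous date range inside [start, end]. A minimal sketch, assuming the run's trading days are already sorted; `temporal_split` is a hypothetical helper, not the platform's code:

```python
from datetime import date, timedelta


def temporal_split(days, ratios=(0.6, 0.2, 0.2)):
    """Cut a sorted date list into contiguous train / val / test folds.

    Hypothetical helper. No shuffling: every val date falls after every
    train date and every test date after every val date, so model
    selection never sees the held-out period.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("need three ratios that sum to 1")
    n = len(days)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = days[:n_train]
    val = days[n_train:n_train + n_val]
    test = days[n_train + n_val:]  # rounding remainder goes to test
    return train, val, test


# Usage: 100 consecutive calendar days, 60 / 20 / 20 split.
start = date(2020, 1, 1)
days = [start + timedelta(days=i) for i in range(100)]
train, val, test = temporal_split(days)
print(len(train), len(val), len(test))  # 60 20 20
```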

[ hyperparameter grid ]

cvar_alpha · n_scenarios · max_position_pct · rebal_freq (days) · lookback_days

~ combos: 2
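The combo count is the size of the Cartesian product of every parameter's candidate values. A sketch with illustrative values (chosen so the product is 2, matching the estimate above; they are not the platform's defaults); `expand_grid` is a hypothetical helper:

```python
from itertools import product

# Parameter names match the grid columns; the candidate values are illustrative.
grid = {
    "cvar_alpha": [0.95, 0.99],
    "n_scenarios": [1000],
    "max_position_pct": [0.10],
    "rebal_freq": [5],        # days
    "lookback_days": [252],
}


def expand_grid(grid):
    """Yield one dict per combination (Cartesian product of the value lists)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


combos = list(expand_grid(grid))
print(len(combos))  # 2  (2 * 1 * 1 * 1 * 1)
```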

[ history ]

[ result ]

— no run yet — configure on the left and hit ▸ Run benchmark —

[ folds ]

train (fit)

val (model select)

test (held-out)

[ winning combo ]

[ full metrics · test fold ]

[ daily P&L · test fold ]

[ monthly returns · test fold ]
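Monthly returns are compounded from the daily series rather than summed. A sketch of that step plus two common test-fold metrics, assuming daily returns as fractions, 252 trading days per year and a zero risk-free rate; the platform's exact metric definitions may differ, and the function names are hypothetical:

```python
import math
from collections import defaultdict
from datetime import date


def monthly_returns(daily):
    """Compound (date, daily_return) pairs into {(year, month): return}."""
    growth = defaultdict(lambda: 1.0)
    for d, r in daily:
        growth[(d.year, d.month)] *= 1.0 + r
    return {k: g - 1.0 for k, g in sorted(growth.items())}


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe, zero risk-free rate; needs >= 2 non-identical returns."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Deepest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


daily = [
    (date(2024, 1, 30), 0.01),
    (date(2024, 1, 31), -0.02),
    (date(2024, 2, 1), 0.03),
]
rets = [r for _, r in daily]
print(monthly_returns(daily))  # Jan ≈ -0.0102, Feb ≈ 0.03
print(max_drawdown(rets))      # ≈ -0.02
```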

[ position timeline · test fold ]

▸ trade log · test fold

[ realized compare ] backtest · paper · live

[ train / val matrix ] — sorted by val score (best first)
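The three folds imply a simple selection rule: score every combo on train and val, rank by val, and evaluate only the winner on the held-out test fold. A sketch of that rule; `select_on_val` and its `evaluate` callback are hypothetical, and higher scores are assumed to be better:

```python
from typing import Any, Callable, Dict, List, Tuple

Combo = Dict[str, Any]
Row = Tuple[Combo, float, float]  # (combo, train score, val score)


def select_on_val(
    combos: List[Combo],
    evaluate: Callable[[Combo, str], float],
) -> Tuple[List[Row], Combo, float]:
    """Rank combos by val score, then score only the winner on test.

    `evaluate(combo, fold)` is a hypothetical callable that backtests
    `combo` on fold "train", "val" or "test" and returns the "score by"
    metric, where higher is assumed to be better.
    """
    rows = [(c, evaluate(c, "train"), evaluate(c, "val")) for c in combos]
    rows.sort(key=lambda row: row[2], reverse=True)  # best val score first
    winner = rows[0][0]
    test_score = evaluate(winner, "test")  # held-out fold is touched once
    return rows, winner, test_score


# Usage with canned scores: combo A overfits train, combo B wins on val.
scores = {
    ("A", "train"): 2.0, ("A", "val"): 0.4, ("A", "test"): 0.1,
    ("B", "train"): 1.1, ("B", "val"): 0.9, ("B", "test"): 0.7,
}
combos = [{"id": "A"}, {"id": "B"}]
rows, winner, test_score = select_on_val(combos, lambda c, f: scores[(c["id"], f)])
print(winner, test_score)  # {'id': 'B'} 0.7
```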

▸ raw run payload (debug)

[ alpha discovery · llm-driven ]

Three NAT agents on the A2A bus iterate to find alpha: SignalGenerator → Code/Validator → Evaluator → Advisor. All routed through Nemotron 3 Super 120B; DPO-trained policies inherit each role.

intent · universe (optional; leave empty and Nemotron picks a sector slice for you)

signals / iter · max iters

|IC| ≥ · p-value ≤

[ resume prior session ]

Continues optimization from a previous run's last_feedback. Same intent, fresh iteration budget.
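Resuming can be modelled as keeping the intent and the last feedback string and giving each call its own iteration budget. A sketch under that assumption; `Session`, `run` and the toy agents are hypothetical stand-ins for the A2A agents, and only `last_feedback` comes from the text above:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Session:
    """Hypothetical in-memory session state; the platform persists its own."""
    intent: str
    accepted: List[str] = field(default_factory=list)
    last_feedback: Optional[str] = None
    iterations_run: int = 0


# Hypothetical stand-ins for the agent roles:
#   propose(intent, feedback, k) -> k candidate signals (SignalGenerator, Code/Validator)
#   evaluate(signal) -> (accepted?, note)                (Evaluator, Advisor)
Propose = Callable[[str, Optional[str], int], List[str]]
Evaluate = Callable[[str], Tuple[bool, str]]


def run(session: Session, propose: Propose, evaluate: Evaluate,
        max_iters: int, signals_per_iter: int) -> Session:
    """Spend a fresh budget of `max_iters`, starting from session.last_feedback."""
    for _ in range(max_iters):
        candidates = propose(session.intent, session.last_feedback, signals_per_iter)
        notes = []
        for sig in candidates:
            ok, note = evaluate(sig)
            if ok:
                session.accepted.append(sig)
            notes.append(note)
        session.last_feedback = "; ".join(notes)  # seeds the next iteration
        session.iterations_run += 1
    return session


# Usage with toy agents: accept every candidate whose name ends in "_0".
calls: List[Optional[str]] = []


def toy_propose(intent: str, feedback: Optional[str], k: int) -> List[str]:
    calls.append(feedback)
    return [f"sig{len(calls)}_{i}" for i in range(k)]


def toy_evaluate(sig: str) -> Tuple[bool, str]:
    return sig.endswith("_0"), f"{sig} checked"


s = Session("mean reversion in semis")
run(s, toy_propose, toy_evaluate, max_iters=2, signals_per_iter=2)
run(s, toy_propose, toy_evaluate, max_iters=1, signals_per_iter=2)  # resume
print(s.accepted)  # ['sig1_0', 'sig2_0', 'sig3_0']
```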

[ accepted signals ]

[ baseline benchmarks ] — deterministic sanity floor (5 fixed scorers)

The 5 hardcoded scorers (cross_sectional_pca, garch11_conditional_vol, ou_mean_reversion, realized_vol_target, vol_cone_percentile) remain available via POST /api/benchmark/signal/run for deterministic regression testing. Not surfaced as the primary UX — discovery is the operator path.
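A sketch of calling the regression endpoint with Python's standard library. Only the path and the scorer names come from the text above; the host and the JSON body (a `scorers` list) are assumptions, so check the real request schema before relying on it:

```python
import json
import urllib.request

BASELINE_SCORERS = [
    "cross_sectional_pca",
    "garch11_conditional_vol",
    "ou_mean_reversion",
    "realized_vol_target",
    "vol_cone_percentile",
]


def build_baseline_request(base_url, payload):
    """Build (but do not send) a POST to /api/benchmark/signal/run.

    The body schema is an assumption, not the documented contract.
    """
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/benchmark/signal/run",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_baseline_request(
    "http://localhost:8000",        # hypothetical host
    {"scorers": BASELINE_SCORERS},  # hypothetical field name
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Because the five scorers are deterministic, the same request against the same data should return the same scores, which is what makes it usable as a regression check.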

[ live session ]

— type an intent on the left and hit ▸ Discover —

[ verdict ]

— pick a sleeve on the left and hit ▸ Run validation —

checks passed

need ≥ 7

confidence score

weighted mean

promote to paper

7-of-10 gate
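The verdict combines a pass count with a weighted mean. A minimal sketch, assuming each of the 10 checks yields a pass flag plus a score in [0, 1] with a weight, and that promotion depends only on the 7-of-10 count; `CheckResult` and `verdict` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    """Hypothetical per-check outcome."""
    name: str
    passed: bool
    score: float   # assumed to lie in [0, 1]
    weight: float  # assumed relative importance


def verdict(checks: List[CheckResult], need: int = 7):
    """Pass count, weighted-mean confidence and the promote-to-paper gate."""
    passed = sum(c.passed for c in checks)
    total_w = sum(c.weight for c in checks)
    confidence = sum(c.score * c.weight for c in checks) / total_w
    return {
        "checks_passed": passed,
        "confidence": confidence,
        "promote_to_paper": passed >= need,
    }


checks = [CheckResult(f"check_{i}", i < 7, 1.0 if i < 7 else 0.0, 1.0) for i in range(10)]
print(verdict(checks))  # {'checks_passed': 7, 'confidence': 0.7, 'promote_to_paper': True}
```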

narration

[ PM Agent assessment ] · Nemotron 3 Super

[ check results ]

[ tier 2 · decision replay ]

— "what did my actual approvals / rejects earn?"

Reads every Approve / Reject / Override from the audit log, marks each one to market over its stated horizon, and builds a shadow equity curve from the approvals plus per-intent attribution buckets grouped by (direction · quant_model · strategy). Aggregate metrics get an N-threshold warning so we never overclaim on thin samples.
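A sketch of the replay arithmetic under stated simplifications: each decision records a direction, an entry price and the price at the end of its horizon; Override entries are not modelled; approvals compound one after another into the shadow equity curve; and `min_n` (30 here, an arbitrary choice) stands in for the N-threshold. All names and sample values are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Decision:
    action: str        # "Approve" or "Reject"; Override is not modelled here
    direction: int     # +1 long, -1 short
    quant_model: str
    strategy: str
    entry_px: float
    horizon_px: float  # price at the end of the decision's stated horizon


def mark_to_market(d: Decision) -> float:
    """Return over the stated horizon, signed by trade direction."""
    return d.direction * (d.horizon_px / d.entry_px - 1.0)


def replay(decisions: List[Decision], min_n: int = 30):
    """Shadow equity from approvals plus (direction, quant_model, strategy) buckets.

    Simplification: approvals compound one after another in log order,
    as if each took the whole book; overlapping trades are ignored.
    """
    equity = [1.0]
    buckets: Dict[Tuple[int, str, str], Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for d in decisions:
        r = mark_to_market(d)
        if d.action == "Approve":
            equity.append(equity[-1] * (1.0 + r))
        buckets[(d.direction, d.quant_model, d.strategy)][d.action].append(r)

    attribution = {}
    for key, by_action in buckets.items():
        attribution[key] = {
            action: {
                "n": len(rs),
                "mean_return": sum(rs) / len(rs),
                "thin_sample": len(rs) < min_n,  # the N-threshold warning
            }
            for action, rs in by_action.items()
        }
    return equity, attribution


log = [
    Decision("Approve", +1, "garch", "vol_target", 100.0, 110.0),
    Decision("Approve", -1, "ou", "mean_rev", 50.0, 45.0),
    Decision("Reject", +1, "garch", "vol_target", 20.0, 30.0),  # forgone +50%
]
equity, attribution = replay(log)
print(equity)  # ≈ [1.0, 1.1, 1.21]
```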

[ post-trade analysis ] — drift signals + PM Agent (Nemotron 3 Super) recommendation

Answers "is the strategy still behaving like the version we validated?" Pick a deployed version below — the platform computes drift on Sharpe / CAGR / drawdown / slippage / signal frequency, scrapes risk-limit-trip events from autotrader sessions, and chooses one of hold · reduce_allocation · pause · re_run_validation · retrain_model · change_parameters · retire_strategy.
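One way to express the drift-to-action mapping: compute each metric's relative change against the validated version, then walk an ordered rule list. The metric keys, the assumption that drawdown is stored as a positive magnitude, and every threshold are illustrative rather than the platform's rules, and `retrain_model` is left unmapped:

```python
import math
from typing import Dict

METRICS = ("sharpe", "cagr", "max_drawdown", "slippage_bps", "signal_freq")


def drift(validated: Dict[str, float], live: Dict[str, float]) -> Dict[str, float]:
    """Relative change of each live metric versus the validated version.

    max_drawdown is assumed to be a positive magnitude (0.10 = 10%), so a
    positive drift means the live drawdown is deeper.
    """
    out = {}
    for m in METRICS:
        base, now = validated[m], live[m]
        if base == 0:
            out[m] = 0.0 if now == 0 else math.copysign(math.inf, now)
        else:
            out[m] = (now - base) / abs(base)
    return out


def recommend(d: Dict[str, float], risk_trips: int) -> str:
    """Pick one action. Every threshold here is illustrative."""
    if risk_trips >= 3:
        return "pause"
    if d["sharpe"] <= -0.75:
        return "retire_strategy"
    if d["sharpe"] <= -0.5 or d["max_drawdown"] >= 0.5:
        return "reduce_allocation"
    if abs(d["signal_freq"]) >= 0.5:
        return "re_run_validation"
    if d["slippage_bps"] >= 1.0:
        return "change_parameters"
    return "hold"


validated = {"sharpe": 1.5, "cagr": 0.12, "max_drawdown": 0.10,
             "slippage_bps": 5.0, "signal_freq": 20.0}
live = dict(validated, sharpe=0.6)  # Sharpe fell 60% since validation
print(recommend(drift(validated, live), risk_trips=0))  # reduce_allocation
```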

[ recommendation ]

[ PM Agent narrative ] · Nemotron 3 Super

[ drift signals ]

[ risk-limit events ]

Strategy benchmarking — temporal train / val / test