[benchmark] train / val / test — methodologically honest idle
NVTrader
v0.1.18
benchmark

Strategy benchmarking — temporal train / val / test

Fits hyperparameters on train, picks the winner on val, reports the held-out test number once and never tunes against it. Large val→test drops surface as a val-test gap warning.
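The train / val / test flow described above can be sketched roughly as follows. This is a minimal illustration, not the platform's implementation: `temporal_split`, `run_benchmark`, `toy_score`, the 60/20/20 split and the `gap_frac` threshold are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, List, Sequence

ScoreFn = Callable[[Dict[str, float], Sequence[float]], float]


def temporal_split(n: int, train_frac: float = 0.6, val_frac: float = 0.2):
    """Split n time-ordered rows into contiguous train / val / test slices (no shuffling)."""
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n)


def toy_score(params: Dict[str, float], data: Sequence[float]) -> float:
    """Hypothetical stand-in for a real backtest score: scaled mean return."""
    return sum(x * params["scale"] for x in data) / len(data)


def run_benchmark(series: Sequence[float], combos: List[Dict[str, float]],
                  score: ScoreFn, gap_frac: float = 0.5) -> dict:
    tr, va, te = temporal_split(len(series))
    # Score every combo on train and val only; the test fold is not touched here.
    matrix = [
        {"params": p, "train": score(p, series[tr]), "val": score(p, series[va])}
        for p in combos
    ]
    matrix.sort(key=lambda row: row["val"], reverse=True)  # best val score first
    winner = matrix[0]
    # The held-out test fold is scored exactly once, for the val winner.
    test_score = score(winner["params"], series[te])
    gap = winner["val"] - test_score
    # Assumed rule: warn when test falls short of val by more than gap_frac of |val|.
    warn = gap > gap_frac * abs(winner["val"])
    return {
        "winner": winner["params"],
        "val": winner["val"],
        "test": test_score,
        "val_test_gap_warning": warn,
        "matrix": matrix,
    }
```

The test score is computed after the winner is fixed and is never fed back into selection, which is what keeps the reported number honest.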
[ result ]
— no run yet — configure on the left and hit ▸ Run benchmark —
[ folds ]
train (fit)
val (model select)
test (held-out)
[ winning combo ]
[ full metrics · test fold ]
[ daily P&L · test fold ]
[ monthly returns · test fold ]
[ position timeline · test fold ]
▸ trade log · rows · test fold
[ realized compare ] backtest · paper · live
[ train / val matrix ] — sorted by val score (best first)
▸ raw run payload (debug)
[ live session ]
— type an intent on the left and hit ▸ Discover —
[ verdict ]
— pick a sleeve on the left and hit ▸ Run validation —
checks passed
need ≥ 7
confidence score
weighted mean
promote to paper
7-of-10 gate
narration
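A rough sketch of how the tiles above could be derived from a list of check results. The per-check `score` and `weight` fields and the `evaluate_gate` helper are assumptions made for illustration; only the "need ≥ 7" threshold and the weighted-mean confidence come from the page itself.

```python
from typing import Dict, List


def evaluate_gate(checks: List[Dict], need: int = 7) -> Dict:
    """Summarise validation checks into the three headline tiles.

    Each check is assumed to look like:
    {"name": str, "passed": bool, "score": float in [0, 1], "weight": float}
    """
    passed = sum(1 for c in checks if c["passed"])
    total_weight = sum(c["weight"] for c in checks)
    # Confidence is the weighted mean of per-check scores (0.0 if there are no weights).
    confidence = (
        sum(c["weight"] * c["score"] for c in checks) / total_weight
        if total_weight
        else 0.0
    )
    return {
        "checks_passed": passed,
        "confidence_score": confidence,
        "promote_to_paper": passed >= need,  # the 7-of-10 gate
    }
```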
[ PM Agent assessment ] ⟳ Nemotron 3 Super narrating…
[ check results ]
[ tier 2 · decision replay ]
— "what did my actual approvals / rejects earn?"
Reads every Approve / Reject / Override from the audit log, marks each one to market over its stated horizon, and builds a shadow equity curve from the approvals plus per-intent attribution buckets grouped by (direction · quant_model · strategy). Aggregate metrics get an N-threshold warning so we never overclaim on thin samples.
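The replay described above might look something like the sketch below. The `Decision` fields, the `min_n` threshold, and the sequential full-capital compounding are simplifying assumptions for the example; overrides are recorded in the buckets but left out of the equity curve here, which may differ from the platform's behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Decision:
    action: str           # "approve" | "reject" | "override"
    direction: str        # "long" | "short"
    quant_model: str
    strategy: str
    entry_price: float
    horizon_price: float  # market price at the end of the stated horizon


def marked_return(d: Decision) -> float:
    """Mark one decision to market over its horizon; shorts gain when price falls."""
    raw = d.horizon_price / d.entry_price - 1.0
    return raw if d.direction == "long" else -raw


def replay(decisions: List[Decision], min_n: int = 30) -> Dict:
    equity = [1.0]  # shadow equity curve, starting at 1.0
    buckets = defaultdict(list)
    for d in decisions:
        r = marked_return(d)
        # Every decision lands in its attribution bucket, approved or not.
        buckets[(d.direction, d.quant_model, d.strategy)].append((d.action, r))
        if d.action == "approve":
            equity.append(equity[-1] * (1.0 + r))
    n_approved = len(equity) - 1
    return {
        "equity": equity,
        "buckets": dict(buckets),
        "n_approved": n_approved,
        # Flag aggregates built on too few approvals.
        "thin_sample_warning": n_approved < min_n,
    }
```

Keeping rejects in the buckets makes it possible to ask what a rejected intent would have earned, without letting it move the shadow curve.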
[ post-trade analysis ] — drift signals + PM Agent (Nemotron 3 Super) recommendation
Answers "is the strategy still behaving like the version we validated?" Pick a deployed version below. The platform computes drift on Sharpe / CAGR / drawdown / slippage / signal frequency, scrapes risk-limit-trip events from autotrader sessions, and recommends exactly one of hold · reduce_allocation · pause · re_run_validation · retrain_model · change_parameters · retire_strategy.
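As a rough illustration of the shape of this computation, the sketch below compares validated and live metrics and maps the result to an action. The relative-drift formula, the thresholds, the `max_trips` limit, and the decision ladder are all hypothetical; the page does not say how the platform actually chooses, and the sketch only uses a subset of the actions.

```python
from typing import Dict

ACTIONS = (
    "hold", "reduce_allocation", "pause", "re_run_validation",
    "retrain_model", "change_parameters", "retire_strategy",
)


def drift_signals(validated: Dict[str, float], live: Dict[str, float]) -> Dict[str, float]:
    """Relative change of each live metric versus its validated baseline."""
    return {
        k: (live[k] - v) / max(abs(v), 1e-9)  # guard against a zero baseline
        for k, v in validated.items()
    }


def recommend(signals: Dict[str, float], risk_limit_trips: int, max_trips: int = 3) -> str:
    """Hypothetical ladder: repeated limit trips pause the strategy, else escalate with drift size."""
    if risk_limit_trips >= max_trips:
        return "pause"
    # Direction-agnostic: a large improvement also counts as drift in this sketch.
    worst = max((abs(s) for s in signals.values()), default=0.0)
    if worst < 0.10:
        return "hold"
    if worst < 0.25:
        return "reduce_allocation"
    if worst < 0.50:
        return "re_run_validation"
    return "pause"
```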
[ recommendation ]
[ PM Agent narrative ] ⟳ Nemotron 3 Super narrating…
[ drift signals ]
[ risk-limit events ]