NVTrader — Continuous RL · LLM post-training

Launch a training run

POST /api/nemo-rl/launch · routes via the NAT A2A bus through NeMoRLTrainingAgent

Base model (HF id)

Train data path (JSONL)

Max steps
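A minimal sketch of how the three form inputs above might be packaged into the POST to /api/nemo-rl/launch. The JSON field names (`base_model`, `train_data_path`, `max_steps`), the helper `build_launch_request`, and the base URL are assumptions for illustration; the actual request schema is not shown on this page.

```python
import json
import urllib.request


def build_launch_request(base_url, base_model, train_data_path, max_steps):
    """Build (but do not send) a POST request for /api/nemo-rl/launch.

    The payload keys are hypothetical and simply mirror the form labels.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be a positive integer")
    payload = {
        "base_model": base_model,            # HF model id
        "train_data_path": train_data_path,  # path to a JSONL file
        "max_steps": max_steps,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/nemo-rl/launch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building the request separately keeps the payload easy to inspect before launch.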

Training runs — loading

loading runs from /api/nemo-rl/runs…

Select a run on the left

click a run to stream its log

NemoRL AutoResearch · Karpathy pattern · meta-agent over NeMo-RL

idle

Meta-loop pattern from karpathy/autoresearch. Each iteration, Nemotron 3 Super proposes a typed config edit (KL penalty, batch size, learning rate, …); the loop launches the inner training run via the NeMo-RL bridge, parses the eval metric, then keeps or reverts the edit based on that metric. Click ⚙ Launch with AutoResearch above to start a session; the meta-agent picks the best config across `budget` iterations.
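The propose, train, evaluate, keep-or-revert cycle described above can be sketched as a small loop. Everything named here is hypothetical: `TrainConfig`, `autoresearch_loop`, the stand-in proposer (which takes the place of Nemotron 3 Super) and the toy evaluator (which takes the place of a real NeMo-RL run). The sketch also assumes a higher eval metric is better.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical typed config; the fields mirror the examples in the text.
    kl_penalty: float = 0.05
    batch_size: int = 32
    learning_rate: float = 1e-5


def autoresearch_loop(propose, train_and_eval, base, budget):
    """Run `budget` iterations, keeping an edit only if the metric improves."""
    best_cfg = base
    best_metric = train_and_eval(base)
    history = [(base, best_metric, True)]
    for _ in range(budget):
        candidate = propose(best_cfg)       # typed edit on the current best
        metric = train_and_eval(candidate)  # inner training run + eval
        kept = metric > best_metric         # assumption: higher is better
        if kept:
            best_cfg, best_metric = candidate, metric
        # On a revert, best_cfg is left unchanged for the next proposal.
        history.append((candidate, metric, kept))
    return best_cfg, best_metric, history


def halve_lr(cfg):
    # Stand-in proposer: always halves the learning rate.
    return replace(cfg, learning_rate=cfg.learning_rate * 0.5)


def toy_eval(cfg):
    # Stand-in metric that peaks at learning_rate == 2.5e-6.
    return -abs(cfg.learning_rate - 2.5e-6)


best, score, history = autoresearch_loop(halve_lr, toy_eval, TrainConfig(), budget=4)
```

With these stand-ins the first two edits are kept and the last two are reverted, so the session ends on the configuration with the best observed metric rather than the most recent one.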

events: TrainNemoRLRequested · NeMoRLTrainingStarted · NeMoRLTrainingProgress · NeMoRLTrainingComplete

PM Agent + AuditAgent subscribe to all of these

[ DPO LOOP · PROMOTED POLICIES ] trained checkpoint → local vLLM → next narration call

vLLM checking…

[ active policy per decision_type ]

[ all registered policies ]

events: PolicyCandidateRegistered · PolicyPromoted · PolicyPromotionFailed · LocalInferenceReloadStarted

PolicyPromotionAgent → InferenceServerAgent → policy_router
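One way to picture the promotion flow and the per-decision_type routing is the in-memory sketch below. The class `PolicyRegistry`, its methods, the `reload_ok` callback, the `"base"` fallback, and the order in which events are emitted are all assumptions for illustration; only the event names come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyRegistry:
    # policy_id -> (decision_type, checkpoint path)
    registered: dict = field(default_factory=dict)
    # decision_type -> currently active policy_id
    active: dict = field(default_factory=dict)
    # emitted (event_name, policy_id) pairs, in order
    events: list = field(default_factory=list)

    def register(self, policy_id, decision_type, checkpoint):
        self.registered[policy_id] = (decision_type, checkpoint)
        self.events.append(("PolicyCandidateRegistered", policy_id))

    def promote(self, policy_id, reload_ok):
        """Promote a candidate; `reload_ok(checkpoint)` stands in for the
        local inference server reload and returns True on success."""
        if policy_id not in self.registered:
            self.events.append(("PolicyPromotionFailed", policy_id))
            return False
        decision_type, checkpoint = self.registered[policy_id]
        self.events.append(("LocalInferenceReloadStarted", policy_id))
        if not reload_ok(checkpoint):
            # Failed reload: the previously active policy stays in place.
            self.events.append(("PolicyPromotionFailed", policy_id))
            return False
        self.active[decision_type] = policy_id
        self.events.append(("PolicyPromoted", policy_id))
        return True

    def route(self, decision_type, default="base"):
        # Fall back to a default policy when nothing has been promoted.
        return self.active.get(decision_type, default)
```

Keeping the active map keyed by decision_type means a failed promotion for one decision type never disturbs the policy serving another.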