[ agent · 8 · Feedback ]
NemoRLAutoResearchOrchestrator
Karpathy-pattern meta-loop over NeMo-RL. On each iteration, Nemotron 3 Super proposes a typed config edit (e.g. KL penalty, learning rate, or batch size); the loop then launches the inner DPO, GRPO, or SFT run through the NeMo-RL bridge, parses the resulting eval metric, and keeps the edit if the metric improves or reverts it otherwise.
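The propose, run, compare, keep-or-revert cycle can be sketched as below. This is a minimal illustration that assumes a higher metric is better and that the proposer and the inner run are passed in as plain callables. `TrainConfig`, `ConfigEdit`, `apply_edit`, `autoresearch_loop`, and the scripted proposer and `fake_eval` stubs are all hypothetical. They stand in for the Nemotron 3 Super call and the NeMo-RL bridge, whose real interfaces are not described here.

```python
import math
from dataclasses import dataclass, fields, replace
from typing import Callable, List, Tuple, Union


# Hypothetical typed search space; names are illustrative, not NeMo-RL config keys.
@dataclass(frozen=True)
class TrainConfig:
    kl_penalty: float
    learning_rate: float
    batch_size: int


# A single typed edit proposed by the outer model (hypothetical shape).
@dataclass(frozen=True)
class ConfigEdit:
    key: str
    value: Union[int, float]


def apply_edit(cfg: TrainConfig, edit: ConfigEdit) -> TrainConfig:
    """Validate an edit against the config schema and return a new config."""
    if edit.key not in {f.name for f in fields(cfg)}:
        raise ValueError(f"unknown config key: {edit.key!r}")
    current = getattr(cfg, edit.key)
    if isinstance(edit.value, bool) or not isinstance(edit.value, (int, float)):
        raise TypeError(f"{edit.key} expects a number")
    if isinstance(current, int) and not isinstance(edit.value, int):
        raise TypeError(f"{edit.key} expects an int")
    # Coerce to the field's current type (e.g. an int literal for a float field).
    return replace(cfg, **{edit.key: type(current)(edit.value)})


History = List[Tuple[ConfigEdit, float, bool]]


def autoresearch_loop(
    base: TrainConfig,
    propose: Callable[[TrainConfig, History], ConfigEdit],
    run_and_eval: Callable[[TrainConfig], float],
    iterations: int,
) -> Tuple[TrainConfig, float, History]:
    """Propose, run, compare, keep or revert. Assumes a higher metric is better."""
    best_cfg, best_metric = base, run_and_eval(base)  # baseline run
    history: History = []
    for _ in range(iterations):
        edit = propose(best_cfg, history)
        try:
            candidate = apply_edit(best_cfg, edit)
        except (ValueError, TypeError):
            # Malformed edits are rejected without launching a run.
            history.append((edit, math.nan, False))
            continue
        metric = run_and_eval(candidate)
        kept = metric > best_metric
        if kept:
            best_cfg, best_metric = candidate, metric  # keep the edit
        # Otherwise best_cfg is left unchanged, i.e. the edit is reverted.
        history.append((edit, metric, kept))
    return best_cfg, best_metric, history


# --- Usage with stubs in place of the LLM proposer and the inner training run ---
def fake_eval(cfg: TrainConfig) -> float:
    # Toy objective peaking at kl=0.1, lr=1e-4, batch=64; not a real training metric.
    return -(abs(cfg.kl_penalty - 0.1)
             + abs(cfg.learning_rate - 1e-4) * 1000
             + abs(cfg.batch_size - 64) / 64)


scripted = iter([
    ConfigEdit("kl_penalty", 0.1),      # improves the toy metric -> kept
    ConfigEdit("learning_rate", 1e-2),  # worse -> reverted
    ConfigEdit("batch_size", 64),       # improves -> kept
    ConfigEdit("dropout", 0.1),         # unknown key -> rejected before running
])

best_cfg, best_metric, history = autoresearch_loop(
    TrainConfig(kl_penalty=0.5, learning_rate=1e-3, batch_size=32),
    propose=lambda cfg, hist: next(scripted),
    run_and_eval=fake_eval,
    iterations=4,
)
print(best_cfg, best_metric)
```

Validating each proposal against the dataclass schema before launching anything is one way to make the edits "typed": a malformed proposal is rejected for free, while every accepted one costs a full inner DPO/GRPO/SFT run.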