[ agent · 8 · Feedback ]

NemoRLAutoResearchOrchestrator

A Karpathy-pattern meta-loop over NeMo-RL. On each iteration, Nemotron 3 Super proposes a typed config edit (KL penalty, learning rate, batch size). The loop then launches the inner DPO, GRPO, or SFT run through the NeMo-RL bridge, parses the eval metric, and either keeps the edit or reverts it.
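The propose, run, keep-or-revert cycle can be illustrated with a minimal sketch. Everything named here (`TrainConfig`, `ConfigEdit`, `apply_edit`, `meta_loop`, `toy_propose`, `toy_eval`) is hypothetical and not taken from NeMo-RL or the orchestrator itself. The sketch assumes a higher-is-better eval metric and a strict-improvement acceptance rule; the proposer stub stands in for Nemotron 3 Super and the evaluator stub stands in for an inner run launched through the NeMo-RL bridge, neither of which is actually called.

```python
import math
import random
from dataclasses import dataclass, fields, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical typed config; field names are illustrative only.
    kl_penalty: float = 0.05
    learning_rate: float = 1e-5
    batch_size: int = 64


@dataclass(frozen=True)
class ConfigEdit:
    # One proposed change: which field to set, and the new value.
    name: str
    value: float


def apply_edit(cfg: TrainConfig, edit: ConfigEdit) -> TrainConfig:
    """Return a new config with the edit applied, preserving the field's type."""
    valid = {f.name for f in fields(cfg)}
    if edit.name not in valid:
        raise ValueError(f"unknown config field: {edit.name}")
    current = getattr(cfg, edit.name)
    return replace(cfg, **{edit.name: type(current)(edit.value)})


def meta_loop(
    cfg: TrainConfig,
    propose: Callable[[TrainConfig, float], ConfigEdit],
    run_and_eval: Callable[[TrainConfig], float],
    iterations: int,
) -> Tuple[TrainConfig, float]:
    """Keep an edit only if the metric strictly improves; otherwise revert."""
    best_metric = run_and_eval(cfg)  # baseline run
    for _ in range(iterations):
        edit = propose(cfg, best_metric)
        candidate = apply_edit(cfg, edit)
        metric = run_and_eval(candidate)
        if metric > best_metric:
            cfg, best_metric = candidate, metric  # keep
        # else: revert, i.e. continue from the unchanged cfg
    return cfg, best_metric


def toy_propose(cfg: TrainConfig, best: float, rng: random.Random) -> ConfigEdit:
    # Stand-in for the LLM proposer: halve or double one random field.
    name = rng.choice(["kl_penalty", "learning_rate", "batch_size"])
    factor = rng.choice([0.5, 2.0])
    return ConfigEdit(name, getattr(cfg, name) * factor)


def toy_eval(cfg: TrainConfig) -> float:
    # Stand-in for an inner training run plus eval; peaks at lr=1e-4, kl=0.1.
    return -abs(math.log10(cfg.learning_rate) + 4.0) - abs(cfg.kl_penalty - 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    final_cfg, final_metric = meta_loop(
        TrainConfig(), lambda c, m: toy_propose(c, m, rng), toy_eval, 20
    )
    print(final_cfg, final_metric)
```

Because the config is an immutable dataclass, "revert" costs nothing: a rejected candidate is simply discarded and the next proposal starts from the last accepted config. A real orchestrator would also need to handle failed runs and noisy metrics, which this sketch leaves out.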

Model card · nvidia/nemotron-3-super-120b-a12b