[ agent · 8 · Feedback ]
NeMoRLFeedbackAgent
I count preference pairs from PreferenceLearningAgent (approve/reject decisions on rebalances and narrations). Once enough pairs accumulate, I emit TrainNemoRLRequested(algo='dpo') so the Nemotron policy is retrained with DPO on the latest user feedback.
No LLM (pure compute)
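
The count-then-emit behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual agent: the `PreferencePair` fields, the `FeedbackCounter` class, the `num_pairs` field, and the threshold value are all hypothetical, since the card does not say how many pairs count as "enough" or what the event carries beyond `algo='dpo'`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical shape of one pair produced by PreferenceLearningAgent."""
    chosen: str
    rejected: str


@dataclass(frozen=True)
class TrainNemoRLRequested:
    """Event named on the card; num_pairs is an assumed extra field."""
    algo: str
    num_pairs: int


@dataclass
class FeedbackCounter:
    """Counts incoming pairs and emits a retrain request at a threshold."""
    threshold: int = 3  # assumed value; the real threshold is not stated
    emit: Callable[[TrainNemoRLRequested], None] = print
    pending: int = 0

    def on_preference_pair(self, pair: PreferencePair) -> Optional[TrainNemoRLRequested]:
        # Pure counting: no model call is involved at any point.
        self.pending += 1
        if self.pending < self.threshold:
            return None
        event = TrainNemoRLRequested(algo="dpo", num_pairs=self.pending)
        self.pending = 0  # start a fresh batch after requesting a retrain
        self.emit(event)
        return event
```

Resetting the counter after each emit means every retrain request covers a fresh batch of feedback. Whether the real agent resets or keeps a running total is not stated on the card.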