[ agent · 8b · DPO closure ]
InferenceServerAgent
I manage the local vLLM inference server. On every PolicyPromoted event I stop the running vLLM process (if any) and start a fresh one against the new checkpoint, serving the OpenAI-compatible API on localhost:8024. While the new server is cold-starting, the policy_router automatically falls back to cloud Nemotron.
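
The stop-then-start cycle could look roughly like the sketch below. It is a minimal illustration, not the agent's actual code: it assumes the server is launched through the `vllm serve` CLI, that the PolicyPromoted handler receives the new checkpoint path as a plain string, and that readiness can be checked by polling the server's `/health` route. The names `InferenceServer`, `build_command`, `is_ready`, and `on_policy_promoted` are hypothetical.

```python
import subprocess
import urllib.request
from typing import Callable, List

HOST = "localhost"
PORT = 8024  # port named in the description above


def build_command(checkpoint_path: str) -> List[str]:
    """Command line for vLLM's OpenAI-compatible server (assumes the `vllm serve` CLI)."""
    return ["vllm", "serve", checkpoint_path, "--host", HOST, "--port", str(PORT)]


def is_ready(timeout: float = 1.0) -> bool:
    """True once the server answers on /health; False while it is still cold-starting."""
    try:
        with urllib.request.urlopen(f"http://{HOST}:{PORT}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error
        return False


class InferenceServer:
    """Hypothetical wrapper that owns at most one vLLM process at a time."""

    def __init__(self, spawn: Callable[[List[str]], subprocess.Popen] = subprocess.Popen):
        self._spawn = spawn  # injectable so the logic can be exercised without vLLM
        self._proc = None

    def on_policy_promoted(self, checkpoint_path: str) -> None:
        """Stop the running server (if any), then start one on the new checkpoint."""
        self.stop()
        self._proc = self._spawn(build_command(checkpoint_path))

    def stop(self, grace_seconds: float = 30.0) -> None:
        """Terminate the current process, escalating to kill after the grace period."""
        proc = self._proc
        self._proc = None
        if proc is None or proc.poll() is not None:
            return  # nothing running
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

In this sketch the handler returns as soon as the new process is spawned. A caller such as the policy_router could poll `is_ready()` to decide when to stop falling back to the cloud model, though how the real router detects cold-start is not specified here.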