GRPO algorithm doesn't need Critic model,why the codes need to initialize the Critic?
GRPO algorithm doesn't need Critic model,why the codes need to initialize the Critic?