GRPO_Finetuning

Fine-tuning a base LLM with a custom GRPO trainer to answer questions in the style of Richard Feynman, based on first principles.
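
For readers new to GRPO (Group Relative Policy Optimization), the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group, so no separate value model is needed. The snippet below is a minimal sketch of that advantage computation only. It is not taken from this repository's trainer: the function name `group_relative_advantages`, the `eps` parameter, the use of the population standard deviation, and the example reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Hypothetical helper for illustration; the custom trainer may differ
    (for example in which standard deviation it uses).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to the same question,
# e.g. scored on how well they explain from first principles.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions that score above the group average get a positive advantage and are reinforced, those below get a negative one, and a group whose rewards are all equal contributes no learning signal.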