Create an example configuration for [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) that trains a 1B or 3B model with REINFORCE++ on the simple arithmetic reasoning-gym dataset.

- [x] Locally running example with transformers generate inference
- [ ] Ray-based variant
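
For orientation, here is a rough sketch of the kind of verifiable reward an arithmetic task could feed into REINFORCE++. It is self-contained and deliberately uses neither the reasoning-gym nor the OpenRLHF API. `make_problem` and `arithmetic_reward` are hypothetical names invented for this illustration, and the "last integer in the completion" parsing rule is an assumption, not the dataset's actual scoring logic.

```python
import random
import re


def make_problem(rng: random.Random) -> dict:
    """Generate one toy addition/subtraction problem (hypothetical stand-in for a dataset entry)."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    op = rng.choice(["+", "-"])
    answer = a + b if op == "+" else a - b
    return {"question": f"What is {a} {op} {b}?", "answer": str(answer)}


def arithmetic_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the last integer in the completion equals the expected answer, else 0.0.

    Assumption: the model states its final answer as the last integer in its output.
    """
    numbers = re.findall(r"-?\d+", completion)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == int(expected) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    problem = make_problem(rng)
    print(problem["question"])
    # A completion that ends with the correct answer earns full reward.
    print(arithmetic_reward(f"The answer is {problem['answer']}", problem["answer"]))
```

In a real configuration the scoring would presumably come from the dataset itself rather than a hand-written parser. How the reward is passed to the trainer depends on the OpenRLHF version and is not shown here.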