Create an example of using reasoning-gym with OpenRLHF

Create an example configuration for [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) that trains a 1B or 3B model with REINFORCE++ on the simple arithmetic reasoning-gym dataset.

- [x] locally running example using transformers `generate` for inference
- [ ] Ray-based variant
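
For discussion, here is a minimal sketch of the kind of verifiable reward such an example needs: a seeded arithmetic task generator plus a scorer that compares the last integer in a completion with the expected answer. It is a pure-Python stand-in, not reasoning-gym's or OpenRLHF's API; the names `make_arithmetic_dataset`, `extract_final_number`, and `score_completion` are hypothetical, and the actual example would presumably draw tasks and scores from reasoning-gym itself.

```python
import random
import re


def make_arithmetic_dataset(size, seed=0):
    """Build a small list of addition/subtraction tasks (hypothetical stand-in dataset)."""
    rng = random.Random(seed)  # seeded so the same tasks are produced on every run
    items = []
    for _ in range(size):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        op = rng.choice(["+", "-"])
        answer = a + b if op == "+" else a - b
        items.append({"question": f"What is {a} {op} {b}?", "answer": str(answer)})
    return items


def extract_final_number(completion):
    """Return the last integer that appears in the completion, or None if there is none."""
    matches = re.findall(r"-?\d+", completion)
    return matches[-1] if matches else None


def score_completion(completion, entry):
    """Binary reward: 1.0 if the last integer in the completion equals the answer, else 0.0."""
    return 1.0 if extract_final_number(completion) == entry["answer"] else 0.0


if __name__ == "__main__":
    dataset = make_arithmetic_dataset(3, seed=42)
    for entry in dataset:
        fake_completion = f"Let me compute. The answer is {entry['answer']}."
        print(entry["question"], score_completion(fake_completion, entry))
```

A binary exact-match reward keeps the signal unambiguous for a policy-gradient method; partial credit or format rewards could be layered on later if training turns out to need them.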