A bot that plays the game 2048.
It consists of two small neural networks, one for the policy and one for the value function, which are trained with Proximal Policy Optimization (PPO).
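As a rough illustration of what PPO optimizes, here is a minimal sketch of the standard clipped surrogate loss for the policy network and a mean-squared-error loss for the value network, written in plain Python over lists of per-step values. The function names, the `clip_eps=0.2` default, and the list-based inputs are assumptions for illustration. main.py may compute these losses differently, for example with a tensor library and extra entropy or weighting terms.

```python
import math
from typing import Sequence


def ppo_policy_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # hypothetical default, a common choice for PPO
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    Each element is one (state, action) step collected with the old policy.
    Minimizing this loss maximizes the clipped objective.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)


def value_loss(predicted: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between predicted state values and observed returns."""
    return sum((p - r) ** 2 for p, r in zip(predicted, returns)) / len(returns)
```

When the new and old policies agree, every ratio is 1 and the policy loss is just the negative mean advantage. Clipping only takes effect once an update pushes the ratio outside [1 - clip_eps, 1 + clip_eps], which is what keeps each PPO update small.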
[Figure: the bot's average number of moves per game for each training batch.]

After training for a few hours on a CPU, the bot becomes good enough to sometimes beat the game by building the 2048 block.
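Average game length works as a progress signal because, under the standard 2048 rules, it is tied to how large the tiles can get. Merges preserve the sum of the tiles on the board, the game starts with two tiles worth at most 4 each, and each move adds one new tile worth 2 or 4, so holding a 2048 tile requires at least (2048 - 8) / 4 = 510 moves. Since new tiles are 2 with probability 0.9 in the standard game, accumulating that much tile value takes closer to 930 moves on average. The sketch below computes that bound and a per-batch average from a list of per-step "game over" flags. The flag format, the function names, and the assumption that game.py follows the standard spawn rules are illustrative, not taken from main.py.

```python
import math
from typing import Sequence


def average_moves_per_game(dones: Sequence[bool]) -> float:
    """Average length of the games that finished within one training batch.

    Hypothetical batch format: dones[i] is True if step i ended a game.
    Steps from a game still in progress at the end of the batch are ignored.
    """
    lengths = []
    current = 0
    for done in dones:
        current += 1
        if done:
            lengths.append(current)
            current = 0
    if not lengths:
        return float("nan")  # no game finished in this batch
    return sum(lengths) / len(lengths)


def min_moves_to_reach(tile: int, initial_sum: int = 8, max_spawn: int = 4) -> int:
    """Lower bound on the number of moves before `tile` can be on the board.

    Assumes standard 2048 rules: two starting tiles (sum <= initial_sum),
    merges that keep the board sum unchanged, and one new tile worth at
    most max_spawn added after every move.
    """
    return math.ceil((tile - initial_sum) / max_spawn)
```

For example, a batch whose flags are [False, False, True, False, True, False] contains two finished games of 3 and 2 moves, so its average is 2.5; the trailing unfinished step is not counted.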
main.py is a script that trains the bot and periodically writes checkpoints. demo.py lets you watch games played by the bot; you can use it to see the bot beat the game by building the 2048 block! game.py is just the game itself.
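For readers new to 2048, the core rule a game module like game.py has to implement is how a move slides and merges tiles. Below is a minimal sketch of the standard rules, not the repository's actual code: tiles slide toward the move direction, two equal neighbors merge at most once per move, and the merged value is added to the score. Random tile spawning is omitted, and the names `slide_row_left`, `move`, and `has_won` are hypothetical.

```python
from typing import List, Tuple

Board = List[List[int]]  # 0 marks an empty cell


def slide_row_left(row: List[int]) -> Tuple[List[int], int]:
    """Slide one row to the left, merging equal neighbors at most once each."""
    tiles = [v for v in row if v != 0]
    merged: List[int] = []
    score = 0
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            score += tiles[i] * 2
            i += 2  # both tiles are consumed by this merge
        else:
            merged.append(tiles[i])
            i += 1
    merged += [0] * (len(row) - len(merged))  # pad with empty cells
    return merged, score


def _transpose(board: Board) -> Board:
    return [list(col) for col in zip(*board)]


def _reverse_rows(board: Board) -> Board:
    return [row[::-1] for row in board]


def move(board: Board, direction: str) -> Tuple[Board, int]:
    """Apply "left", "right", "up" or "down"; return (new_board, score_gained).

    Every direction is reduced to a left slide by transposing and/or reversing.
    """
    if direction not in ("left", "right", "up", "down"):
        raise ValueError(f"unknown direction: {direction!r}")
    if direction in ("up", "down"):
        board = _transpose(board)
    if direction in ("right", "down"):
        board = _reverse_rows(board)
    new_rows: Board = []
    score = 0
    for row in board:
        new_row, gained = slide_row_left(row)
        new_rows.append(new_row)
        score += gained
    # Undo the transformations in reverse order.
    if direction in ("right", "down"):
        new_rows = _reverse_rows(new_rows)
    if direction in ("up", "down"):
        new_rows = _transpose(new_rows)
    return new_rows, score


def has_won(board: Board, target: int = 2048) -> bool:
    """True once any tile has reached the target value."""
    return any(v >= target for row in board for v in row)
```

Reducing every direction to a left slide keeps the merge logic in one place. Note that `slide_row_left([2, 0, 2, 4])` yields `[4, 4, 0, 0]`: the freshly merged 4 does not merge again with the existing 4 in the same move.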
