P-PAD

Reinforcement learning on Puzzle & Dragons.

The project is still evolving, but we are currently trying:

Boltzmann policy and annealing.
Experience replay.
A frozen model B for action reward prediction.
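
As a rough illustration of the first two items, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the exponential temperature schedule, and its default values (`t_start`, `t_end`, `decay`) are all assumptions chosen for the example. A Boltzmann policy samples actions with probability proportional to exp(Q / T), annealing lowers T over time so the agent shifts from exploring to exploiting, and the replay buffer stores transitions so training batches can be drawn at random.

```python
import math
import random
from collections import deque


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a lower temperature gives greedier choices."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature, rng=random):
    """Draw one action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def annealed_temperature(step, t_start=1.0, t_end=0.05, decay=0.999):
    """Hypothetical schedule: exponential decay from t_start, floored at t_end."""
    return max(t_end, t_start * decay ** step)


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up correlated steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a training loop one would call `sample_action(q_values, annealed_temperature(step))` at each move, `push` the resulting transition, and train on `sample(batch_size)` once the buffer holds enough entries.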

One combo agent

Feb 2nd, 2019: Our agent now makes one combo and then passes.