
Call for help: cannot reproduce the learning-curve results for DQN on NL Texas Hold'em #224

Closed
jiahui-x opened this issue Jun 15, 2021 · 1 comment

Comments

@jiahui-x

Hi, I am struggling to get good training results on NL Texas Hold'em with DQN. I followed your paper to choose the hyperparameters: the memory size is selected from {2000, 100000}, the discount factor is set to 0.99, the Adam optimizer is used with a learning rate of 0.00005, and the network is an MLP with layer sizes 10-10, 128-128, 512-512, or 512-1024-2048-1024-512 (I have tried them all). But I can only get poor results like:
[Figure: my DQN learning curve on NL Texas Hold'em]
And the result in your paper looks like this (same number of training timesteps, but much higher rewards):
[Figure: learning curve from the paper]
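For reference, here is a minimal sketch of how these hyperparameters map onto rlcard's PyTorch `DQNAgent`. The argument and attribute names follow my reading of a recent rlcard release and may differ between versions, so treat this as an illustration rather than my exact training script:

```python
import rlcard
from rlcard.agents import DQNAgent

# Environment for No-Limit Texas Hold'em.
env = rlcard.make('no-limit-holdem')

# DQN agent with the hyperparameters described above.
# Note: argument/attribute names follow recent rlcard releases and may
# differ in older versions (e.g. num_actions vs. action_num).
agent = DQNAgent(
    num_actions=env.num_actions,
    state_shape=env.state_shape[0],
    mlp_layers=[512, 512],        # also tried [10, 10], [128, 128], [512, 1024, 2048, 1024, 512]
    replay_memory_size=100000,    # memory size selected from {2000, 100000}
    discount_factor=0.99,
    learning_rate=0.00005,        # the Torch DQNAgent uses Adam internally
)
```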

@daochenzha
Member

@jiahui-x Hi, thanks for the feedback. The result in the paper is out of date, which is possibly due to multiple factors. First, the environment codebase has had some major updates since the first release; in particular, the reward has been divided by 2. Second, the current implementation is based on PyTorch instead of TensorFlow. So your result seems reasonable to me.

The codebase is expected to remain stable in the near future to ensure reproducibility.
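If you want a rough apples-to-apples comparison against the old figure, one option is to rescale the rewards logged with the current environment back to the pre-update convention. This is a hypothetical helper, not something shipped with rlcard:

```python
def rescale_to_old_convention(rewards, factor=2.0):
    """Rescale rewards logged with the current environment (where the reward
    was divided by 2) back to the old scale for a rough comparison with the
    paper's curves. Hypothetical helper for illustration only."""
    return [r * factor for r in rewards]

# Example: a reward of 0.75 under the current scale corresponds to ~1.5 in the old one.
print(rescale_to_old_convention([0.25, 0.5, 0.75]))  # [0.5, 1.0, 1.5]
```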
