
Call for help: cannot reproduce the learning-curve results for DQN on NL Texas Hold'em #224

Closed
jiahui-x opened this issue Jun 15, 2021 · 1 comment

Comments

@jiahui-x

Hi, I am struggling to get good training results on NL Texas Hold'em with DQN. I followed your paper to choose the hyperparameters: the memory size is selected from {2000, 100000}, the discount factor is set to 0.99, the Adam optimizer is used with a learning rate of 0.00005, and the network is an MLP with layer sizes 10-10, 128-128, 512-512, or 512-1024-2048-1024-512 (I have tried them all). But I can only get poor results like:
[Figure: my DQN learning curve on NL Texas Hold'em]
And the result in your paper looks like this (same number of training timesteps, but much higher rewards):
[Figure: learning curve from the paper]
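For reference, here is a minimal sketch of how these hyperparameters map onto rlcard's PyTorch `DQNAgent`. The argument and attribute names follow my reading of a recent rlcard release and may differ between versions, so treat this as an illustration rather than my exact training script:

```python
import rlcard
from rlcard.agents import DQNAgent

# Environment for No-Limit Texas Hold'em.
env = rlcard.make('no-limit-holdem')

# DQN agent with the hyperparameters described above.
# Note: argument/attribute names follow recent rlcard releases and may
# differ in older versions (e.g. num_actions vs. action_num).
agent = DQNAgent(
    num_actions=env.num_actions,
    state_shape=env.state_shape[0],
    mlp_layers=[512, 512],        # also tried [10, 10], [128, 128], [512, 1024, 2048, 1024, 512]
    replay_memory_size=100000,    # memory size selected from {2000, 100000}
    discount_factor=0.99,
    learning_rate=0.00005,        # the Torch DQNAgent uses Adam internally
)
```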

@daochenzha
Member

@jiahui-x Hi, thanks for the feedback. The result in the paper is out of date, which is possibly due to multiple factors. First, the environment codebase has had some major updates since the first release; in particular, the reward has been divided by 2. Second, the current implementation is based on PyTorch instead of TensorFlow. So your result seems reasonable to me.

The codebase is expected to remain stable in the near future to ensure reproducibility.
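If you want a rough apples-to-apples comparison against the old figure, one option is to rescale the rewards logged with the current environment back to the pre-update convention. This is a hypothetical helper, not something shipped with rlcard:

```python
def rescale_to_old_convention(rewards, factor=2.0):
    """Rescale rewards logged with the current environment (where the reward
    was divided by 2) back to the old scale for a rough comparison with the
    paper's curves. Hypothetical helper for illustration only."""
    return [r * factor for r in rewards]

# Example: a reward of 0.75 under the current scale corresponds to ~1.5 in the old one.
print(rescale_to_old_convention([0.25, 0.5, 0.75]))  # [0.5, 1.0, 1.5]
```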
