This project implements DQN in NumPy, using backpropagation and the Adam optimizer. DQN training can be unstable and the Q-values may explode, so the results are sensitive to the hyperparameters. If you run into any problems, please raise an issue. Reference: https://zhuanlan.zhihu.com/p/381987920
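
As a rough illustration of the two pieces named above, here is a minimal NumPy sketch of a single Adam update and of the standard DQN target. It is not taken from this repository: the function names `adam_step` and `td_target`, their signatures, and the default hyperparameters (the commonly used Adam defaults and `gamma=0.99`) are assumptions chosen for the example.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (hypothetical helper). t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def td_target(reward, next_q, done, gamma=0.99):
    """DQN target r + gamma * max_a Q(s', a), with no bootstrap at terminal states.

    reward, done: arrays of shape (batch,); next_q: array of shape (batch, n_actions).
    """
    return reward + gamma * (1.0 - done) * next_q.max(axis=1)


# Toy usage: minimize (w - 3)^2 with Adam, standing in for a network loss.
w = np.zeros(1)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)                      # analytic gradient of the toy loss
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
```

Because Adam divides by the running gradient magnitude, each step is bounded by roughly `lr`, which helps somewhat with the instability mentioned above, but it does not by itself prevent Q-values from diverging.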