Based on PARL, the GA3C algorithm of deep reinforcement learning has been reproduced, reaching the same level of indicators as the paper in Atari benchmarks.
Original paper: GA3C: GPU-based A3C for Deep Reinforcement Learning
A hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm.
Please see here to know more about Atari games.
Results with one learner (in a P40 GPU) and 24 simulators (in 12 CPU) in 10 million sample steps.
- paddlepaddle>=1.5.1
- parl
- gym==0.12.1
- atari-py==0.1.7
At first, We can start a local cluster with 24 CPUs:
xparl start --port 8010 --cpu_num 24
Note that if you have started a master before, you don't have to run the above command. For more information about the cluster, please refer to our documentation
Then we can start the distributed training by running:
python train.py
[Tips] The performance can be influenced dramatically in a slower computational environment, especially when training with low-speed CPUs. It may be caused by the policy-lag problem.