This work explores strategies for adapting single-player reinforcement learning algorithms to competitive play in two-player adversarial games such as poker. To learn to play effectively against a competitive opponent in the absence of an expert, we experiment with several strategies: training against a random policy, adversarial training, and self-play. We run extensive experiments to test the effectiveness of each strategy and summarize our insights into training agents for strong performance in competitive multiplayer environments.
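
To ground these strategies, the sketch below shows the basic two-player interaction loop we build on, with both seats choosing uniformly random legal actions (the setting the baseline agent is trained against). This is a minimal illustration, not the project's training code: the environment `texas_holdem_v4` is an assumption based on the `PettingZoo[classic]` dependency and the Texas Hold'em demo video, and no DQN learning is performed here.

```python
from pettingzoo.classic import texas_holdem_v4

# Minimal AEC interaction loop: both seats play uniformly random legal actions.
# texas_holdem_v4 is assumed from the PettingZoo[classic] dependency; any
# classic environment that exposes an action mask works the same way.
env = texas_holdem_v4.env()
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # terminated/truncated agents must step with None
    else:
        # Sample only among legal actions using the provided action mask.
        action = env.action_space(agent).sample(observation["action_mask"])
    env.step(action)

env.close()
```

A trained DQN policy would replace the random sampling step; the strategies compared below differ in which opponent occupies the other seat during training.
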
**Cross-competition between trained agents**

| Player 1 | Player 2 | Strategy | Win-rate (P1 : P2 : Tie) | Reward (1,000 games) | Winner |
|---|---|---|---|---|---|
| DQN_baseline | DQN_Independent | Cross Competition | 26% : 71% : 1% | 1329.5 | DQN_Independent |
| DQN_Self_Play_classic | DQN_Self_Play_improved | Cross Competition | 40% : 58% : 2% | 489 | DQN_Self_Play_improved |
| DQN_Independent | DQN_shared | Cross Competition | 55% : 41% : 4% | 464.5 | DQN_Independent |
| DQN_Self_Play_improved | DQN_Independent | Cross Competition | 58% : 40% : 2% | 462 | DQN_Self_Play_improved |

**Head-to-head results for each training strategy**

| Player 1 | Player 2 | Strategy | Win-rate (P1 : P2 : Tie) | Reward (1,000 games) | Winner |
|---|---|---|---|---|---|
| DQN Agent | Random Agent | Baseline (vs. random policy) | 81% : 19% : 0% | 1191.5 | DQN_baseline |
| DQN Agent | DQN Agent | Independent Learning | 51% : 47% : 2% | 141.5 | DQN_Independent |
| DQN Agent | DQN Agent | Shared Learning | 49% : 49% : 2% | 101 | DQN_shared |
| DQN Agent | DQN Agent | Self Play (classic) | 50% : 48% : 2% | 64.5 | DQN_Self_Play_classic |
| DQN Agent | DQN Agent | Self Play (improved) | 51% : 47% : 3% | 157.5 | DQN_Self_Play_improved |
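
The win-rates above read as Player 1 : Player 2 : Tie over 1,000 evaluation games. The function below is a hedged sketch of how such a head-to-head tally could be computed; the `evaluate` helper and the random stand-in policy are illustrative assumptions rather than the project's actual evaluation code, and trained DQN checkpoints would be supplied as the `policy_*` callables.

```python
from collections import Counter

from pettingzoo.classic import texas_holdem_v4


def evaluate(policy_0, policy_1, n_games=1000, seed=0):
    """Play n_games hands between two policies and tally win/loss/tie rates.

    Each policy is a callable (observation, action_space) -> action that is
    expected to respect the legal-action mask in observation["action_mask"].
    """
    env = texas_holdem_v4.env()
    outcomes = Counter()
    cumulative = {"player_0": 0.0, "player_1": 0.0}
    for game in range(n_games):
        env.reset(seed=seed + game)
        episode = {"player_0": 0.0, "player_1": 0.0}
        for agent in env.agent_iter():
            observation, reward, termination, truncation, _ = env.last()
            episode[agent] += reward
            if termination or truncation:
                action = None
            else:
                policy = policy_0 if agent == "player_0" else policy_1
                action = policy(observation, env.action_space(agent))
            env.step(action)
        for player, r in episode.items():
            cumulative[player] += r
        if episode["player_0"] > episode["player_1"]:
            outcomes["player_0"] += 1
        elif episode["player_1"] > episode["player_0"]:
            outcomes["player_1"] += 1
        else:
            outcomes["tie"] += 1
    env.close()
    win_rates = {p: 100.0 * outcomes[p] / n_games
                 for p in ("player_0", "player_1", "tie")}
    return win_rates, cumulative


def random_policy(observation, action_space):
    # Stand-in for a trained DQN checkpoint: uniform over legal actions.
    return action_space.sample(observation["action_mask"])


print(evaluate(random_policy, random_policy, n_games=100))
```
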
Demo video: `DQN_v_baseline_Texas.mp4`
PettingZoo[classic,butterfly]>=1.24.0
Pillow>=9.4.0
ray[rllib]==2.7.0
SuperSuit>=3.9.0
torch>=1.13.1
tensorflow-probability>=0.19.0