A reinforcement learning project that trains an AI agent to evade a chasing opponent in a 2D tag game environment. The evader agent learns to survive as long as possible against various chaser strategies using Proximal Policy Optimization (PPO) with curriculum learning.
The project features:
- PPO-based training using Stable Baselines3
- Curriculum learning against multiple deterministic chaser policies
- Parallel environment processing for efficient training
- Comprehensive evaluation tools with visualization support
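The actual observation and reward design live in `gym_wrapper.py`; purely as an illustration of the episode loop PPO trains on, here is a self-contained stub. All names, the survival reward, and the relative-position observation are assumptions for illustration, not the project's code:

```python
import math
import random

class TagEnvSketch:
    """Illustrative Gymnasium-style episode loop for an evader-vs-chaser tag
    game. This is a sketch, not the project's gym_wrapper.py."""

    WIDTH, HEIGHT = 900, 800          # arena size, matching config.py
    PLAYER_RADIUS = 10
    MAX_EPISODE_STEPS = 1000

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.evader = [rng.uniform(0, self.WIDTH), rng.uniform(0, self.HEIGHT)]
        self.chaser = [0.0, 0.0]
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.steps += 1
        # Evader moves by the agent's (dx, dy) action, clamped to the arena.
        self.evader[0] = min(max(self.evader[0] + action[0] * 5.0, 0.0), self.WIDTH)
        self.evader[1] = min(max(self.evader[1] + action[1] * 5.0, 0.0), self.HEIGHT)
        # Chaser pursues at a fixed speed (direct pursuit, for simplicity).
        dx = self.evader[0] - self.chaser[0]
        dy = self.evader[1] - self.chaser[1]
        dist = math.hypot(dx, dy) or 1.0
        self.chaser[0] += 4.0 * dx / dist
        self.chaser[1] += 4.0 * dy / dist
        # Episode ends on collision (caught) or at the step limit (truncated).
        caught = math.hypot(self.evader[0] - self.chaser[0],
                            self.evader[1] - self.chaser[1]) <= 2 * self.PLAYER_RADIUS
        truncated = self.steps >= self.MAX_EPISODE_STEPS
        reward = 0.0 if caught else 1.0   # +1 per step survived (assumed shaping)
        return self._obs(), reward, caught, truncated, {}

    def _obs(self):
        # Chaser position relative to the evader (one plausible observation).
        return (self.chaser[0] - self.evader[0], self.chaser[1] - self.evader[1])
```

Because the reward is per-step survival, maximizing return is exactly "survive as long as possible", which is the objective described above.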
Install dependencies:
```
pip install -r requirements.txt
```

Run the training script from the project root:
```
python environments/taggame/train.py [OPTIONS]
```

| Option | Description | Default |
|---|---|---|
| `--log-dir` | Output directory | Auto-generated timestamp |
| `--timesteps` | Total training timesteps | 1,000,000 |
| `--n-envs` | Number of parallel environments | 8 |
```
# Basic training
python environments/taggame/train.py

# Custom training run
python environments/taggame/train.py --timesteps 500000 --n-envs 16

# Specify output directory
python environments/taggame/train.py --log-dir data/taggame/my_experiment --timesteps 2000000
```

Models are saved to `data/taggame/train_<TIMESTAMP>/`:
- `best_model.zip` - Best performing model during training
- `final.zip` - Final model after training completes
- `ppo_taggame_*_steps.zip` - Checkpoints every 50k steps
- `tensorboard/` - TensorBoard logs for monitoring
- `training.log` - Training log file
View training progress with TensorBoard:
```
tensorboard --logdir data/taggame/train_<TIMESTAMP>/tensorboard
```

Run the evaluation script:
```
python environments/taggame/evaluate.py [OPTIONS]
```

| Option | Description | Default |
|---|---|---|
| `--model` | Path to model file | Auto-finds best model |
| `--policies` | Policy indices (e.g., `0,1,7`) or `all` | `all` |
| `--episodes` | Episodes per policy | 10 |
| `--max-steps` | Max steps per episode | 1000 |
| `--render` | Enable visual rendering | False |
| `--fps` | FPS limit for rendering | 60 |
| `--deterministic-evader` | Use rule-based evader instead of RL model | False |
```
# Evaluate against all chaser policies
python environments/taggame/evaluate.py --policies all --episodes 10

# Evaluate specific policies with rendering
python environments/taggame/evaluate.py --policies 0,2,5 --render --fps 60

# Evaluate a specific model
python environments/taggame/evaluate.py --model data/taggame/train_20251223/best_model.zip

# Compare against deterministic baseline
python environments/taggame/evaluate.py --deterministic-evader --policies all
```

The evader trains against these deterministic chaser strategies:
| Index | Policy | Description |
|---|---|---|
| 0 | DirectChasePolicy | Simple direct pursuit toward evader |
| 1 | InterceptChasePolicy | Predicts and intercepts evader position |
| 2 | CornerCutPolicy | Cuts off corners to trap evader |
| 3 | ZigzagChasePolicy | Zigzag movement pattern |
| 4 | SpiralChasePolicy | Spiral chase pattern |
| 5 | RandomWalkPolicy | Random movement |
| 6 | AmbushPolicy | Waits for opportunity to strike |
| 7 | ChaoticChasePolicy | Unpredictable aggressive pursuit |
| 8 | HumanLikePolicy | Human-like chase behavior |
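The policy implementations live under `deterministic_policies/`. To illustrate the pattern, a direct-pursuit policy in the spirit of `DirectChasePolicy` might look like this. The function name and interface are assumptions for illustration, not the project's actual API; only the `MAX_VELOCITY` value comes from `config.py`:

```python
import math

MAX_VELOCITY = 100  # from config.py

def direct_chase_action(chaser_pos, evader_pos, max_velocity=MAX_VELOCITY):
    """Return a velocity vector pointing straight at the evader,
    clamped so its magnitude never exceeds max_velocity."""
    dx = evader_pos[0] - chaser_pos[0]
    dy = evader_pos[1] - chaser_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return (0.0, 0.0)  # already on top of the evader
    # Scale the direction vector: full speed when far, exact step when close.
    scale = min(max_velocity, dist) / dist
    return (dx * scale, dy * scale)
```

The other policies vary only in how they pick the target point: `InterceptChasePolicy` would aim ahead of the evader's velocity, `ZigzagChasePolicy` would perturb the heading periodically, and so on.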
Key settings are in `environments/taggame/config.py`:

```python
WIDTH = 900               # Game window width
HEIGHT = 800              # Game window height
PLAYER_RADIUS = 10        # Player collision radius
MAX_VELOCITY = 100        # Maximum player velocity
MAX_EPISODE_STEPS = 1000  # Episode length limit
```

Policy weights control the curriculum learning distribution. Higher weights mean a policy is used more frequently during training:
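As an illustration of that weighted selection (a sketch, not the project's training code; `sample_chaser` is a hypothetical name), drawing one chaser per episode can be done with `random.choices` over the eight weighted policies:

```python
import random
from collections import Counter

# Weights copied from config.py; indices follow the policy table above.
POLICY_WEIGHTS = [54.9, 11.8, 54.4, 20.4, 5.0, 72.2, 7.9, 42.7]

def sample_chaser(rng=random):
    """Pick one chaser policy index, weighted by POLICY_WEIGHTS."""
    return rng.choices(range(len(POLICY_WEIGHTS)), weights=POLICY_WEIGHTS, k=1)[0]

# Over many episodes, high-weight policies dominate the draw:
rng = random.Random(0)
counts = Counter(sample_chaser(rng) for _ in range(10_000))
```

With these weights, RandomWalkPolicy (72.2) is drawn roughly fourteen times as often as SpiralChasePolicy (5.0).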
```python
POLICY_WEIGHTS = [
    54.9,  # DirectChasePolicy
    11.8,  # InterceptChasePolicy
    54.4,  # CornerCutPolicy
    20.4,  # ZigzagChasePolicy
    5.0,   # SpiralChasePolicy
    72.2,  # RandomWalkPolicy
    7.9,   # AmbushPolicy
    42.7,  # ChaoticChasePolicy
]
```

Project structure:

```
ai-taggame/
├── README.md
├── requirements.txt
├── rl.py                    # Custom DQN implementation (legacy)
├── mdp.py                   # Abstract MDP base class
├── util.py                  # Utility functions
└── environments/
    └── taggame/
        ├── config.py        # Configuration & hyperparameters
        ├── taggame.py       # Core game environment
        ├── gym_wrapper.py   # Gymnasium wrapper for PPO
        ├── train.py         # PPO training script
        ├── evaluate.py      # Evaluation script
        ├── tag_player.py    # Player entity class
        ├── static_info.py   # Helper classes
        └── deterministic_policies/
            ├── __init__.py
            ├── direct_chase.py
            ├── intercept_chase.py
            ├── corner_cut.py
            ├── zigzag_chase.py
            ├── spiral_chase.py
            ├── random_walk.py
            ├── ambush.py
            ├── chaotic_chase.py
            ├── human_like.py
            └── evader_policy.py
```