This repository contains a learning-based control pipeline for the F1TENTH autonomous racing platform. The project combines online reinforcement learning using Proximal Policy Optimisation (PPO) with an offline diffusion-based policy trained on expert demonstrations.
The system is evaluated in the F1TENTH ROS2 simulator and supports direct comparison between PPO and diffusion policies on unseen tracks.
The repository is organised into two main components.
ROS-side scripts:
- ros_f110_env.py
- train_ros_f110_ppo.py
- run_ros_f110_ppo.py
- logger_node.py
- diffusion_runner.py
Diffusion training scripts:
- dataset.py
- diffusion_model.py
- train_diffusion.py
Requirements:
- Ubuntu 20.04
- ROS2 Galactic
- Python 3.8
- PyTorch
- Stable Baselines3
- NumPy
A dedicated Python virtual environment is recommended for all machine learning dependencies.
Ensure that the F1TENTH ROS2 simulator is installed and built in a workspace.
Source the ROS environment:
source /opt/ros/galactic/setup.zsh
source <PATH_TO_SIM_WS>/install/setup.zsh
Launch the simulator:
ros2 launch f1tenth_gym_ros gym_bridge_launch.py
The simulator publishes the following topics used by this project:
- /scan
- /odom
- /drive
PPO is trained online using a Gym-style environment that interfaces directly with ROS topics.
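The actual environment lives in ros_f110_env.py and is not reproduced here; as an illustration, a Gym-style interface for this setup might look like the following sketch (class and field names are hypothetical — the real environment reads /scan and /odom and publishes /drive):

```python
import numpy as np

class F110EnvSketch:
    """Hypothetical Gym-style interface: LiDAR ranges in, (steer, speed) actions out."""

    def __init__(self, n_beams=1080):
        self.n_beams = n_beams  # observation: one range value per LiDAR beam

    def reset(self):
        # In the real environment this waits for a fresh /scan message.
        return np.zeros(self.n_beams, dtype=np.float32)

    def step(self, action):
        steer, speed = action              # real env publishes these on /drive
        obs = np.zeros(self.n_beams, dtype=np.float32)
        reward = float(speed)              # e.g. reward forward progress
        done = False                       # e.g. terminate on collision
        return obs, reward, done, {}
```

The reward and termination logic shown here are placeholders; the actual shaping is defined in ros_f110_env.py.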
Training command:
source <PATH_TO_VENV>/bin/activate
cd <PATH_TO_REPO>/ros_f110_ppo
python3 ros_f110_ppo/train_ros_f110_ppo.py
Training progress, rewards and losses are logged using TensorBoard.
To visualise training:
tensorboard --logdir <PATH_TO_TENSORBOARD_LOGS>
To run a trained PPO policy in the simulator:
source <PATH_TO_VENV>/bin/activate
cd <PATH_TO_REPO>/ros_f110_ppo
python3 ros_f110_ppo/run_ros_f110_ppo.py
Runtime data, including steering, speed and minimum LiDAR distance, is logged to CSV for analysis.
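The exact CSV schema used by run_ros_f110_ppo.py is not specified above; a minimal sketch of this kind of per-step logging (column names assumed) could be:

```python
import csv

def log_step(writer, t, steering, speed, min_scan):
    # One row per control step: timestamp, commanded steering/speed,
    # and the closest obstacle distance from the LiDAR scan.
    writer.writerow([t, steering, speed, min_scan])

with open("ppo_run.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "steering", "speed", "min_lidar_dist"])
    log_step(writer, 0.0, 0.05, 2.0, 1.3)
```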
Expert demonstrations are recorded from the simulator for diffusion training.
source <PATH_TO_VENV>/bin/activate
cd <PATH_TO_REPO>/ros_f110_ppo
python3 ros_f110_ppo/logger_node.py
This produces a compressed NumPy dataset containing observations, actions and timestamps.
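The key names in the recorded dataset are not shown above; assuming keys such as observations, actions and timestamps, a compressed file of this shape can be written and read back as follows (array sizes are illustrative):

```python
import numpy as np

# Hypothetical shapes: N recorded steps, 1080 LiDAR beams, 2-D actions (steer, speed).
obs = np.random.rand(100, 1080).astype(np.float32)
acts = np.random.rand(100, 2).astype(np.float32)
ts = np.arange(100, dtype=np.float64) * 0.05   # e.g. 20 Hz recording

np.savez_compressed("expert_demo.npz", observations=obs, actions=acts, timestamps=ts)

# Loading returns a lazy NpzFile; arrays are accessed by key.
data = np.load("expert_demo.npz")
```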
The diffusion policy is trained offline using recorded expert demonstrations.
source <PATH_TO_VENV>/bin/activate
cd <PATH_TO_REPO>/diffusion_f110
python3 train_diffusion.py
The model learns to predict noise added to expert actions conditioned on LiDAR based observations.
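This objective is the standard DDPM-style epsilon-prediction loss; a NumPy sketch of one training step (the noise-schedule value is illustrative, and a perfect prediction stands in for the network output) is:

```python
import numpy as np

rng = np.random.default_rng(0)

a0 = rng.normal(size=(32, 2))        # batch of expert actions (steer, speed)
alpha_bar = 0.7                      # cumulative noise-schedule term for timestep t (assumed)
eps = rng.normal(size=a0.shape)      # Gaussian noise added to the actions

# Forward noising: a_t = sqrt(alpha_bar) * a0 + sqrt(1 - alpha_bar) * eps
a_t = np.sqrt(alpha_bar) * a0 + np.sqrt(1.0 - alpha_bar) * eps

# The model predicts eps from (a_t, LiDAR observation, t); training minimises MSE.
eps_hat = eps                        # perfect prediction, for illustration only
loss = np.mean((eps_hat - eps) ** 2)
```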
Training loss is logged using TensorBoard.
The trained diffusion policy is deployed in the simulator using a pure pursuit controller to convert predicted curvature signals into steering commands.
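The curvature-to-steering step can be sketched with the usual kinematic bicycle-model relation, steering = atan(wheelbase * curvature); the wheelbase value below is an assumption, not taken from diffusion_runner.py:

```python
import math

WHEELBASE = 0.33  # metres; typical F1TENTH wheelbase, assumed here

def curvature_to_steering(kappa):
    """Map a predicted path curvature (1/m) to a front-wheel steering angle (rad)."""
    return math.atan(WHEELBASE * kappa)

curvature_to_steering(0.0)  # → 0.0 (zero curvature means driving straight)
```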
source <PATH_TO_VENV>/bin/activate
cd <PATH_TO_REPO>/ros_f110_ppo
python3 ros_f110_ppo/diffusion_runner.py
Runtime metrics are logged to CSV for direct comparison with PPO.
The following metrics are used to compare PPO and diffusion policies on unseen tracks:
- Steering angle over time
- Speed over time
- Minimum LiDAR distance
- Steering and speed distributions
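Given the CSV logs produced at runtime, these comparisons reduce to simple array statistics; a sketch using NumPy (column layout and values are illustrative) is:

```python
import numpy as np

# Hypothetical logged columns from one run: steering, speed, min LiDAR distance.
steering = np.array([0.02, -0.10, 0.05, 0.00])
speed = np.array([1.5, 1.4, 1.6, 1.5])
min_dist = np.array([1.2, 0.9, 1.1, 1.3])

summary = {
    "mean_speed": float(np.mean(speed)),          # average speed over the run
    "steering_std": float(np.std(steering)),      # proxy for steering smoothness
    "closest_approach": float(np.min(min_dist)),  # worst-case obstacle clearance
}
```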
Trajectory behaviour is assessed qualitatively using RViz.
Notes:
- Logged datasets and plots are generated at runtime
- All paths use placeholders and must be adapted locally
This project is licensed under the MIT License. See the LICENSE file for details.