A Q-learning agent that learns to navigate a taxi in a 5×5 grid world, pick up passengers, and drop them at their destination.
Based on the classic Taxi-v3 environment. The taxi must:
- Navigate a 5×5 grid with walls
- Pick up a passenger from one of 4 locations (R, G, Y, B)
- Drop them off at another location
- Do this as efficiently as possible
Rewards:
- +20 for successful drop-off
- -1 for each step (encourages efficiency)
- -10 for illegal pickup/drop-off attempts
State space: 500 states (5×5 grid × 5 passenger locations × 4 destinations)
Actions: 6 (Up, Down, Left, Right, Pickup, Drop)
This project implements Q-learning, a model-free reinforcement learning algorithm. The agent learns a Q-table mapping state-action pairs to expected rewards through trial and error.
Q(s,a) ← Q(s,a) + α[r + γ·max(Q(s',a')) - Q(s,a)]
Hyperparameters:
- Learning rate (α): 0.1
- Discount factor (γ): 0.99
- Exploration rate (ε): 0.1
- Episodes: 5000
After training, the agent converges to a near-optimal policy and consistently solves the task in a minimal number of steps.
├── main.c # Training loop, visualization, Q-learning
├── taxi.c # Environment logic (step, reset, rewards)
├── taxi.h # Environment struct and function declarations
Requires raylib for visualization.
- Linux: raylib 5.5 is included in the repo, so no installation is needed.
- Other systems: install raylib via your package manager, or download the raylib source and build it manually.
# manually with gcc
gcc -o taxi main.c taxi.c -lraylib -lGL -lm -lpthread
# or using the nob build system
# (update `nob.c` with the appropriate file paths first)
./nob
MIT
