This project demonstrates the fundamentals of Reinforcement Learning (RL) by training an agent to play Blackjack, using Gymnasium (formerly OpenAI Gym).
The goal is to teach an agent to make optimal decisions — whether to “hit” or “stick” — to maximize its expected reward in a simulated Blackjack environment.
Blackjack is a simple card game where the player competes against the dealer.
This project implements a Monte Carlo Exploring Starts (MC-ES) algorithm to estimate value functions and derive an optimal strategy for the Blackjack-v1 environment.
The environment models the full game dynamics, including card draws, busts, wins, and losses. The agent learns entirely through simulation and feedback — no prior knowledge of Blackjack rules is required.
## Working
- Understand state-value and action-value functions, V(s) and Q(s, a)
- Implement Monte Carlo Control using Exploring Starts
- Train an RL agent via episodic sampling
- Visualize learning progress with 3D plots and policy heatmaps
- Evaluate convergence and performance
| Component | Description |
|---|---|
| Language | Python 3.10+ |
| Core Library | Gymnasium |
| Computation & Plotting | NumPy, Matplotlib |
| Notebook Environment | Jupyter Notebook |
| RL Algorithm | Monte Carlo Exploring Starts (MC-ES) |
| Environment | Blackjack-v1 |
```bash
# Clone the repository
git clone https://github.com/AksaRose/Reinforcement_learning.git
cd Reinforcement_learning

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# Install dependencies and launch the notebook
pip install gymnasium numpy matplotlib jupyter
jupyter notebook Blackjack.ipynb
```