Using Reinforcement Q-learning to teach an AI agent how to play a simple Bomberman game clone from scratch
This is a simple Q-learning algorithm in Python that teaches an AI how to play a generic Bomberman clone.
The Bomberman environment and its renderer were created in pure Python.
The sprites were created by the user VOXEL and released for the http://ludumdare.com/compo/2015/05/05/minild-59-swapshop/ game jam.
- A bomb takes 3 turns to explode.
- A bomb always explodes in a cross-shaped area (x+1, x-1, y+1 and y-1).
- A bomb destroys wall blocks.
- A bomb kills the Agent if it is inside the blast area.
- The game ends after a maximum of 100 turns.
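The cross-shaped blast described in the rules above can be sketched as follows. This is a minimal illustration, not the project's actual code; the grid encoding and function names are assumptions:

```python
# Sketch of the cross-shaped bomb blast (hypothetical names, not the
# project's actual implementation). The grid is a list of rows, where
# WALL cells are destructible blocks.
WALL, EMPTY = 1, 0

def explode(grid, x, y):
    """Destroy wall blocks in a cross around (x, y); return the count."""
    # The blast covers the bomb's cell plus (x+1, x-1, y+1, y-1).
    hit = [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    destroyed = 0
    for cx, cy in hit:
        # Ignore cells outside the map.
        if 0 <= cy < len(grid) and 0 <= cx < len(grid[0]):
            if grid[cy][cx] == WALL:
                grid[cy][cx] = EMPTY
                destroyed += 1
    return destroyed
```

For example, a bomb placed in the middle of a 3x3 field of wall blocks destroys the 5 cells of the cross and leaves the 4 corners intact.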
(source: https://blog.goodaudience.com/deep-q-learning-a-reinforcement-learning-algorithm-d1a93b754535)
```python
def get_reward(self):
    if self.value_after > 0:
        r = (self.value_after ** 2) * (1 - (self.Turn / float(100)))
    else:
        r = -1 * (self.Turn / float(100))
    return r
```
- `self.value_after`: the delta in the number of blocks remaining in the scenario before and after the agent performs a given action.
- `self.Turn`: the current turn.
- `float(100)`: the total number of turns.
The reward is larger when the agent destroys more blocks in the early game phase. (This reward function forces the Agent to focus on destroying blocks instead of spamming bombs randomly in empty spaces.)
If the agent doesn't destroy any block, it is penalized (negative reward) in a time-dependent manner: not destroying blocks becomes more expensive with each passing turn.
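To build intuition for the two branches of the reward, here is a standalone copy of the same formula with a couple of worked values (the inputs are made up for illustration):

```python
def get_reward(value_after, turn, max_turns=100):
    """Standalone version of the reward function above, for experimenting."""
    if value_after > 0:
        # Destroying blocks yields a quadratic reward, scaled down
        # as the game progresses (early destruction pays more).
        return (value_after ** 2) * (1 - turn / float(max_turns))
    # No blocks destroyed: the penalty grows linearly with the turn count.
    return -1 * (turn / float(max_turns))
```

Destroying 2 blocks on turn 10 gives `2**2 * (1 - 0.1) = 3.6`, while destroying nothing on turn 50 gives `-0.5`; the same idle behaviour on turn 90 would cost `-0.9`.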
Result after 0 episodes.
Result after 20000 episodes.
Result after 980000 episodes (convergence happened much earlier, but I posted the last episode).