LucasSilvaFerreira/BOMBERMAN-Reinforcement-Learning-Q-Learning

Q-Learning_BOMBERMAN

Using reinforcement Q-learning to teach an AI agent how to play a simple BomberMan game clone from scratch

This is a simple Q-learning algorithm in Python that teaches an AI how to play a generic Bomberman clone.

The Bomberman environment and its renderer were created in pure Python.

The sprites were created by the user VOXEL and released at the http://ludumdare.com/compo/2015/05/05/minild-59-swapshop/ game jam.

Environment rules

  • A bomb takes 3 turns to explode.
  • A bomb always explodes in a cross-shaped area (x+1, x-1, y+1 and y-1).
  • A bomb destroys wall blocks.
  • A bomb kills the Agent if it is in the area.
  • The maximum number of turns to finish the game is 100.
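The rules above can be illustrated with a minimal sketch. This is not the repository's environment code: the grid encoding (`EMPTY`, `WALL`, `AGENT`), the helper names, and the choice to include the bomb's own cell in the blast are all assumptions made for illustration.

```python
# Hypothetical cell encoding; the repository's real encoding is not shown.
EMPTY, WALL, AGENT = 0, 1, 2
FUSE_TURNS = 3  # a bomb explodes 3 turns after it is placed


def blast_cells(x, y, width, height):
    """In-bounds cells hit by a bomb at (x, y): the cross x+1, x-1, y+1, y-1.

    Including the bomb's own cell is an assumption of this sketch.
    """
    candidates = [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def explode(grid, x, y):
    """Apply one explosion to grid[y][x]; return (blocks_destroyed, agent_killed)."""
    height, width = len(grid), len(grid[0])
    destroyed, agent_killed = 0, False
    for cx, cy in blast_cells(x, y, width, height):
        if grid[cy][cx] == WALL:
            grid[cy][cx] = EMPTY  # wall blocks are destroyed
            destroyed += 1
        elif grid[cy][cx] == AGENT:
            agent_killed = True   # the Agent dies if it stands in the blast
    return destroyed, agent_killed


def tick_bombs(bombs):
    """Advance every fuse by one turn; return (still_ticking, exploding_now)."""
    ticking, exploding = [], []
    for x, y, turns_left in bombs:
        if turns_left - 1 <= 0:
            exploding.append((x, y))
        else:
            ticking.append((x, y, turns_left - 1))
    return ticking, exploding
```

A bomb stored as `(x, y, FUSE_TURNS)` stays in the ticking list for two calls to `tick_bombs` and is returned as exploding on the third.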

Q-learning function

(source: https://blog.goodaudience.com/deep-q-learning-a-reinforcement-learning-algorithm-d1a93b754535)
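For reference, the standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α · (r + γ · max over a' of Q(s', a') − Q(s, a)). The sketch below shows one way to write it in Python. The learning rate, discount factor, number of actions, and state representation are assumed values chosen for illustration, not the ones used in this repository.

```python
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
N_ACTIONS = 6  # hypothetical action count, e.g. 4 moves, place bomb, wait


def make_q_table():
    """Q-table mapping any hashable state to a list of action values."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(q, state, action, reward, next_state, done,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]
```

Calling `q_update` once per environment step, with the reward described in the next section, is enough to train a table-based agent, provided the state can be made hashable (for example, a tuple of grid rows).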


REWARD function

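    def get_reward(self):
        if self.value_after > 0:
            # Blocks were destroyed: reward grows with the square of the
            # delta and shrinks as the game advances.
            r = (self.value_after ** 2) * (1 - (self.Turn / float(100)))
        else:
            # Nothing destroyed: the penalty grows with the turn number.
            r = -1 * (self.Turn / float(100))
        return r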

self.value_after: the change in the number of blocks remaining in the scenario, measured before and after the agent performs a given action.
self.Turn: the current turn.
float(100): the total number of turns.


The reward is larger when the agent destroys more blocks early in the game. (This reward function forces the Agent to focus on destroying blocks instead of spamming bombs randomly in empty spaces.)

If the agent doesn't destroy any blocks, it is penalized (negative reward) by an amount that grows over time: not destroying blocks becomes more expensive each turn.
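A few worked values make this behavior concrete. The standalone function below restates the reward formula outside the class so it can be run directly. The name `reward` and its parameters are hypothetical, and it assumes the delta is positive when blocks were destroyed.

```python
MAX_TURNS = 100  # total number of turns, as in the rules


def reward(blocks_destroyed, turn, max_turns=MAX_TURNS):
    """Standalone restatement of get_reward for experimentation."""
    if blocks_destroyed > 0:
        # Squared delta, scaled by the fraction of the game still remaining.
        return (blocks_destroyed ** 2) * (1 - turn / float(max_turns))
    # No blocks destroyed: penalty proportional to elapsed turns.
    return -1 * (turn / float(max_turns))


if __name__ == "__main__":
    for blocks, turn in [(1, 10), (2, 10), (2, 90), (0, 10), (0, 90)]:
        print(f"blocks={blocks} turn={turn} reward={reward(blocks, turn):.2f}")
```

Destroying 2 blocks on turn 10 gives about 3.6, while the same action on turn 90 gives about 0.4. Destroying nothing costs about 0.1 on turn 10 and about 0.9 on turn 90.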


Result after 0 episodes.


Result after 20000 episodes.


Result after 980000 episodes (convergence happened much earlier, but I posted the last episode).
