
Ship_routing_QLearning

So far, I have written the code that builds the final Q matrix; once it is trained, we can iterate over it with a greedy nearest-neighbour walk to extract the final path. The usual disadvantage of nearest-neighbour algorithms, that future rewards (or penalties) are not considered, is resolved by using the Q matrix, since each entry already encodes discounted future reward.
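As a rough illustration of the idea (not the repository's actual code), here is a minimal sketch of the tabular Q-learning update and the greedy walk over the trained matrix. The grid size, hyperparameters, and the `step` environment function are all assumptions.

```python
import numpy as np

# Hypothetical setup: states are cells of a flattened grid, actions are N/S/E/W.
N_STATES, N_ACTIONS = 100, 4
ALPHA, GAMMA = 0.1, 0.9

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, reward, s_next):
    """Standard tabular Q-learning update: fold the discounted best
    future value into the current estimate."""
    Q[s, a] += ALPHA * (reward + GAMMA * Q[s_next].max() - Q[s, a])

def greedy_path(start, goal, step, max_len=1000):
    """Walk the trained Q matrix greedily: at each state take the action
    with the highest Q value. `step(s, a)` is an assumed environment
    function returning the next state."""
    path, s = [start], start
    while s != goal and len(path) < max_len:
        s = step(s, int(Q[s].argmax()))
        path.append(s)
    return path
```

Because each Q entry already accounts for discounted future reward, the greedy walk does not suffer from the short-sightedness of a plain nearest-neighbour heuristic.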

Cases where this works:

When we have a straightforward path-finding case with no awkward land obstacles, the agent reaches the goal state. This is the current situation after 1 episode:

[image: agent's path after one episode]

I have initialized the Q matrix with weights arranged as a linear space of values that increase exponentially toward the goal. This proves to be an advantage: near the start coordinate the weights change only gently (a shallow slope in the weight-vs-coordinate curve), and the closer the agent moves toward the goal state, the steeper the increase in weights becomes. A sketch of this initialization follows the figure below.

[image: Q-weight gradient vs. coordinate]
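A hedged sketch of that initialization, assuming a per-cell Manhattan distance to the goal; the `scale` constant and the function name are illustrative, not the repository's values.

```python
import numpy as np

def init_q_exponential(n_rows, n_cols, goal, n_actions=4, scale=5.0):
    """Initialize every Q row with a weight that grows exponentially
    as the cell gets closer to the goal: flat gradients far from the
    goal, steep ones near it."""
    rows, cols = np.mgrid[0:n_rows, 0:n_cols]
    dist = np.abs(rows - goal[0]) + np.abs(cols - goal[1])  # Manhattan distance
    # Linear space of closeness in [0, 1], mapped through an exponential.
    weights = np.exp(scale * (1.0 - dist / dist.max()))
    return np.repeat(weights.reshape(-1, 1), n_actions, axis=1)

Q = init_q_exponential(10, 10, goal=(9, 9))
```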

Cases where this doesn't work (to be rectified soon):

Where there are land obstacles, the agent finds reasons to get stuck (the extra distance through open ocean costs more than the penalty for hitting land) and thus cannot reach the goal state. The sketch below illustrates this trade-off.
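For illustration only, here is one hedged sketch of the failure mode: if the per-step ocean cost times the detour length exceeds the one-off land penalty, a greedy agent prefers to sit against the coastline. The constants and function name are assumptions, not the repository's code.

```python
# Hypothetical reward constants illustrating the trap described above.
STEP_COST = -1.0      # paid for every ocean move
LAND_PENALTY = -5.0   # paid for bumping into land
GOAL_REWARD = 100.0

def reward(cell_is_land, cell_is_goal):
    """If LAND_PENALTY is too mild relative to the detour cost
    (e.g. a 10-step detour costs -10, worse than -5 for hitting
    land), the agent can prefer repeatedly hitting land over
    sailing around it. Making the land penalty dominate any
    plausible detour removes the trap."""
    if cell_is_goal:
        return GOAL_REWARD
    if cell_is_land:
        return LAND_PENALTY
    return STEP_COST
```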
