- Read more into policy gradients, e.g. the TRPO paper.
- Perhaps do the Berkeley homework on implementing basic policy gradients.
- Implement more advanced versions of policy gradients?

Due by August 5, 2020 • 6/8 issues closed

Learn the SOTA methods for DQN:
- Dueling DQN
- Double DQN
- DQfD (Deep Q-Learning from Demonstrations)

Learn about inverse RL and how it could replace the manual subrewards used in ForgER.

Due by August 19, 2020 • 3/7 issues closed
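As a starting point for the DQN item, here is a minimal sketch of the two core ideas behind Dueling DQN and Double DQN, written over plain Python lists rather than a neural-network framework. The function names `dueling_q` and `double_dqn_target` are hypothetical helpers for illustration, not part of any library. Dueling DQN combines a state value V(s) and per-action advantages A(s, a) as Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). Double DQN lets the online network choose the next action while the target network evaluates it, which reduces the overestimation caused by taking a max over the target network alone.

```python
def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using mean-subtracted advantages."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN bootstrap target for a single transition.

    q_online_next: online-network Q-values for the next state (used to pick the action).
    q_target_next: target-network Q-values for the next state (used to evaluate it).
    """
    if done:
        # Terminal transition: no bootstrapping.
        return reward
    # Select the greedy action with the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but evaluate that action with the target network.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(dueling_q(1.0, [0.5, -0.5, 0.0]))
    # The online net prefers action 1, so the target uses q_target_next[1] = 2.0;
    # vanilla DQN would instead use max(q_target_next) = 4.0.
    print(double_dqn_target(1.0, False, 0.5, [0.2, 0.8], [4.0, 2.0]))
```

Subtracting the mean advantage keeps V and A identifiable; the Dueling DQN paper also discusses a max-based variant.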