DDPG implementation #65

Open · dynamik1703 opened this issue Dec 18, 2017 · 4 comments

@dynamik1703

Hey VinF,

thanks for your work!

I have questions about the DDPG implementation in deer.

Patrick Emami recommends in http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html implementing the actor and the critic as two functions in separate classes.

Additionally, he feeds the action tensor into the second hidden layer of the critic network.

Am I correct in assuming that the DDPG implementation in deer differs from this?

Kind regards,
Roman

@VinF
Owner

VinF commented Dec 18, 2017

Hi Roman,

In deer, the actor and the critic are two different objects (two different neural networks): the actor is instantiated here and the critic here. But you are right, they do come from the same class: the implementation uses the class's arguments to make the two objects different (e.g., whether or not the action is provided as an input). Of course, it would be possible to build two different classes instead.

Where the actions are fed into the network depends on the architecture you choose for your neural network, so you can indeed decide to provide them deeper in the network. It is an architectural design choice, but it does not really change anything in the logic of the algorithm.
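
For illustration, here is a minimal sketch of that single-class approach (not deer's actual code; the class name, layer sizes, and placement are made up, using the Keras functional API):

```python
from keras.layers import Concatenate, Dense, Input
from keras.models import Model


class SimpleNetwork:
    """One class that yields an actor or a critic depending on its arguments."""

    def __init__(self, state_dim, action_dim, actions_as_input=False):
        state = Input(shape=(state_dim,))
        h = Dense(64, activation='relu')(state)
        inputs = [state]
        if actions_as_input:
            # Critic: merge the action tensor into the second hidden layer.
            action = Input(shape=(action_dim,))
            h = Concatenate()([h, action])
            inputs.append(action)
        h = Dense(64, activation='relu')(h)
        # The actor outputs an action vector; the critic outputs a scalar Q-value.
        out = Dense(1 if actions_as_input else action_dim)(h)
        self.model = Model(inputs=inputs, outputs=out)


actor = SimpleNetwork(state_dim=3, action_dim=1)                          # pi(s)
critic = SimpleNetwork(state_dim=3, action_dim=1, actions_as_input=True)  # Q(s, a)
```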

Best,
Vincent

@dynamik1703
Author

That's true.

Furthermore, in the DDPG paper the authors use different learning rates for the actor and the critic, and the final layer of the actor is a tanh.

Do you think it would be worth adding this to deer?
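
For reference, a sketch of those two details (illustrative Keras code, not deer's; the learning rates are the values reported in the DDPG paper, Lillicrap et al., 2015):

```python
from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import Adam

state_dim, action_dim = 3, 1

# Actor: a final tanh bounds the actions to [-1, 1]
# (rescaled afterwards to the environment's action range).
state = Input(shape=(state_dim,))
h = Dense(64, activation='relu')(state)
h = Dense(64, activation='relu')(h)
action = Dense(action_dim, activation='tanh')(h)
actor = Model(inputs=state, outputs=action)

# DDPG paper: Adam with a learning rate of 1e-4 for the actor
# and 1e-3 for the critic.
actor_optimizer = Adam(lr=1e-4)
critic_optimizer = Adam(lr=1e-3)
```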

@VinF
Owner

VinF commented Dec 18, 2017

Yes, that might well be easier for the end user. So we would have two classes: an NN_actor and an NN_critic. For the Q-network, we would then use the NN_critic class. If you want to suggest a PR, we can look into it.
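
A possible shape for that split (purely a sketch of the proposal; class names follow the suggestion above, everything else is hypothetical):

```python
from keras.layers import Concatenate, Dense, Input
from keras.models import Model


class NN_actor:
    """Policy network: maps a state to a tanh-bounded action."""

    def __init__(self, state_dim, action_dim):
        s = Input(shape=(state_dim,))
        h = Dense(64, activation='relu')(s)
        h = Dense(64, activation='relu')(h)
        a = Dense(action_dim, activation='tanh')(h)
        self.model = Model(inputs=s, outputs=a)


class NN_critic:
    """Q-network: maps (state, action) to a scalar value. This would
    also be the class a plain Q-network is built from."""

    def __init__(self, state_dim, action_dim):
        s = Input(shape=(state_dim,))
        a = Input(shape=(action_dim,))
        h = Dense(64, activation='relu')(s)
        h = Concatenate()([h, a])  # action enters at the second hidden layer
        h = Dense(64, activation='relu')(h)
        q = Dense(1)(h)
        self.model = Model(inputs=[s, a], outputs=q)
```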

@dynamik1703
Author

I will work out a proposal.

Best,
Roman
