Question about result for Pick Environment #15
Hello! For the pick environment, the state space is quite large due to the small scale of the PSM gripper and the object. I generated demonstrations of grasping by driving the tool above the object, grasping it, and then moving it to the goal. These demonstrations are then used by augmenting the policy loss in DDPG with a behavioral cloning loss. This is similar to https://arxiv.org/pdf/1709.10089.pdf
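For illustration only, here is a minimal sketch of what augmenting the DDPG actor loss with a behavioral-cloning term can look like, written in PyTorch. The function and argument names are assumptions, not this repo's code; Nair et al. additionally gate the BC term with a Q-filter, noted in the comments.

```python
import torch.nn.functional as F

def actor_loss_with_bc(actor, critic, states, demo_states, demo_actions,
                       bc_weight=1.0):
    """Hypothetical sketch: DDPG actor loss plus a behavioral-cloning term."""
    # Standard DDPG objective: maximize Q(s, pi(s)) over replay states.
    ddpg_loss = -critic(states, actor(states)).mean()

    # BC term: regress the policy toward the demonstrated actions on
    # states sampled from the demonstration buffer.
    bc_loss = F.mse_loss(actor(demo_states), demo_actions)

    # Nair et al. additionally use a "Q-filter": keep the BC term only
    # where Q(s, a_demo) > Q(s, pi(s)). Omitted here for brevity.
    return ddpg_loss + bc_weight * bc_loss
```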
Hello! Thanks for the reply; I read the paper you mentioned. I wonder how you generated the demonstration data. Did you just guide the arm in V-REP and collect the data?
The grasping task can be put into 3 steps:
1. Drive the tool above the object.
2. Close the gripper to grasp it.
3. Move the object to the goal.
Hope this helps!
I mean, in baseline#474 the data is generated by a script. Could I also get the data in a similar way?
Sorry for the delayed response; I had to look around. I found the generated data and have attached it in the .zip file. The code I wrote is also posted below:
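As a rough sketch of what such a scripted data-generation policy might look like, following the three steps above: a Gym-robotics-style dict observation and a (dx, dy, dz, jaw) action layout are assumed here, and none of these names are taken from the actual repo.

```python
import numpy as np

def scripted_pick_episode(env, hover=0.005, tol=1e-3, gain=5.0, horizon=100):
    """Scripted pick demonstration (hypothetical sketch): hover, grasp, carry."""
    obs = env.reset()
    data = []
    grasped = False
    for _ in range(horizon):
        tip = obs['observation'][:3]   # tool-tip position (assumed layout)
        obj = obs['achieved_goal']     # object position
        goal = obs['desired_goal']     # target position
        above = obj + np.array([0.0, 0.0, hover])

        if not grasped and np.linalg.norm(tip - above) > tol:
            target, jaw = above, 1.0   # step 1: move above the object, jaw open
        elif not grasped and np.linalg.norm(tip - obj) > tol:
            target, jaw = obj, 1.0     # step 2a: descend onto the object
        elif not grasped:
            target, jaw = obj, -1.0    # step 2b: close the jaw to grasp
            grasped = True             # in practice, hold closed a few steps
        else:
            target, jaw = goal, -1.0   # step 3: carry the object to the goal

        # Proportional position control, clipped to the action range.
        action = np.clip(np.append(gain * (target - tip), jaw), -1.0, 1.0)
        next_obs, reward, done, info = env.step(action)
        data.append((obs, action, reward, next_obs, info))
        obs = next_obs
    return data
```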
Thanks for your help!
During training, I've only seen that when trying other reward functions.
Your discussion above helped me a lot, thank you! I'm very confused about these problems and am hoping for your reply.
@leey127 |
Do you have any suggestions about this problem? @bango123 |
I would confirm the code I shared in the previous comment is able to solve the environment. Basically hand-craft a policy to solve it to confirm there are no other issues. |
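For example, along those lines one could run the scripted policy from the sketch above for a few episodes and check its success rate. `info['is_success']` is the usual Gym robotics convention and is assumed here, not taken from the repo:

```python
# Sanity check: a hand-crafted policy should solve the environment
# before any learning is attempted. Uses the scripted_pick_episode
# sketch above; `is_success` is an assumed Gym-robotics info key.
n_episodes = 20
successes = 0.0
for _ in range(n_episodes):
    data = scripted_pick_episode(env)
    successes += data[-1][-1].get('is_success', 0.0)  # info of last step
print(f"Scripted success rate: {successes / n_episodes:.2f}")
```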
Hello, I have some questions about the results for the pick_and_place environment.
I used DDPG+HER to train the agent, but got a bad result (success rate = 0). I read your paper, and you said you used BC (behavior cloning). Could you give some hints or references for getting a good result?