-
Notifications
You must be signed in to change notification settings - Fork 3
changes for latest versions of BenchMARL #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -143,7 +149,9 @@ def _forward( | |||
else: # Gather outputs for one agent on the obs | |||
# tensor of shape [*batch, n_agents, n_actions], where the outputs | |||
# along the n_agent dimension are taken with the same (agent_index) agent network | |||
agent_out = self.agent_mlps.agent_networks[agent_index].forward(input) | |||
# agent_out = self.agent_mlps.agent_networks[agent_index].forward(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key change as agent_networks is no longer supported in torchRL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a mil, just a few qs
@@ -13,7 +13,7 @@ use_action_loss: True | |||
action_loss_lr: 0.00003 | |||
|
|||
experiment: | |||
max_n_frames: 5_000_000 | |||
max_n_frames: 1_000_000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not change the default config for reproducibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, about that I didnt want to run for longer, so pushed this by mistake.
distance = self.estimate_snd(input) | ||
if update_estimate: | ||
self.estimated_snd[:] = distance.detach() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expalin this a bit?
If those conditions are met, we can avoid computing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this to be able to log the estimated_snd during training when the desired snd is -1
. Right now it logs Nan's. It was just to be able to see the evolution of snd while training as well. Can be remove if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but you can still see it under eval/snd no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, But if I understand that is only during evaluation right? This was helpful in understanding how the SND evolves while training. But you are right eval/snd is enough. Should we roll back to the previous version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok got it, i ll take care of things don't worry
A few adaptations for DiCo to work with the latest BenchMARL codebase. No need to install the specific branches of tensordict, torchRL, and BenchMARL. Let me know if additional testing is required.
SND curves for the sampling environment


Final learned policy
https://github.com/user-attachments/assets/4ba5ae75-02d2-4d80-868b-03bd538af34a