Hi Nicklas,
I am looking into this and the FOWM codebase, and have a few questions.
Q1: Backprop through time?
First, in the TDMPC codebase, it seems like you detach all the zs in the latent rollout. Doesn't this stop backprop through time? If I compute a loss on Q(z_3), that loss will not affect z_2 or z_1.
tdmpc/src/algorithm/tdmpc.py, lines 194 to 200 at f4d85ec
In the FOWM codebase, by contrast, all the predicted zs are stored in a list without detaching, and Q-values are then predicted for every one of them.
https://github.com/fyhMer/fowm/blob/bf7985876eb4baa827f41567cb6fb47b5da93ed4/src/algorithm/tdmpc.py#L283-L299
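To make the question concrete, here is a minimal PyTorch sketch of the difference between the two rollout styles. The module names and dimensions are illustrative stand-ins, not the actual TD-MPC classes; the point is only how `.detach()` on each predicted latent truncates backprop through time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for the latent dynamics and Q networks (illustrative names,
# not the actual TD-MPC modules); latent dim 4, action dim 2.
dynamics = nn.Linear(4 + 2, 4)   # z_{t+1} = d(z_t, a_t)
Q = nn.Linear(4 + 2, 1)

z0 = torch.randn(1, 4, requires_grad=True)
actions = [torch.randn(1, 2) for _ in range(3)]

def rollout(detach):
    zs = [z0]
    for a in actions:
        z_next = dynamics(torch.cat([zs[-1], a], dim=-1))
        zs.append(z_next.detach() if detach else z_next)
    return zs

# TD-MPC style: each predicted latent is detached before the next step,
# so a loss on Q(z_3, a_3) cannot reach z_2, z_1, or z_0.
z3_detached = rollout(detach=True)[-1]
loss_detached = Q(torch.cat([z3_detached, actions[-1]], dim=-1)).sum()
g_detached = torch.autograd.grad(loss_detached, z0, allow_unused=True)[0]
# g_detached is None: backprop through time is truncated

# FOWM style: latents stay in the graph, so the same loss reaches z_0.
z3_full = rollout(detach=False)[-1]
loss_full = Q(torch.cat([z3_full, actions[-1]], dim=-1)).sum()
g_full = torch.autograd.grad(loss_full, z0)[0]
# g_full is a (1, 4) tensor: the loss flows back through every step
```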
Q2: Why enable / disable tracking of Q gradients during update_pi?
If we are using pi_optim to optimize, that would only update the policy parameters. So why do we need to disable / enable the Q network grads? Does that somehow make things more efficient?
tdmpc/src/algorithm/tdmpc.py, lines 153 to 169 at f4d85ec
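For what it's worth, my current guess is that the toggle is about the backward pass, not the optimizer step: even though pi_optim only steps the policy parameters, calling backward() on the policy loss would still compute and accumulate gradients into the Q parameters' .grad buffers, costing memory/compute and leaving stale gradients that would add onto the next value-loss backward unless zeroed. A hedged sketch with illustrative modules (not the actual TD-MPC code):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: latent dim 4, action dim 2.
Q = nn.Linear(4 + 2, 1)
pi = nn.Linear(4, 2)
pi_optim = torch.optim.Adam(pi.parameters(), lr=1e-3)

def set_requires_grad(module, value):
    # Helper assumed here for clarity; toggles grad tracking on all params.
    for p in module.parameters():
        p.requires_grad_(value)

z = torch.randn(8, 4)

# Policy update: freeze Q so backward() neither computes nor accumulates
# gradients for the Q parameters. Gradients still flow THROUGH the frozen
# Q into the policy's action output.
set_requires_grad(Q, False)
pi_loss = -Q(torch.cat([z, pi(z)], dim=-1)).mean()
pi_optim.zero_grad()
pi_loss.backward()
pi_optim.step()
set_requires_grad(Q, True)

# Q's grad buffers were never populated; the policy's were.
# Without the freeze, Q.weight.grad would hold stale gradients here.
```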