
Question about backprop #20

Open
edwhu opened this issue Oct 28, 2024 · 0 comments

edwhu commented Oct 28, 2024

Hi Nicklas,

I am looking into this codebase and the FOWM codebase, and I have a few questions.

Q1: Backprop through time?

First, in the TDMPC codebase, it looks like you detach all the zs in the latent rollout. Doesn't this stop backprop through time? If I compute a loss on Q(z_3), that loss will not propagate gradients back to z_2 and z_1.

Q1, Q2 = self.model.Q(z, action[t])
z, reward_pred = self.model.next(z, action[t])
with torch.no_grad():
    next_obs = self.aug(next_obses[t])
    next_z = self.model_target.h(next_obs)
    td_target = self._td_target(next_obs, reward[t])
zs.append(z.detach())  # <-- the detach I am asking about

In the FOWM codebase, all the predicted zs are stored in a list without detaching, and Q-values are then predicted for every one of them:
https://github.com/fyhMer/fowm/blob/bf7985876eb4baa827f41567cb6fb47b5da93ed4/src/algorithm/tdmpc.py#L283-L299
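
To make sure I am reading this correctly, here is a small toy example I put together (not code from either repo; dynamics and q_net are just stand-in linear modules) that shows the gradient-flow difference I mean between storing detached and attached latents:

import torch

dynamics = torch.nn.Linear(4, 4)   # stand-in for model.next
q_net = torch.nn.Linear(4, 1)      # stand-in for model.Q

z = torch.randn(1, 4)
z_det, z_att = z, z
zs_detached, zs_attached = [], []
for _ in range(3):
    z_det = dynamics(z_det)
    z_att = dynamics(z_att)
    zs_detached.append(z_det.detach())  # TDMPC-style: graph is cut at the stored copy
    zs_attached.append(z_att)           # FOWM-style: graph is kept

# Q loss on the last stored detached latent: no gradient ever reaches the dynamics
q_net(zs_detached[-1]).sum().backward()
print(dynamics.weight.grad)  # None

# Q loss on the last stored attached latent: gradient flows back through all 3 steps
q_net(zs_attached[-1]).sum().backward()
print(dynamics.weight.grad)  # now populated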

Q2: Why disable / re-enable tracking of Q gradients during update_pi?

Since pi_optim only updates the policy parameters, why do we need to disable and then re-enable the Q-network gradients? Does that somehow make things more efficient?

def update_pi(self, zs):
    """Update policy using a sequence of latent states."""
    self.pi_optim.zero_grad(set_to_none=True)
    self.model.track_q_grad(False)
    # Loss is a weighted sum of Q-values
    pi_loss = 0
    for t, z in enumerate(zs):
        a = self.model.pi(z, self.cfg.min_std)
        Q = torch.min(*self.model.Q(z, a))
        pi_loss += -Q.mean() * (self.cfg.rho ** t)
    pi_loss.backward()
    torch.nn.utils.clip_grad_norm_(self.model._pi.parameters(), self.cfg.grad_clip_norm, error_if_nonfinite=False)
    self.pi_optim.step()
    self.model.track_q_grad(True)
    return pi_loss.item()
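
For context, here is the behavior I suspect the toggle is avoiding, in a small standalone sketch (my own toy modules, not the repo's code; my understanding is that track_q_grad essentially flips requires_grad on the Q networks):

import torch

q_net = torch.nn.Linear(4, 1)   # stand-in for the Q function
policy = torch.nn.Linear(4, 4)  # stand-in for the policy
pi_optim = torch.optim.Adam(policy.parameters())  # only ever steps the policy

z = torch.randn(1, 4)

# Without disabling Q grads: pi_optim.step() only touches the policy, but
# backward() still computes and stores gradients in the Q parameters.
pi_loss = -q_net(policy(z)).mean()
pi_loss.backward()
print(q_net.weight.grad is None)  # False: Q grads were computed anyway

# With requires_grad turned off on Q (roughly what I think track_q_grad(False) does):
q_net.zero_grad(set_to_none=True)
for p in q_net.parameters():
    p.requires_grad_(False)
pi_loss = -q_net(policy(z)).mean()
pi_loss.backward()                # gradients still flow to the policy through Q
pi_optim.step()
print(q_net.weight.grad is None)  # True: nothing accumulates in Q
for p in q_net.parameters():
    p.requires_grad_(True)

My guess is that this just saves the compute and memory for Q gradients that pi_optim would never use, but I want to confirm.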
