Hi, bro!
I am sorry to bother you again.
For the terminal state, you use two different versions:

    if terminal:
        # in this case, the terminal reward will be assigned as r_batch[-1]
        R_batch[-1] = r_batch[-1]  # terminal state

and

    if terminal:
        R_batch[-1, 0] = 0  # terminal state

I think the first one is in line with my understanding. Why is the R of the terminal state equal to zero in your tf version and in vanilla-Pensieve?
Thank you very much!
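
To make the comparison concrete, here is a minimal sketch of the two conventions. It assumes the returns are filled in with the usual backward recursion `R[t] = r[t] + gamma * R[t + 1]` once the last entry has been set. The function name `discounted_returns`, the `terminal_return` switch, and the default `gamma` are hypothetical and exist only for this illustration; they are not taken from either code base.

```python
def discounted_returns(r_batch, gamma=0.99, terminal_return="zero"):
    """Backward discounted returns for one terminated episode.

    terminal_return="zero"   -> R[-1] = 0        (second snippet above)
    terminal_return="reward" -> R[-1] = r[-1]    (first snippet above)
    All names here are illustrative assumptions, not the repository's API.
    """
    rewards = [float(r) for r in r_batch]
    if not rewards:
        raise ValueError("r_batch must contain at least one reward")

    returns = [0.0] * len(rewards)
    if terminal_return == "reward":
        returns[-1] = rewards[-1]
    elif terminal_return == "zero":
        returns[-1] = 0.0
    else:
        raise ValueError("terminal_return must be 'zero' or 'reward'")

    # Assumed recursion: every earlier step adds its own reward to the
    # discounted return of the following step.
    for t in reversed(range(len(rewards) - 1)):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0]
    print(discounted_returns(rewards, gamma=0.5, terminal_return="zero"))
    print(discounted_returns(rewards, gamma=0.5, terminal_return="reward"))
```

In this sketch the `"zero"` variant never reads `r_batch[-1]`, so the last reward does not contribute to any return, while the `"reward"` variant propagates it back through every earlier step.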