A question about R_batch[-1,0] #35

lwz339 · 2025-04-06T09:04:03Z

Hi, bro!
I am sorry to bother you again.
For the terminal state, you use two different versions:

    if terminal:
        # in this case, the terminal reward will be assigned as r_batch[-1]
        R_batch[-1] = r_batch[-1]  # terminal state

and

    if terminal:
        R_batch[-1, 0] = 0  # terminal state

I think the first one is in line with my understanding. Why is the R of the terminal state equal to zero in your tf version and in vanilla-Pensieve?
Thank you very much!
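
To make the difference concrete, here is a minimal sketch of the two conventions. It assumes the returns are filled backward with R[t] = r[t] + gamma * R[t+1] starting from the last entry. The function name `discounted_returns`, the `last_return` parameter, and the sample numbers are hypothetical and not taken from either repo.

```python
def discounted_returns(rewards, last_return, gamma=0.99):
    """Fill returns backward from a chosen value for the last step.

    Assumption: R[t] = r[t] + gamma * R[t+1] for every step before the
    last one; the last entry is set directly to `last_return`.
    """
    n = len(rewards)
    returns = [0.0] * n
    returns[-1] = last_return  # the line the two versions disagree on
    for t in reversed(range(n - 1)):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


rewards = [1.0, 2.0, 3.0]

# Version with R_batch[-1, 0] = 0
zero_variant = discounted_returns(rewards, last_return=0.0, gamma=0.5)

# Version with R_batch[-1] = r_batch[-1]
reward_variant = discounted_returns(rewards, last_return=rewards[-1], gamma=0.5)

print(zero_variant)    # [2.0, 2.0, 0.0]
print(reward_variant)  # [2.75, 3.5, 3.0]
```

In this sketch the zero version never counts the last reward in any return, while the other version propagates it back to every earlier step.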


lwz339 closed this as completed Apr 17, 2025
