[BUG] EnvBase.step_and_maybe_reset(td) modifies the ('next','observation') data too on partial reset with NonTensorStack
#2257
Labels
bug
Describe the bug
For a custom environment with NonTensorData, calling
tensordict, tensordict_ = step_and_maybe_reset(tensordict)
changes both the ("next", "observation") entry of the input tensordict (unexpected) and the observation entry of tensordict_, which has been partially reset (expected).
To Reproduce
This environment is hard-coded for batch_size = (2,).
The observation space is just a string for simplicity.
_step always returns ["B", "Z"] as the next observation, with the first batch entry in a done state but not the second.
_reset always returns ["A", "C"] as the initial observation after a reset.
(The action is ignored and only included to comply with the spec.)
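The data flow described above can be modeled without torchrl. The following is a minimal plain-Python sketch, not the reproduction script from this report: toy_step, toy_reset, and partial_reset are hypothetical names, and plain lists stand in for the non-tensor observation entries. It only illustrates what a partial reset that leaves its input untouched would produce.

```python
from typing import List, Sequence, Tuple


def toy_step() -> Tuple[List[str], List[bool]]:
    """Hypothetical stand-in for _step: next observation and done flags.

    The first batch entry is done, the second is not.
    """
    return ["B", "Z"], [True, False]


def toy_reset() -> List[str]:
    """Hypothetical stand-in for _reset: initial observation after a reset."""
    return ["A", "C"]


def partial_reset(
    next_obs: Sequence[str], reset_obs: Sequence[str], done: Sequence[bool]
) -> List[str]:
    """Return a NEW list: the reset value where done, else the carried-over value.

    The inputs are only read, never written to.
    """
    return [r if d else n for n, r, d in zip(next_obs, reset_obs, done)]


next_obs, done = toy_step()
carried_obs = partial_reset(next_obs, toy_reset(), done)

print(next_obs)     # ['B', 'Z']  (input left untouched)
print(carried_obs)  # ['A', 'Z']  (only the done entry was reset)
```

In this model, the reported bug would correspond to next_obs also reading ['A', 'Z'] after the call, i.e. the partial reset writing into the input instead of a separate copy.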
Expected behavior
After taking one step and executing
out_td, reset_td = env.step_and_maybe_reset(td)
we expect td to be unchanged, in particular td["next", "observation"], and reset_td to have its observation reset in the first batch entry but not the second. Specifically, we expect td["next", "observation"] = ["B", "Z"] and reset_td["observation"] = ["A", "Z"].
However, td["next", "observation"] and reset_td["observation"] are both ["A", "Z"].
System info
The library was installed using pip requirements. We use the nightly release.
Additional context
The problem occurs only for partial resets (when not all batch entries are done) and is likely related to pytorch/tensordict#837.
Interestingly, using the latest releases (0.4.0 1.26.4 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] linux), I get another wrong result:
Checklist