Reliable way to access true underlying Markov state #3
Comments
That was me! And yes - that could be useful to compare against a fully-observed "optimum". Do some environments already return this in the …?
Hello @smorad, I'd be happy to work on adding this. …
That would be fantastic, thanks @raphaelavalos! For most of the envs, the state is in an underlying … There are also some decisions to make on what exactly the state is. For example, in Battleship, is the …? I'd argue the first case of Battleship and Higher Lower …
Hi both, I'd be happy to help too, though I won't have time until Christmas at the earliest. One thought from my side in the meantime: it might be useful to optionally return the state as part of the observation, either instead of or even together with the partial observation in a `Dict` space.
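To make that concrete, a minimal sketch of what I mean. The wrapper name is a placeholder, and it assumes each env exposes a `get_state()` method and a `state_space` attribute, which is not the current API:

```python
import gym
from gym import spaces


# Sketch only: the class name is made up, and it assumes the wrapped env
# exposes `get_state()` and `state_space` (not yet part of the real API).
class DictStateObservation(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.observation_space = spaces.Dict(
            {"obs": env.observation_space, "state": env.state_space}
        )

    def observation(self, obs):
        # Return both the partial observation and the true Markov state
        return {"obs": obs, "state": self.env.get_state()}
```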
I considered something similar, but messing with the observation space …
Oh, no, I mean purely optionally, with the option set when the environment is instantiated. By default return just the partial observation as-is, but have an option in the constructor to return the state as well.
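Something like this toy example, just as a sketch; the `return_state` flag and everything else here is made up, and it assumes the gym >= 0.26 `reset`/`step` signatures:

```python
import gym
from gym import spaces


# Toy sketch of the "opt-in at construction" idea; every name here is made up.
# Assumes the gym >= 0.26 API (reset -> (obs, info), step -> 5-tuple).
class ToyCountingEnv(gym.Env):
    """Hidden counter as the Markov state; the agent only sees counter % 2."""

    def __init__(self, return_state: bool = False):
        super().__init__()
        self.return_state = return_state
        self.observation_space = spaces.Discrete(2)
        self.action_space = spaces.Discrete(2)
        self.state_space = spaces.Discrete(10)  # the true underlying state
        self._count = 0

    def get_state(self):
        return self._count

    def _info(self):
        return {"state": self.get_state()} if self.return_state else {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._count = 0
        return self._count % 2, self._info()

    def step(self, action):
        self._count += 1
        terminated = self._count >= 9
        return self._count % 2, 0.0, terminated, False, self._info()
```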
For the environments done before this issue I was doing it with a …
Hey!

Currently, I have added a `get_state` method to each environment:

```python
class Autoencode(gym.Env):
    ...
    def get_state(self):
        state = ...
        return state
```

I was thinking about creating a base class with an enum controlling how the state is exposed:

```python
class IncludeState(enum.IntEnum):
    NO = 0
    INFO = 1
    DICT = 2


class PopgymEnv(gym.Env):
    def __init__(self, include_state: IncludeState = IncludeState.NO):
        super().__init__()
        self.state_space = None
```
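Roughly, the base class could then route the state based on that flag. This is only a sketch, with `get_state` as the per-env hook shown above; none of the names are final:

```python
import enum
import gym


class IncludeState(enum.IntEnum):
    NO = 0
    INFO = 1
    DICT = 2


class PopgymEnv(gym.Env):
    def __init__(self, include_state: IncludeState = IncludeState.NO):
        super().__init__()
        self.include_state = include_state
        self.state_space = None  # to be set by each env

    def get_state(self):
        raise NotImplementedError  # implemented per env, as in Autoencode above

    def _wrap(self, obs, info):
        """Attach the Markov state according to `include_state`."""
        if self.include_state == IncludeState.INFO:
            info["state"] = self.get_state()
        elif self.include_state == IncludeState.DICT:
            obs = {"obs": obs, "state": self.get_state()}
        return obs, info
```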
Another possibility is to create two wrappers …

As I had to modify the code of all the environments, the PR will also contain some small modifications to the original code, such as: …

@smorad would that work for you?
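For reference, one of the two wrappers might look roughly like this. It is only a sketch: the class name is a placeholder, and it assumes each env implements `get_state()` and the gym >= 0.26 signatures:

```python
import gym


# Placeholder name; assumes the wrapped env implements `get_state()` and the
# gym >= 0.26 reset/step signatures.
class MarkovStateInfo(gym.Wrapper):
    """Copy the true underlying state into info["state"] on reset and step."""

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        info["state"] = self.env.get_state()
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info["state"] = self.env.get_state()
        return obs, reward, terminated, truncated, info
```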
Fantastic, thanks again for all the hard work! I think both the … would work. Maybe something like

```python
class IncludeMState(enum.IntEnum):  # State is an overloaded term, maybe MState == MarkovState?
    NO = 0
    IN_INFO_DICT = 1
    IN_OBSERVATION = 2
```

or maybe even

```python
class Observability(enum.IntEnum):
    PARTIAL = 0
    FULL_IN_INFO_DICT = 1
    FULL = 2  # Maybe we should have an option for returning JUST the Markov state as an observation. This could be the upper bound for what we want a partially observable agent to achieve.
    FULL_AND_PARTIAL = 3  # I think this is closer to your idea of `Dict({"mstate": state, "obs": obs})`
```

Modifications sound great. I am considering deprecating the maze environments anyways, because the paper shows they are actually a pretty bad benchmark for memory. I don't want to mislead users into thinking their memory is working when the task can be solved by an MLP.

The only suggestion I have is with the third point. Most of the environments where I include the previous action are very difficult/impossible to learn without observing the previous action. We should probably edit the documentation to make it clear that environments like …
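Going back to the `Observability` idea, roughly how I picture the modes mapping to what the agent sees. Again just a sketch; `get_obs`, `get_state`, and the `observability` attribute are assumed names, not an existing API:

```python
# Sketch of how the Observability modes could map to the returned observation.
# `get_obs()` / `get_state()` / `observability` are assumed per-env hooks.
def make_observation(env):
    if env.observability == Observability.PARTIAL:
        return env.get_obs()
    if env.observability == Observability.FULL:
        # Upper bound: the agent observes the Markov state directly.
        return env.get_state()
    if env.observability == Observability.FULL_AND_PARTIAL:
        return {"obs": env.get_obs(), "mstate": env.get_state()}
    # FULL_IN_INFO_DICT: observation stays partial; the state goes into `info`.
    return env.get_obs()
```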
Hey, …
A user has requested a method to access the true underlying state for each environment. We should return an info dict `{state: ...}` from `env.step` for all envs.
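From the user's side, the request would look something like this (a sketch assuming the gym >= 0.26 API and that the env fills `info["state"]`):

```python
# `env` is assumed to be any popgym environment once the state is exposed via
# `info`; the "state" key and the gym >= 0.26 API are assumptions, not final.
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    markov_state = info["state"]  # true underlying state, e.g. for a fully-observed baseline
```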