World model (work in progress PR for early commenting)#3
World model (work in progress PR for early commenting)#3
Conversation
|
I have decided to base WorldModel on Env after all and have step() sample from transition_distribution() by default. |
|
please ignore what's under rl/mdp/ for now and only look at world_model/ |
| def __init__(self): | ||
| super().__init__() | ||
|
|
||
| def reset(self, seed=None, state = None): |
There was a problem hiding this comment.
Second argument should be called options, it's probably not a great idea to change that. Not sure what exactly you're going for by changing this, but I don't think defining reset for the abstract MDP is necessary
There was a problem hiding this comment.
OK, fixed the order. It is necessary though since the point is that an MDPEnv can reset to a particular state, which a generic Env cannot.
| All observations are full states, so the environment can be reset to an arbitrary state by providing the respective observation. | ||
| """ | ||
|
|
||
| def __init__(self): |
There was a problem hiding this comment.
Redundant, plus might cause side-effects (what if an actual env takes arguments in the constructor?)
There was a problem hiding this comment.
I'm never sure about these things. So you say derived classes can just leave out __init__? Assuming this is the message, I have now removed all empty __init__s from all these classes.
|
|
||
| history = None | ||
| """The history of the current episode.""" | ||
|
|
| from numpy.random import choice | ||
| from gymnasium import Env, ResetNeeded | ||
|
|
||
| # TODO: add typing |
There was a problem hiding this comment.
Please do, I'm a type-driven creature and it's pretty hard for me to read code without types
| from typing import Any | ||
| from gymnasium import Env | ||
|
|
||
| class MDP(Env): |
There was a problem hiding this comment.
Maybe a different structure would be better - MDP is just an abc.ABC, doesn't necessarily inherit from Env. It should only contain the state method. Then it effectively functions as a trait/type class, i.e. when something inherits from MDP, it will only check that it contains state and that's it.
There was a problem hiding this comment.
the point here is that an MDP can reset to a particular state, which a normal Env cannot. I have now split this into two classes, MDP with only state() and MDPEnv, which adds reset(..., state=...).
| class WorldModel(Env): | ||
| """An abstract base class for potentially probabilistic world models, | ||
| extending gymnasion.Env by providing methods for enquiring transition probabilities. | ||
| Most implementations will probably combine this with a gymnasium.Env. |
There was a problem hiding this comment.
Thing to consider regarding the class hierarchy.
If it is the case that "most implementations" will combine this with a gym.Env, then WorldModel shouldn't inherit from Env.
If we want it to be required that every WorldModel "combines" this with gym.Env, then the current inheritance might make sense
|
I'll pull out the discussion regarding the As I understand, the idea is that you can set the environment to a certain state. It might be sorta doable via modifying the I see two ways we could do this:
|
|
@martinkunev This can be merged now I believe. Please test the gridworlds I made, using |
No description provided.