Dear altruists, I am new to stable-baselines and RL. I am trying to retrain my previously trained PPO1 model so that it continues learning from where the previous training left off. What I am trying to do is:
load my previously trained model from disk and then re-train it from the point where the last training ended. For that, I load my previously saved model inside policy_fn() and pass policy_fn as a parameter to the pposgd_simple.learn() method. This fails with the error "ValueError: At least two variables have the same name: pi/obfilter/count".
Also, when training does run correctly (in a different setting), I am unsure whether it resumes from the previous ending point or starts again from the very beginning. Can anyone please point me to a way to verify this? One option may be printing the model parameters, but I am not sure how.
I am also trying to use TensorBoard to monitor my training, but when I run the training, the program raises "TypeError: learn() got an unexpected keyword argument 'tensorboard_log'" at the line tensorboard_log=logger_path. My stable-baselines version is 2.10.2. I am attaching my entire training code below. I would appreciate any suggestions. Thanks in advance.
import os

import tensorflow as tf
from mpi4py import MPI

from baselines import logger
from baselines.bench import Monitor
from baselines.common import set_global_seeds, tf_util as U


def make_env(seed=None):
    reward_scale = 1.0
    rank = MPI.COMM_WORLD.Get_rank()
    myseed = seed + 1000 * rank if seed is not None else None
    set_global_seeds(myseed)
    env = Env()  # custom environment defined elsewhere
    env = Monitor(env, logger_path, allow_early_resets=True)
    env.seed(seed)
    if reward_scale != 1.0:
        from baselines.common.retro_wrappers import RewardScaler
        env = RewardScaler(env, reward_scale)
    return env
def train(num_timesteps, path=None):
    from baselines.ppo1 import mlp_policy, pposgd_simple
    sess = U.make_session(num_cpu=1)
    sess.__enter__()

    def policy_fn(name, ob_space, ac_space):
        policy = mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                      hid_size=64, num_hid_layers=3)
        saver = tf.train.Saver()
        if path is not None:
            # restoring the saved model here is where the
            # "ValueError: At least two variables have the same name: pi/obfilter/count" shows up
            print("Tried to restore from ", path)
            U.initialize()
            saver.restore(tf.get_default_session(), path)
            saver2 = tf.train.import_meta_graph('/srcs/src/models/model1.meta')
            model = saver.restore(sess, tf.train.latest_checkpoint('/srcs/src/models/'))
        # return policy
        return saver2

    env = make_env()
    pi = pposgd_simple.learn(env, policy_fn,
                             max_timesteps=num_timesteps,
                             timesteps_per_actorbatch=1024,
                             clip_param=0.2, entcoeff=0.0,
                             optim_epochs=10,
                             optim_stepsize=5e-5,
                             optim_batchsize=64,
                             gamma=0.99,
                             lam=0.95,
                             schedule='linear',
                             tensorboard_log=logger_path,  # raises "TypeError: learn() got an unexpected keyword argument 'tensorboard_log'"
                             # tensorboard_log="./ppo1_tensorboard/",
                             )
    env.env.plotSave()
    saver = tf.train.Saver(tf.all_variables())
    saver.save(sess, '/models/model1')
    return pi
def main():
    logger.configure()
    path_ = "/models/model1"
    train(num_timesteps=409600, path=path_)


if __name__ == '__main__':
    rank = MPI.COMM_WORLD.Get_rank()
    logger_path = None if logger.get_dir() is None else os.path.join(logger.get_dir(), str(rank))
    main()
It seems you are confusing OpenAI baselines with stable-baselines: the code above imports from baselines.ppo1, which is OpenAI baselines. In stable-baselines, you can save and restore models with the simple model.save() and PPO1.load() calls. Stable-baselines does not support loading OpenAI baselines agents with a single call.
Also, we recommend using stable-baselines3 as it is more actively supported.
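A minimal sketch of that save/load workflow in stable-baselines 2 could look like the following (CartPole-v1 is only a placeholder for your custom Env, and the file paths are hypothetical). Note that in stable-baselines, tensorboard_log is a constructor argument of the model, not a keyword of learn():

    import gym
    import numpy as np
    from stable_baselines import PPO1

    # First run: tensorboard_log goes to the model constructor,
    # not to learn() as attempted with pposgd_simple above.
    model = PPO1('MlpPolicy', 'CartPole-v1', tensorboard_log='./ppo1_tensorboard/')
    model.learn(total_timesteps=100000)
    model.save('ppo1_model')  # hypothetical save path

    # Later run: restore the saved agent and continue training from its current weights.
    env = gym.make('CartPole-v1')
    model = PPO1.load('ppo1_model', env=env, tensorboard_log='./ppo1_tensorboard/')

    # One way to check that training resumes from the restored weights rather than
    # from scratch: snapshot a parameter before and after the extra training.
    params_before = model.get_parameters()
    model.learn(total_timesteps=100000, reset_num_timesteps=False)
    params_after = model.get_parameters()
    name = next(iter(params_before))
    print(name, np.abs(params_after[name] - params_before[name]).max())

Passing reset_num_timesteps=False keeps the TensorBoard step counter continuous across the two learn() calls; PPO1 also requires mpi4py to be installed.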