[Question] I do not understand the GPU and memory usage of SB3 #1630
Comments
Hello, this might be a duplicate of #863.
Printing my model size with this (the original snippet is not preserved here; a sketch of the general approach follows below), and calling it like that returns: … Is there any part of the agent that could be bigger? I am using my custom …
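For reference, a common way to compute a model's size looks roughly like the following (the original helper is not shown above, so the function name and details here are only illustrative):

```python
import torch

def model_size_mb(module: torch.nn.Module) -> float:
    """Rough size of a module's parameters and buffers, in MB."""
    param_bytes = sum(p.numel() * p.element_size() for p in module.parameters())
    buffer_bytes = sum(b.numel() * b.element_size() for b in module.buffers())
    return (param_bytes + buffer_bytes) / 1024**2

# e.g. for an SB3 agent called `model`:
# print(model_size_mb(model.policy))
```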
Also, whenever you run a batch of data on the GPU, you have to transfer that data to the CUDA device, so the data is on the GPU at some point, isn't it?
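For reference, that transfer is the usual PyTorch device move; `np_batch` below is just a placeholder array:

```python
import numpy as np
import torch

np_batch = np.zeros((16, 256, 256), dtype=np.float32)  # placeholder batch
batch = torch.from_numpy(np_batch)  # created on the CPU, shares memory with numpy
batch = batch.to("cuda")            # only now does it occupy GPU memory
```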
I am still unable to figure out the problem here or in issue #863. There, the solution was to flatten the observation, but that does not explain anything.
@araffin I think that GPU usage could be a bit more optimal. First of all, while debugging the PPO class (the train method) I found the GPU usage confusing: if I keep every hyperparameter fixed (n_steps, batch_size, etc.) but change the number of environments in the vectorized environment, the GPU usage differs:

1 environment: 1815 MiB
…

I do not understand this, as the … Does this make any sense? Am I missing something?
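One way to narrow down numbers like these (not something discussed in the thread, just a standard PyTorch check at a breakpoint):

```python
import torch

# What nvidia-smi reports includes the CUDA context and PyTorch's caching
# allocator reservations, not only the tensors themselves. To see what the
# tensors actually occupy at a given point:
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")
```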
This should answer your question: stable-baselines3/stable_baselines3/common/buffers.py Lines 391 to 398 in aab5459
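In other words, the logic being referenced there is roughly the following (a paraphrased sketch, not a verbatim copy of the linked lines): the buffer keeps its data in numpy arrays and only converts each sampled minibatch to tensors, with the minibatch size falling back to the full buffer when `batch_size` is None.

```python
import numpy as np

# Sketch of RolloutBuffer.get(): only each sampled minibatch is converted
# to tensors on the device, inside _get_samples().
def get(self, batch_size=None):
    indices = np.random.permutation(self.buffer_size * self.n_envs)
    if batch_size is None:
        # return everything in a single batch
        batch_size = self.buffer_size * self.n_envs
    start_idx = 0
    while start_idx < self.buffer_size * self.n_envs:
        yield self._get_samples(indices[start_idx : start_idx + batch_size])
        start_idx += batch_size
```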
Yes, it does, thanks a lot. This also opens up my other question: I think the rollout buffer should not be on the GPU, and GPU usage should instead be controlled by the batch size at each epoch. That way, you could collect a giant rollout_buffer but still train on a small but fast GPU by choosing a suitable batch_size. Isn't that right?
Sorry, I do not understand. If the rollout buffer is always on the CPU, why does the number of environments change the GPU usage, as reported in #1630 (comment)?
Indeed, if I debug PPO training on a GPU, I get this:
This should mean that the rollout_buffer data is allocated on the GPU.
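A way to double-check this while stepping through PPO.train() (a sketch; `model` is assumed to be the PPO instance and the rollout buffer must already be filled):

```python
# The stored rollout lives in numpy arrays on the CPU; tensors only appear
# when a minibatch is sampled.
buf = model.rollout_buffer
print(type(buf.observations))  # numpy array (or dict of numpy arrays)
for rollout_data in buf.get(batch_size=16):
    obs = rollout_data.observations
    # for Dict observation spaces this is a dict of tensors
    device = next(iter(obs.values())).device if isinstance(obs, dict) else obs.device
    print(device)  # tensors only exist on the GPU from this point on
    break
```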
Are you using subprocesses? If so, that might be due to the way Python multiprocessing works.
If you look at the code (and you should), the device is only used here: stable-baselines3/stable_baselines3/common/buffers.py Lines 127 to 139 in aab5459
and when sampling the data there:
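The conversion itself boils down to the following pattern (again a sketch; the linked lines are the authoritative version):

```python
import numpy as np
import torch as th

# The buffer stores numpy arrays; self.device only comes into play when an
# individual (mini)batch is converted to tensors.
def to_torch(self, array: np.ndarray, copy: bool = True) -> th.Tensor:
    if copy:
        return th.tensor(array, device=self.device)
    return th.as_tensor(array, device=self.device)
```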
I am creating the environment like this:

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

gym_env = make_vec_env(make_env,
                       env_kwargs=env_kwargs,
                       n_envs=args.n_envs,
                       vec_env_cls=SubprocVecEnv)

So I assume it uses some kind of multiprocessing, yes. What does this have to do with GPU usage?
Hi again @araffin. I am still unable to figure this out: if the data from the RolloutBuffer is only transferred at each sampling step, how can the GPU usage be so large as soon as the code enters the train method? At that point only the model should be on the GPU, not the data.
❓ Question
I think I do not understand the memory usage of SB3. I have a Dict observation space made of some huge matrices, so a single observation is approximately 17 MB.

I am training a PPO agent over a vectorized environment created with the make_vec_env function at n_envs = 2, and the hyperparameters of my PPO agent are n_steps = 6 and batch_size = 16. If I understood correctly, my rollout buffer will hold n_steps x n_envs = 12 observations, so the rollout_buffer will be about 17 x 12 = 204 MB. I assume that the batch_size of 16 will be capped at that minimum, so it is equivalent to having a batch size of 12.

The problem here is that when I'm using a GPU device (an 80 GB A100), usage stabilizes at 70 GB right at the beginning, and a little later training stops for lack of space on the device. How is this even possible?
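For what it's worth, a quick way to sanity-check the per-observation size is a small helper like the one below (a sketch; it assumes gymnasium-style spaces and an already-constructed `env`):

```python
import numpy as np
from gymnasium import spaces

def obs_space_nbytes(space: spaces.Space) -> int:
    """Approximate memory footprint, in bytes, of a single observation."""
    if isinstance(space, spaces.Dict):
        return sum(obs_space_nbytes(sub) for sub in space.spaces.values())
    return int(np.prod(space.shape)) * np.dtype(space.dtype).itemsize

# print(obs_space_nbytes(env.observation_space) / 1e6, "MB")
# With n_steps = 6 and n_envs = 2 the rollout buffer stores 12 observations,
# i.e. roughly 12 * 17 MB ≈ 204 MB of observation data.
```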