-
How can I release a model and free up memory before loading a new one? I tried `model.cleanup()`, but that doesn't seem to do anything in terms of VRAM.
-
Okay... no idea if that's the right way to do it, but it seems to work. I'll write the answer here in case someone else is looking for it. Given a model previously loaded like this:

this seems to free up everything:
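(The code blocks from this reply were lost in extraction. As a rough sketch of the unload sequence, here is a pure-Python stand-in: the `Model` class below is hypothetical, and on a real CUDA model you would additionally call `torch.cuda.empty_cache()` after collection.)

```python
import gc
import weakref

class Model:
    """Hypothetical stand-in for the real model/pipeline object."""
    def __init__(self):
        self.weights = bytearray(8 * 1024 * 1024)  # pretend parameter storage

model = Model()
probe = weakref.ref(model)   # observe the object without keeping it alive

# ... use the model ...

# The unload sequence:
del model                    # remove the last strong reference
gc.collect()                 # force a garbage-collection pass
# torch.cuda.empty_cache()   # on CUDA, also return cached blocks to the driver

print(probe() is None)       # True: the object has actually been freed
```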
-
Python is a managed language, so there's nothing you need to do to free up the model other than removing all references to it. In fact, there's nothing else you can do, since `del` only destroys the reference, not the object being referenced: `del model` is largely equivalent to `model = None`. Neither will free any memory used by the model unless `model` is the last remaining reference. Even then, the garbage collector and PyTorch's CUDA cache can take a little while to catch up, so you may not see memory become available right away (and in some cases you may not actually have the memory to allocate from, even though it should be available in theory).

You can force garbage collection with:

import gc, torch
...
gc.collect()
torch.cuda.empty_cache()

This takes a little while to complete (some milliseconds), so you won't want to call it constantly, but between unloading one model and loading the next it should be fine. Just remember that any (indirect) reference to the model forces the garbage collector to retain it, and anything else that it references in turn.
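That last caveat is easy to trip over. A small pure-Python illustration (no GPU needed; the `Model` class and the `cache` dict are hypothetical stand-ins): as long as any other reference to the model survives, `del model` and `gc.collect()` reclaim nothing.

```python
import gc
import weakref

class Model:
    """Stand-in for a large model object."""
    def __init__(self):
        self.weights = bytearray(8 * 1024 * 1024)

model = Model()
cache = {"last_model": model}   # an indirect reference, easy to forget
probe = weakref.ref(model)      # lets us check whether the object still exists

del model
gc.collect()
print(probe() is None)          # False: the cache entry keeps the model alive

cache.clear()                   # drop the indirect reference too
gc.collect()
print(probe() is None)          # True: now the object is actually freed
```

The same applies to references hidden in closures, optimizer state, logging hooks, or interactive-session variables like `_` in a REPL.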