High Windows Committed Memory (Virtual Memory) #7563
Labels: bug
RyanJDick changed the title from "High Windows Committed Memory" to "High Windows Committed Memory (Virtual Memory)" on Jan 16, 2025
This was referenced Jan 16, 2025
RyanJDick added a commit that referenced this issue on Jan 16, 2025:
## Summary

Prior to this change, there were several cases where we initialized the weights of a FLUX model before loading its state dict (and, to make things worse, in some cases the weights were in float32). This PR fixes a handful of these cases. (I think I found all instances for the FLUX family of models.)

## Related Issues / Discussions

- Helps with #7563

## QA Instructions

I tested that model loading still works and that there is no virtual memory reservation on model initialization for the following models:

- [x] FLUX VAE
- [x] Full T5 Encoder
- [x] Full FLUX checkpoint
- [x] GGUF FLUX checkpoint

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
RyanJDick added a commit that referenced this issue on Jan 17, 2025:
## Summary

This PR adds a `keep_ram_copy_of_weights` config option; the default (and legacy) behavior is `true`. The tradeoffs for this setting are as follows:

- `keep_ram_copy_of_weights: true`: Faster model switching and LoRA patching.
- `keep_ram_copy_of_weights: false`: Lower average RAM load (may not help significantly with peak RAM).

## Related Issues / Discussions

- Helps with #7563
- The Low-VRAM docs are updated to include this feature in #7566

## QA Instructions

- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: true`.
  - [x] Behavior should be unchanged.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: true`.
  - [x] Behavior should be unchanged.
- [x] Smoke test CPU-only and MPS with default configs.

## Merge Plan

- [x] Merge #7564 first and change target branch.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
RyanJDick added a commit that referenced this issue on Jan 17, 2025:
## Summary

This PR revises the logic for calculating the model cache RAM limit. See the code for thorough documentation of the change. The updated logic is more conservative in the amount of RAM that it will use, which will likely be a better default for most users. Of course, users can still choose to set a more aggressive limit by overriding the logic with `max_cache_ram_gb`.

## Related Issues / Discussions

- Should help with #7563

## QA Instructions

Exercise all heuristics:

- [x] Heuristic 1
- [x] Heuristic 2
- [x] Heuristic 3
- [x] Heuristic 4

## Merge Plan

- [x] Merge #7565 first and update the target branch

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
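Taken together, the PRs above expose two knobs that affected Windows users can tune. A hypothetical `invokeai.yaml` fragment follows; the key names come from the PR descriptions above, but the values and surrounding file layout are illustrative:

```yaml
# Drop the in-RAM copy of weights: lower average RAM load, at the cost of
# slower model switching and LoRA patching.
keep_ram_copy_of_weights: false
# Optionally override the automatic (now more conservative) model cache limit.
max_cache_ram_gb: 12
```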
The way that Windows handles virtual memory reservations causes some problems for Invoke as we try to squeeze the most out of a system's available RAM and VRAM. This became increasingly evident in v5.6.0rc2, so I have taken the time to document the issue for reference. There will be a handful of PRs coming soon to mitigate the issues.

## Background: Windows page file
Background reading:
Key takeaways:
On Windows all reserved virtual memory space (aka ‘committed memory’) must be backed by a physical location (either physical memory or a page file). This differs from Linux, which allows virtual memory reservations to have no physical backing if they are never used.
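The reserve/commit distinction can be seen directly through the Win32 API. The sketch below is illustrative only (not Invoke code): it calls `VirtualAlloc` via `ctypes`, first reserving address space (no commit charge) and then committing it (charged against RAM plus page file). On non-Windows platforms it simply returns `False`:

```python
# Hedged illustration of Windows commit semantics: MEM_RESERVE claims address
# space without page-file backing, while MEM_COMMIT must be backed by physical
# memory or the page file, and so raises the system commit charge.
import ctypes
import sys

MEM_RESERVE = 0x2000
MEM_COMMIT = 0x1000
MEM_RELEASE = 0x8000
PAGE_NOACCESS = 0x01
PAGE_READWRITE = 0x04


def commit_demo(size: int = 1 << 30) -> bool:
    """Reserve, then commit, one region; True if both steps succeed."""
    if sys.platform != "win32":
        return False  # VirtualAlloc is a Windows-only API

    k32 = ctypes.windll.kernel32
    k32.VirtualAlloc.restype = ctypes.c_void_p
    k32.VirtualAlloc.argtypes = [
        ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint32, ctypes.c_uint32,
    ]

    # Step 1: reserve address space only -- Task Manager's "Committed"
    # figure does not move.
    base = k32.VirtualAlloc(None, size, MEM_RESERVE, PAGE_NOACCESS)
    # Step 2: commit it -- "Committed" grows by `size`, because Windows
    # guarantees backing for every committed byte even before first touch.
    committed = k32.VirtualAlloc(base, size, MEM_COMMIT, PAGE_READWRITE)

    ok = bool(base) and bool(committed)
    if base:
        k32.VirtualFree(ctypes.c_void_p(base), 0, MEM_RELEASE)
    return ok
```

On Linux, by contrast, even a committed-equivalent `mmap(MAP_ANONYMOUS)` region costs nothing until its pages are touched (under the default overcommit policy).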
## Symptoms
On Windows, you could run out of page file space for the following reasons:
When you run out of page file space, you may see a variety of different crashes. The most common are:

- The process exits with code `3221225477`
- `eventvwr.msc` reveals an error with code `0xc0000005` (the hex equivalent of `3221225477`)

## Why is virtual memory usage high in Invoke?
Invoke does two things that result in high virtual memory usage:

1. Allocating `torch` CUDA tensors
2. Loading model files with `safetensors` (which uses mmap)

## `torch` CUDA Tensors

All CUDA memory requires virtual address space. Because of Windows' policy that all virtual address space must have a physical backing, we must reserve physical disk space in the page file for all allocated CUDA memory, even though it will never be used. To put it differently: if we load a 24 GB model onto the GPU, Windows will reserve a 24 GB virtual memory 'placeholder' in the page file (on disk) even though it will never be accessed!
Here is a minimal reproduction if you wish to see this in action:
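The original reproduction script was not captured in this copy of the issue; the sketch below is a hypothetical stand-in. It assumes `torch` and `psutil` are installed, and falls back to a CPU allocation when no CUDA device is present (in which case the virtual-memory growth reflects ordinary heap pages rather than an unused page-file reservation):

```python
# Hypothetical repro: allocate a tensor and watch the process's virtual memory
# size (VMS) grow. On Windows, VMS growth from a CUDA allocation is charged
# against the commit limit (page-file backed) even though the bytes live in
# VRAM and are never paged in.
import psutil
import torch


def vms_gb() -> float:
    """Virtual memory size of this process, in GiB."""
    return psutil.Process().memory_info().vms / 1024**3


def repro(size_gb: float = 1.0) -> float:
    """Allocate roughly size_gb of float16 tensor; return the VMS delta in GiB."""
    before = vms_gb()
    n = int(size_gb * 1024**3) // 2  # number of float16 elements
    device = "cuda" if torch.cuda.is_available() else "cpu"
    t = torch.empty(n, dtype=torch.float16, device=device)
    growth = vms_gb() - before
    del t
    return growth


if __name__ == "__main__":
    # On Windows with a CUDA device, Task Manager's "Committed" figure (and
    # this VMS delta) should jump by roughly the requested size.
    print(f"VMS grew by {repro():.2f} GiB")
```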
## `safetensors` mmap

When we load model files using safetensors (e.g. `safetensors.torch.load_file(...)`), a memory-mapped file is used so that tensors can be lazy-loaded from the file as needed. On Windows, the mmap'ed file requires virtual address space, so once again Windows will unnecessarily reserve this space in the page file.

Here's a minimal reproduction:
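The original snippet was also lost in this copy; the following is a hypothetical stand-in using the stdlib `mmap` module to mimic what safetensors does internally. The whole file becomes addressable the moment it is mapped, and on Windows that entire range is charged against the commit limit even though pages are only faulted in on access:

```python
# Hypothetical repro: memory-map a file read-only, the way safetensors loads
# checkpoints. A sparse 64 MiB temp file stands in for a model file; with a
# real 24 GB checkpoint, Windows would reserve 24 GB of page-file-backed
# address space at open time.
import mmap
import os
import tempfile


def map_size(path: str) -> int:
    """mmap a file read-only and return the mapped length in bytes."""
    with open(path, "rb") as f:
        # length=0 maps the whole file; ACCESS_READ mirrors a read-only load.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return len(m)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(64 * 1024 * 1024)
        path = f.name
    try:
        # The full file size is reserved in the address space up front,
        # even though no page has been read yet.
        print(map_size(path))
    finally:
        os.remove(path)
```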