High Windows Committed Memory (Virtual Memory) #7563

Open
RyanJDick opened this issue Jan 16, 2025 · 0 comments
Labels
bug Something isn't working

Comments

RyanJDick commented Jan 16, 2025

The way that Windows handles virtual memory reservations causes some problems for Invoke as we try to squeeze the most out of a system’s available RAM and VRAM. This became increasingly evident in v5.6.0rc2, so I have taken the time to document the issue for reference. A handful of PRs to mitigate it will follow shortly.

Background: Windows page file

Background reading:

Key takeaways:

  • “Committed Memory” == “Virtual Memory”
  • If the pagefile is “system-managed”, it will grow up to 3 times the physical memory, but no more than 1/8 of the disk volume size. (This assumes that there is enough disk space to support the pagefile.)
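    For example (illustrative numbers): on a machine with 32GB of RAM and a 1TB system volume, a system-managed page file could grow to min(3 × 32GB, 1TB / 8) = min(96GB, 128GB) = 96GB, provided the disk has that much free space.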

On Windows all reserved virtual memory space (aka ‘committed memory’) must be backed by a physical location (either physical memory or a page file). This differs from Linux, which allows virtual memory reservations to have no physical backing if they are never used.
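
If you want to check the commit charge programmatically rather than eyeballing Task Manager, here is a minimal Windows-only sketch that calls the documented `GetPerformanceInfo` Win32 API via `ctypes` (the `commit_charge_gb` helper name is just for illustration):

```python
import ctypes
from ctypes import wintypes


class PERFORMANCE_INFORMATION(ctypes.Structure):
    # Field layout follows the Win32 PERFORMANCE_INFORMATION struct.
    _fields_ = [
        ("cb", wintypes.DWORD),
        ("CommitTotal", ctypes.c_size_t),      # committed pages, system-wide
        ("CommitLimit", ctypes.c_size_t),      # RAM + current max page file size
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),
        ("PageSize", ctypes.c_size_t),
        ("HandleCount", wintypes.DWORD),
        ("ProcessCount", wintypes.DWORD),
        ("ThreadCount", wintypes.DWORD),
    ]


def commit_charge_gb() -> tuple[float, float]:
    """Return (committed GB, commit limit GB) for the whole system."""
    info = PERFORMANCE_INFORMATION()
    info.cb = ctypes.sizeof(info)
    if not ctypes.windll.psapi.GetPerformanceInfo(ctypes.byref(info), info.cb):
        raise ctypes.WinError()
    page = info.PageSize
    return info.CommitTotal * page / 2**30, info.CommitLimit * page / 2**30


print("Committed: %.1f GB / limit: %.1f GB" % commit_charge_gb())
```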

Symptoms

On Windows, you could run out of page file space for the following reasons:

  • The page file is set to a fixed size rather than “system-managed”, so does not grow dynamically.
  • The page file has hit the upper size limit of the “system-managed” policy.
  • There is insufficient disk space for the page file to grow any larger.

When you run out of page file space, you may see a variety of crashes. The most common are:

  • Invoke exits with Windows error code 3221225477.
  • Invoke crashes without an error, but eventvwr.msc reveals an error with code 0xc0000005 (the hex equivalent of 3221225477).
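
For reference, the two codes are the same value, just shown in decimal vs. hex:

```python
>>> hex(3221225477)
'0xc0000005'
```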

Why is virtual memory usage high in Invoke?

Invoke does two things that result in high virtual memory usage:

  • Allocate CUDA tensors using torch
  • Load models from memory-mapped files using safetensors

torch CUDA Tensors

All CUDA memory requires virtual address space. Because Windows requires every committed virtual address to have a physical backing, we must reserve physical space in the page file (on disk) for all allocated CUDA memory, even though it will never be used. To put it differently: if we load a 24GB model onto the GPU, Windows reserves a 24GB virtual memory ‘placeholder’ in the page file even though that space will never be accessed!

Here is a minimal reproduction if you wish to see this in action:

  1. Open your Windows Task Manager and check the current “Committed” memory.
     [Screenshot: Task Manager memory stats before the allocation]
  2. Allocate a 10GB CUDA tensor:
     PS C:\Users\Ryan\ryan_src\InvokeAI> python
     Python 3.11.11 (main, Dec 19 2024, 14:36:07) [MSC v.1942 64 bit (AMD64)] on win32
     Type "help", "copyright", "credits" or "license" for more information.
     >>> import torch
     >>> x = torch.rand([1, 10 * 2 ** 30 // 4], device="cuda")
  3. Check the updated “Committed” memory.
     [Screenshot: Task Manager memory stats after the allocation]
  4. Note that the “In use” memory has remained roughly unchanged, but the “Committed” memory (Virtual Memory) has grown by ~10GB.
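
The same observation can be scripted. Below is a rough sketch that reuses the hypothetical `commit_charge_gb()` helper from the background section above (it is not part of Invoke or torch):

```python
import torch

before_gb, limit_gb = commit_charge_gb()                # helper sketched earlier
x = torch.rand([1, 10 * 2 ** 30 // 4], device="cuda")   # ~10GB of float32 on the GPU
torch.cuda.synchronize()
after_gb, _ = commit_charge_gb()

print(f"System commit charge grew by ~{after_gb - before_gb:.1f} GB "
      f"(commit limit: {limit_gb:.1f} GB)")
# Expect roughly +10GB of committed memory even though "In use" RAM barely changes.
```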

safetensors mmap

When we load model files using safetensors (e.g. safetensors.torch.load_file(...)), it uses a memory-mapped file so that tensors can be lazy-loaded from the file as needed. On Windows, the mmap’ed file requires virtual address space, so once again Windows will unnecessarily reserve this space in the page file.

Here’s a minimal reproduction:

  1. Open your Windows Task Manager and check the current “Committed” memory.
     [Screenshot: Task Manager memory stats before loading the file]
  2. Load a safetensors file from disk. In this example, I am loading a ~22GB FLUX model.
     PS C:\Users\Ryan\ryan_src\InvokeAI> python
     Python 3.11.11 (main, Dec 19 2024, 14:36:07) [MSC v.1942 64 bit (AMD64)] on win32
     Type "help", "copyright", "credits" or "license" for more information.
     >>> from safetensors.torch import load_file
     >>> sd = load_file("C:\\Users\\Ryan\\ryan_invoke\\models\\flux\\main\\FLUX Dev.safetensors")
  3. Check the updated memory stats. At this point in time, Windows has reserved virtual address space for the mmap’ed file, but no weights have actually been loaded into memory. The “Committed” memory has grown by ~22GB, and the “In use” memory has stayed roughly the same.
     [Screenshot: Task Manager memory stats after loading the file]
  4. Now, suppose that we try to do something with the mmap’ed state dict. The weights will be materialized in memory, but the mmap’ed file won’t be released until all references to it are gone. At the peak, we will have 2x the model size reserved in virtual memory!
     >>> for k in sd.keys():
     ...     sd[k] = sd[k] * 2
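
As a rough follow-up sketch (not part of the original repro): once every value in `sd` has been replaced by a freshly computed tensor and nothing else references the mapping, the mmap’ed file can be released and the “Committed” memory should fall back toward 1x the model size.

```python
# Hypothetical continuation of the session above.
import gc

gc.collect()  # encourage prompt release of the now-unreferenced file mapping
# Re-check Task Manager: "Committed" should have dropped by roughly the file size.
```
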
RyanJDick added the bug label Jan 16, 2025
RyanJDick changed the title from “High Windows Committed Memory” to “High Windows Committed Memory (Virtual Memory)” Jan 16, 2025
RyanJDick added a commit that referenced this issue Jan 16, 2025
## Summary

Prior to this change, there were several cases where we initialized the
weights of a FLUX model before loading its state dict (and, to make
things worse, in some cases the weights were in float32). This PR fixes
a handful of these cases. (I think I found all instances for the FLUX
family of models.)

## Related Issues / Discussions

- Helps with #7563

## QA Instructions

I tested that model loading still works and that there is no
virtual memory reservation on model initialization for the following
models:
- [x] FLUX VAE
- [x] Full T5 Encoder
- [x] Full FLUX checkpoint
- [x] GGUF FLUX checkpoint

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
RyanJDick added a commit that referenced this issue Jan 17, 2025
## Summary

This PR adds a `keep_ram_copy_of_weights` config option. The default (and
legacy) behavior is `true`. The tradeoffs for this setting are as
follows:
- `keep_ram_copy_of_weights: true`: Faster model switching and LoRA
patching.
- `keep_ram_copy_of_weights: false`: Lower average RAM load (may not
help significantly with peak RAM).
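
For anyone who wants to try the lower-RAM behavior, here is a minimal sketch of
the setting in `invokeai.yaml` (only the key name comes from this PR; the rest
of the config file is omitted):

```yaml
keep_ram_copy_of_weights: false  # default is true (faster model switching / LoRA patching)
```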

## Related Issues / Discussions

- Helps with #7563
- The Low-VRAM docs are updated to include this feature in
#7566

## QA Instructions

- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights:
false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights:
true`.
  - [x] Behavior should be unchanged.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights:
false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights:
true`.
  - [x] Behavior should be unchanged.

- [x] Smoke test CPU-only and MPS with default configs.

## Merge Plan

- [x] Merge #7564 first and
change target branch.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
RyanJDick added a commit that referenced this issue Jan 17, 2025
## Summary

This PR revises the logic for calculating the model cache RAM limit. See
the code for thorough documentation of the change.

The updated logic is more conservative in the amount of RAM that it will
use. This will likely be a better default for more users. Of course,
users can still choose to set a more aggressive limit by overriding the
logic with `max_cache_ram_gb`.
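
A minimal sketch of such an override in `invokeai.yaml` (the value shown is
illustrative, not a recommendation):

```yaml
max_cache_ram_gb: 24  # bypasses the heuristic-based model cache RAM limit
```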

## Related Issues / Discussions

- Should help with #7563

## QA Instructions

Exercise all heuristics:
- [x] Heuristic 1
- [x] Heuristic 2
- [x] Heuristic 3
- [x] Heuristic 4

## Merge Plan

- [x] Merge #7565 first and
update the target branch

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_