High Windows Committed Memory (Virtual Memory) #7563

Open
RyanJDick opened this issue Jan 16, 2025 · 0 comments
Labels
bug Something isn't working

Comments

RyanJDick commented Jan 16, 2025

The way that Windows handles virtual memory reservations causes some problems for Invoke as we try to squeeze the most out of a system’s available RAM and VRAM. This became increasingly evident in v5.6.0rc2, so I have taken the time to document the issue for reference. A handful of PRs to mitigate it will follow shortly.

Background: Windows page file

Background reading:

Key takeaways:

  • “Committed Memory” == “Virtual Memory”
  • If the pagefile is “system-managed”, it will grow up to 3 times the physical memory, but no more than 1/8 of the disk volume size. (This assumes that there is enough disk space to support the pagefile.)
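    For example (illustrative numbers): on a machine with 32GB of RAM and a 1TB system volume, a system-managed page file could grow to min(3 × 32GB, 1TB / 8) = min(96GB, 128GB) = 96GB, provided the disk has that much free space.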

On Windows all reserved virtual memory space (aka ‘committed memory’) must be backed by a physical location (either physical memory or a page file). This differs from Linux, which allows virtual memory reservations to have no physical backing if they are never used.
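
If you want to check the commit charge programmatically rather than eyeballing Task Manager, here is a minimal Windows-only sketch that calls the documented `GetPerformanceInfo` Win32 API via `ctypes` (the `commit_charge_gb` helper name is just for illustration):

```python
import ctypes
from ctypes import wintypes


class PERFORMANCE_INFORMATION(ctypes.Structure):
    # Field layout follows the Win32 PERFORMANCE_INFORMATION struct.
    _fields_ = [
        ("cb", wintypes.DWORD),
        ("CommitTotal", ctypes.c_size_t),      # committed pages, system-wide
        ("CommitLimit", ctypes.c_size_t),      # RAM + current max page file size
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),
        ("PageSize", ctypes.c_size_t),
        ("HandleCount", wintypes.DWORD),
        ("ProcessCount", wintypes.DWORD),
        ("ThreadCount", wintypes.DWORD),
    ]


def commit_charge_gb() -> tuple[float, float]:
    """Return (committed GB, commit limit GB) for the whole system."""
    info = PERFORMANCE_INFORMATION()
    info.cb = ctypes.sizeof(info)
    if not ctypes.windll.psapi.GetPerformanceInfo(ctypes.byref(info), info.cb):
        raise ctypes.WinError()
    page = info.PageSize
    return info.CommitTotal * page / 2**30, info.CommitLimit * page / 2**30


print("Committed: %.1f GB / limit: %.1f GB" % commit_charge_gb())
```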

Symptoms

On Windows, you could run out of page file space for the following reasons:

  • The page file is set to a fixed size rather than “system-managed”, so does not grow dynamically.
  • The page file has hit the upper size limit of the “system-managed” policy.
  • There is insufficient disk space for the page file to grow any larger.

When you run out of page file space, you may see a variety of crashes. The most common are:

  • Invoke exits with Windows error code 3221225477.
  • Invoke crashes without an error, but eventvwr.msc reveals an error with code 0xc0000005 (the hex equivalent of 3221225477).
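
For reference, the two codes are the same value, just shown in decimal vs. hex:

```python
>>> hex(3221225477)
'0xc0000005'
```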

Why is virtual memory usage high in Invoke?

Invoke does two things that result in high virtual memory usage:

  • Allocate CUDA tensors using torch
  • Load models from memory-mapped files using safetensors

torch CUDA Tensors

All CUDA memory requires virtual address space. Because Windows requires every committed virtual address to have a physical backing, we must reserve physical space in the page file (on disk) for all allocated CUDA memory, even though it will never be used. To put it differently: if we load a 24GB model onto the GPU, Windows reserves a 24GB virtual memory ‘placeholder’ in the page file even though that space will never be accessed!

Here is a minimal reproduction if you wish to see this in action:

  1. Open your Windows Task Manager and check the current “Committed” memory.
     [Screenshot: Task Manager memory stats before the allocation]
  2. Allocate a 10GB CUDA tensor:
     PS C:\Users\Ryan\ryan_src\InvokeAI> python
     Python 3.11.11 (main, Dec 19 2024, 14:36:07) [MSC v.1942 64 bit (AMD64)] on win32
     Type "help", "copyright", "credits" or "license" for more information.
     >>> import torch
     >>> x = torch.rand([1, 10 * 2 ** 30 // 4], device="cuda")
  3. Check the updated “Committed” memory.
     [Screenshot: Task Manager memory stats after the allocation]
  4. Note that the “In use” memory has remained roughly unchanged, but the “Committed” memory (Virtual Memory) has grown by ~10GB.
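
The same observation can be scripted. Below is a rough sketch that reuses the hypothetical `commit_charge_gb()` helper from the background section above (it is not part of Invoke or torch):

```python
import torch

before_gb, limit_gb = commit_charge_gb()                # helper sketched earlier
x = torch.rand([1, 10 * 2 ** 30 // 4], device="cuda")   # ~10GB of float32 on the GPU
torch.cuda.synchronize()
after_gb, _ = commit_charge_gb()

print(f"System commit charge grew by ~{after_gb - before_gb:.1f} GB "
      f"(commit limit: {limit_gb:.1f} GB)")
# Expect roughly +10GB of committed memory even though "In use" RAM barely changes.
```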

safetensors mmap

When we load model files using safetensors (e.g. safetensors.torch.load_file(...)), it uses a memory-mapped file so that tensors can be lazy-loaded from the file as needed. On Windows, the mmap’ed file requires virtual address space, so once again Windows will unnecessarily reserve this space in the page file.

Here’s a minimal reproduction:

  1. Open your Windows Task Manager and check the current “Committed” memory.
     [Screenshot: Task Manager memory stats before loading the file]
  2. Load a safetensors file from disk. In this example, I am loading a ~22GB FLUX model.
     PS C:\Users\Ryan\ryan_src\InvokeAI> python
     Python 3.11.11 (main, Dec 19 2024, 14:36:07) [MSC v.1942 64 bit (AMD64)] on win32
     Type "help", "copyright", "credits" or "license" for more information.
     >>> from safetensors.torch import load_file
     >>> sd = load_file("C:\\Users\\Ryan\\ryan_invoke\\models\\flux\\main\\FLUX Dev.safetensors")
  3. Check the updated memory stats. At this point in time, Windows has reserved virtual address space for the mmap’ed file, but no weights have actually been loaded into memory. The “Committed” memory has grown by ~22GB, and the “In use” memory has stayed roughly the same.
     [Screenshot: Task Manager memory stats after loading the file]
  4. Now, suppose that we try to do something with the mmap’ed state dict. The weights will be materialized in memory, but the mmap’ed file won’t be released until all references to it are gone. At the peak, we will have 2x the model size reserved in virtual memory!
     >>> for k in sd.keys():
     ...     sd[k] = sd[k] * 2
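
As a rough follow-up sketch (not part of the original repro): once every value in `sd` has been replaced by a freshly computed tensor and nothing else references the mapping, the mmap’ed file can be released and the “Committed” memory should fall back toward 1x the model size.

```python
# Hypothetical continuation of the session above.
import gc

gc.collect()  # encourage prompt release of the now-unreferenced file mapping
# Re-check Task Manager: "Committed" should have dropped by roughly the file size.
```
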
RyanJDick added the bug label Jan 16, 2025
RyanJDick changed the title from “High Windows Committed Memory” to “High Windows Committed Memory (Virtual Memory)” Jan 16, 2025
RyanJDick added a commit that referenced this issue Jan 16, 2025
## Summary

Prior to this change, there were several cases where we initialized the
weights of a FLUX model before loading its state dict (and, to make
things worse, in some cases the weights were in float32). This PR fixes
a handful of these cases. (I think I found all instances for the FLUX
family of models.)

## Related Issues / Discussions

- Helps with #7563

## QA Instructions

I tested that model loading still works and that there is no
virtual memory reservation on model initialization for the following
models:
- [x] FLUX VAE
- [x] Full T5 Encoder
- [x] Full FLUX checkpoint
- [x] GGUF FLUX checkpoint

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
RyanJDick added a commit that referenced this issue Jan 17, 2025
## Summary

This PR adds a `keep_ram_copy_of_weights` config option. The default (and
legacy) behavior is `true`. The tradeoffs for this setting are as
follows:
- `keep_ram_copy_of_weights: true`: Faster model switching and LoRA
patching.
- `keep_ram_copy_of_weights: false`: Lower average RAM load (may not
help significantly with peak RAM).
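
For anyone who wants to try the lower-RAM behavior, here is a minimal sketch of
the setting in `invokeai.yaml` (only the key name comes from this PR; the rest
of the config file is omitted):

```yaml
keep_ram_copy_of_weights: false  # default is true (faster model switching / LoRA patching)
```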

## Related Issues / Discussions

- Helps with #7563
- The Low-VRAM docs are updated to include this feature in
#7566

## QA Instructions

- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights:
false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: false` and `keep_ram_copy_of_weights:
true`.
  - [x] Behavior should be unchanged.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights:
false`.
  - [x] RAM usage when model is loaded is reduced.
  - [x] Model loading / unloading works as expected.
  - [x] LoRA patching still works.
- Test with `enable_partial_load: true` and `keep_ram_copy_of_weights:
true`.
  - [x] Behavior should be unchanged.

- [x] Smoke test CPU-only and MPS with default configs.

## Merge Plan

- [x] Merge #7564 first and
change target branch.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
RyanJDick added a commit that referenced this issue Jan 17, 2025
## Summary

This PR revises the logic for calculating the model cache RAM limit. See
the code for thorough documentation of the change.

The updated logic is more conservative in the amount of RAM that it will
use. This will likely be a better default for more users. Of course,
users can still choose to set a more aggressive limit by overriding the
logic with `max_cache_ram_gb`.
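
A minimal sketch of such an override in `invokeai.yaml` (the value shown is
illustrative, not a recommendation):

```yaml
max_cache_ram_gb: 24  # bypasses the heuristic-based model cache RAM limit
```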

## Related Issues / Discussions

- Should help with #7563

## QA Instructions

Exercise all heuristics:
- [x] Heuristic 1
- [x] Heuristic 2
- [x] Heuristic 3
- [x] Heuristic 4

## Merge Plan

- [x] Merge #7565 first and
update the target branch

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_