Mac M1+WASM: many_buttons extreme memory usage #18257

Open
Azorlogh opened this issue Mar 11, 2025 · 4 comments
Labels
A-Rendering Drawing game state to the screen A-UI Graphical user interfaces, styles, layouts, and widgets C-Bug An unexpected or incorrect behavior C-Machine-Specific This bug is isolated to specific hardware or driver configurations C-Performance A change motivated by improving speed, memory usage or compile times O-MacOS Specific to the MacOS (Apple) desktop operating system O-Web Specific to web (WASM) builds O-WebGL2 Specific to the WebGL2 render API S-Needs-Design This issue requires design work to think about how it would best be accomplished

Comments

@Azorlogh
Contributor

Azorlogh commented Mar 11, 2025

Bevy version

Tested on master (4f6241178fa75263fb1fa961874f843684dd7b9a) as well as 0.15.3, 0.14.2, 0.13.2, and 0.12.1. It does not occur on 0.11.3.

Relevant system information

  • Mac mini M1 w/ Sequoia 15.2.
  • MacBook Pro M1 w/ Sequoia 15.3.1
  • Chrome 134.0.6998.89, Safari 18.2 [20620.1.16.11.8]
AdapterInfo { name: "ANGLE (Apple, ANGLE Metal Renderer: Apple M1, Unspecified Version)", vendor: 4203, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "WebGL 2.0 (OpenGL ES 3.0 Chromium)", backend: Gl }

What you did

Run the many_buttons example in Chrome or Safari: https://bevyengine.org/examples/stress-tests/many-buttons/

What went wrong

The memory usage of the Google Chrome Helper (GPU) process shoots up to 10 GB, and WebGL2 then crashes for the entire browser until the browser is restarted.

On Linux + X11 + NVIDIA, it only took 260 MiB of VRAM and 1 GB of RAM.

Additional information

  • Disabling text by setting args.no_text = true; fixes it.
  • Wgpu throws an error after a couple seconds:
wgpu error: Validation error

Caused by:
    In `Queue::submit`
        Not enough memory left.
  • I found this because a very similar issue is now naturally happening in my app after updating to bevy 0.15 :(
@Azorlogh Azorlogh added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Mar 11, 2025
@Azorlogh
Contributor Author

Azorlogh commented Mar 11, 2025

I found two relevant commits using git bisect. Both are quite old.

The first one made GPU memory usage go from 260 MB (same as Linux) up to 2 GB.
4f1d9a6

The second one made it skyrocket and crash WebGL2 (apparently at around 12.88 GB).
d70b4a3

They are both related to batching 🤔

@Azorlogh Azorlogh changed the title Mac M1+WASM: Many buttons extreme memory usage Mac M1+WASM: many_buttons extreme memory usage Mar 11, 2025
@Azorlogh
Contributor Author

Azorlogh commented Mar 11, 2025

Additional findings:
I looked at the allocated video memory using Spector.js in Chrome. It does not show a huge difference between Linux + Chrome and macOS + Chrome.
macOS + Chrome is just 87.1 MB.
[Image: Spector.js memory readout on macOS]
Linux + Chrome is 58.0 MB. Slightly less, but nothing crazy.
[Image: Spector.js memory readout on Linux]

I noticed that in Chrome, if I set the ANGLE backend to OpenGL instead of Metal, it works perfectly fine (the process takes 260 MiB, like on Linux).

Therefore I assume ANGLE's Metal backend is leaking memory :/

@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times A-UI Graphical user interfaces, styles, layouts, and widgets O-Web Specific to web (WASM) builds O-MacOS Specific to the MacOS (Apple) desktop operating system S-Needs-Design This issue requires design work to think about how it would best be accomplished C-Machine-Specific This bug is isolated to specific hardware or driver configurations and removed S-Needs-Triage This issue needs to be labelled labels Mar 20, 2025
@alice-i-cecile
Member

Really good investigative work on all fronts. That's super frustrating.

@ashivaram23

ashivaram23 commented Mar 21, 2025

I'm also running into this, but from a different route, so here's what I can tell so far.

Apparently, when you use the same index buffer for multiple draw calls in the same queue, ANGLE's Metal backend uploads one copy of the buffer to the GPU for each call. In other words, to experience this memory explosion you need the combination of two conditions:

  1. The same index buffer used across multiple draw calls
  2. Using ANGLE's Metal backend

My project meets the first condition because of a separate phenomenon in which bevy/wgpu on WebGL splits what should be a single instanced draw into several smaller ones. (This may be a bug, or it may be a result of storing instance data in dynamic uniform buffers instead of vertex buffers and limiting them to 4 KB, which means instances have to be split into groups that fit.) I have 40,000 cubes in the scene, and they all render in one draw call on native and WebGPU, but on WebGL there are thousands of draw calls each covering ~25 cubes. All of them still use the same index buffer, and ANGLE makes thousands of identical copies of it. The index buffer is only 64 KiB though (the cube needs 36 indices, and the rest is presumably padding/alignment), so the total memory impact isn't on the order of gigabytes.

This case with many_buttons meets the first condition as well, although I'm not as certain about the details. Anyway, I think the timeline is:
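The splitting described above can be sketched roughly as follows. This is only an illustration, not bevy or wgpu code: `split_into_draws` is a hypothetical helper, and the inputs (40,000 instances, 25 per draw, one 64 KiB index buffer copy per draw) are the round figures from the comment, so the resulting numbers are approximate.

```rust
use std::ops::Range;

/// Hypothetical helper: splits `instance_count` instances into consecutive
/// ranges of at most `max_per_draw` instances, one range per draw call.
fn split_into_draws(instance_count: u32, max_per_draw: u32) -> Vec<Range<u32>> {
    assert!(max_per_draw > 0, "each draw must cover at least one instance");
    (0..instance_count)
        .step_by(max_per_draw as usize)
        .map(|start| start..start.saturating_add(max_per_draw).min(instance_count))
        .collect()
}

fn main() {
    // Round figures from the comment; the real per-draw count is only "~25".
    let draws = split_into_draws(40_000, 25);
    // Assumption: the backend keeps one 64 KiB index buffer copy per draw.
    let copied_bytes = draws.len() as u64 * 64 * 1024;
    println!("{} draws, {} MiB of index buffer copies",
             draws.len(), copied_bytes / (1024 * 1024));
}
```

With these inputs the duplicated index buffers add up to roughly a hundred MiB, which fits the observation that the impact here is noticeable but well short of gigabytes.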

  • Originally all buttons rendered in one draw call (not instanced, just everything in one giant vertex buffer), so there was no memory problem.
  • Then the first commit identified with git bisect, 4f1d9a6, made each button need a separate draw call. Index buffers weren't being used at that point, but the ANGLE translation still generates its own index buffers for each draw call. These are smaller (padded/aligned up to 64 KiB), and that's what caused the initial increase in memory.
  • Between the first and second bisected commits, the many_buttons example was changed to add images to every 4th button. That brought the intended number of batches from 1 to ~6000, more suitable for a stress test, but the bug from 4f1d9a6 wasn't fixed yet so it was still doing a lot more draw calls than that.
  • In the second bisected commit, d70b4a3, batching was fixed so that multiple items could group together again. The UI drawing code still didn't use index buffers (that came later in e7a31d0), but for some reason the ANGLE-generated index buffers got much larger anyway, around the same size that a single combined index buffer would take. I have no idea why this happens but it explains the memory explosion showing up here and not after e7a31d0.
  • e7a31d0 changed the rendering code to draw UI elements with draw_indexed, using one large index buffer split across the draw calls for each batch. This leads to ANGLE making copies for each draw call, just as it does for my project that uses instanced cubes. With ~6000 batches and a ~2 MB index buffer, the example occupies around 12 GB of GPU memory.
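As a sanity check on the final figure in the timeline, the arithmetic can be written out. This is a back-of-the-envelope sketch that assumes exactly one full copy of the index buffer per draw call; `duplicated_index_bytes` is a name made up for illustration, and the inputs are the approximate values quoted above.

```rust
/// Estimated GPU memory if the backend keeps one full copy of the
/// index buffer for every draw call (an assumption, per the timeline).
fn duplicated_index_bytes(draw_calls: u64, index_buffer_bytes: u64) -> u64 {
    draw_calls * index_buffer_bytes
}

fn main() {
    let batches = 6_000; // ~6000 batches in many_buttons
    let index_buffer_bytes = 2_000_000; // ~2 MB shared index buffer
    let total = duplicated_index_bytes(batches, index_buffer_bytes);
    println!("approx. {:.1} GB", total as f64 / 1e9);
}
```

The product is about 12 GB, in line with the roughly 12.88 GB reported earlier in the thread.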

Update/clarification: the ANGLE quirks mentioned above (making copies of index buffers, generating index buffers even though drawArrays was used, etc.) are probably caused by the GLSL flat interpolation qualifier. Bevy uses this (@interpolate(flat) in WGSL) in the regular PBR shaders to make sure the rasterizer doesn't interpolate instance indices, and also in the UI shaders for several other values. Relevant ANGLE source code links: creating shaders for generating index buffers, shaders, the criteria that trigger it
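To make the flat-interpolation connection more concrete, here is a sketch of the kind of index rewriting a translation layer might do. It assumes the usual explanation: OpenGL ES takes a flat-shaded value from the last vertex of each triangle, Metal takes it from the first, so the indices of each triangle are rotated to compensate. This is not ANGLE's actual code, both function names are hypothetical, and only triangle lists are handled.

```rust
/// Rotates each triangle (a, b, c) to (c, a, b) so that the vertex that was
/// last becomes first. A cyclic rotation keeps the winding order unchanged.
fn rotate_for_first_vertex_convention(indices: &[u32]) -> Vec<u32> {
    indices
        .chunks_exact(3) // any trailing partial triangle is dropped
        .flat_map(|t| [t[2], t[0], t[1]])
        .collect()
}

/// A non-indexed draw has no index buffer to rotate, so one has to be
/// synthesized first: 0, 1, 2, ... and then rotated per triangle.
fn synthesize_for_draw_arrays(vertex_count: u32) -> Vec<u32> {
    let sequential: Vec<u32> = (0..vertex_count).collect();
    rotate_for_first_vertex_convention(&sequential)
}

fn main() {
    println!("{:?}", rotate_for_first_vertex_convention(&[10, 11, 12, 12, 11, 13]));
    println!("{:?}", synthesize_for_draw_arrays(6));
}
```

Under this assumption, every draw that uses flat interpolation needs a rewritten index buffer, which would account for index buffers appearing even for drawArrays calls.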

The reason why the ANGLE-generated buffers suddenly got larger at d70b4a3 may be related to the buffer allocation logic here where it seems like it has to use the same size for all the buffers in the pool, and that gets stuck at the largest size allocated so far.
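The suspected pool behavior can be illustrated with a toy model. Everything here is hypothetical (`UniformSizePool` and its methods are made up, and ANGLE's real allocator is certainly more involved); it only shows how a shared buffer size that never shrinks makes every later small request cost as much as the largest one seen so far.

```rust
/// Toy model: all buffers in the pool share one size, and that size only grows.
struct UniformSizePool {
    buffer_size: usize,
    buffer_count: usize,
}

impl UniformSizePool {
    fn new() -> Self {
        Self { buffer_size: 0, buffer_count: 0 }
    }

    /// Hands out one more buffer; the shared size ratchets up to `needed`
    /// if the request is larger than anything seen before.
    fn acquire(&mut self, needed: usize) {
        self.buffer_size = self.buffer_size.max(needed);
        self.buffer_count += 1;
    }

    /// Total memory held when every buffer uses the shared size.
    fn total_bytes(&self) -> usize {
        self.buffer_size * self.buffer_count
    }
}

fn main() {
    let mut pool = UniformSizePool::new();
    pool.acquire(2 * 1024 * 1024); // one large request (2 MiB)
    for _ in 0..3 {
        pool.acquire(64 * 1024); // later small requests (64 KiB each)
    }
    println!("{} MiB held for 4 buffers", pool.total_bytes() / (1024 * 1024));
}
```

In this model a single large request is enough to make thousands of later per-draw buffers each occupy the large size, which matches the jump described at d70b4a3.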

To fix this, bevy would need to either stop using @interpolate(flat) or use the WEBGL_provoking_vertex extension to ensure that the first vertex is the one used as the reference for all three vertices. But that extension may not have widespread browser support, and I don't think wgpu implements it.

@alice-i-cecile alice-i-cecile added the O-WebGL2 Specific to the WebGL2 render API label Mar 21, 2025