WebGPURenderer: Optimize WebXR render path. #31134

Open: wants to merge 2 commits into base: dev

Contributor

@cabanier cabanier commented May 19, 2025

cc @Mugen87
This fixes some of the performance issues, but since it reaches directly into the renderer, it seems brittle.

github-actions bot commented May 19, 2025

📦 Bundle size

Full ESM build, minified and gzipped.

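| | Before (minified / gzipped, kB) | After (minified / gzipped, kB) | Diff (minified / gzipped) |
|---|---|---|---|
| WebGL | 337.26 / 78.64 | 337.26 / 78.64 | +0 B / +0 B |
| WebGPU | 550.64 / 152.68 | 551.75 / 152.93 | +1.11 kB / +245 B |
| WebGPU Nodes | 549.99 / 152.53 | 551.1 / 152.77 | +1.11 kB / +245 B |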

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

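| | Before (minified / gzipped, kB) | After (minified / gzipped, kB) | Diff (minified / gzipped) |
|---|---|---|---|
| WebGL | 468.12 / 113.14 | 468.12 / 113.14 | +0 B / +0 B |
| WebGPU | 625.95 / 169.45 | 627.06 / 169.69 | +1.11 kB / +238 B |
| WebGPU Nodes | 580.8 / 158.77 | 581.91 / 159.01 | +1.11 kB / +243 B |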

@cabanier
Contributor Author

I traced the performance problem down to an issue in the Quest browser. I'm investigating why I'm seeing this weird behavior.
I simplified the layers case to just the rollercoaster and am seeing these stages:

Surface 4 | 1000x1760 | color 32bit, depth 24bit, stencil 8 bit, MSAA 4, Mode: 2 (SwBinning) | 21 384x256 bins ( 22 rendered) | 0.48 ms | 64 stages : Render : 0.196ms StoreColor : 0.052ms Blit : 0.006ms StoreDepthStencil : 0.055ms
Surface 5 | 1000x1500 | color 32bit, depth 24bit, stencil 8 bit, MSAA 4, Mode: 2 (SwBinning) | 18 384x256 bins ( 19 rendered) | 0.44 ms | 55 stages : Render : 0.19ms StoreColor : 0.048ms Blit : 0.007ms StoreDepthStencil : 0.047ms
Surface 6 | 1000x1500 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 18 384x256 bins ( 18 rendered) | 0.74 ms | 38 stages : Binning : 0.175ms Render : 0.401ms StoreColor : 0.051ms Blit : 0.005ms
Surface 7 | 1000x1500 | color 32bit, depth 24bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 5 1056x352 bins ( 6 rendered) | 0.30 ms | 11 stages : Render : 0.255ms StoreColor : 0.015ms Blit : 0.004ms
Surface 8 | 1680x1500 | color 32bit, depth 24bit, stencil 8 bit, MSAA 4, Mode: 2 (SwBinning) | 27 192x512 bins ( 28 rendered) | 0.67 ms | 82 stages : Render : 0.309ms StoreColor : 0.074ms Blit : 0.005ms StoreDepthStencil : 0.072ms
Surface 9 | 1680x1760 | color 32bit, depth 24bit, stencil 8 bit, MSAA 4, Mode: 2 (SwBinning) | 35 384x256 bins ( 36 rendered) | 0.80 ms | 106 stages : Render : 0.328ms StoreColor : 0.085ms Blit : 0.006ms StoreDepthStencil : 0.085ms
Surface 10 | 1680x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 63 192x256 bins ( 63 rendered) | 2.00 ms | 128 stages : Binning : 0.177ms Render : 1.285ms StoreColor : 0.168ms Blit : 0.005ms
Surface 11 | 1680x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 2 (SwBinning) | 66 288x160 bins ( 41 rendered) | 1.42 ms | 81 stages : Render : 0.808ms StoreColor : 0.354ms Blit : 0.005ms

Surface 6 is the first render pass of the rollercoaster and 7 is the tone mapping.
Surface 10 is the first pass of the eye buffer and 11 is its tone mapping.

I don't know what is causing surfaces 4, 5, 8 and 9 to trigger, but they're the root of the slow experience.
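
To put a number on how much the unexplained surfaces cost, the per-surface totals from the trace above can be summed. This is only a small sketch over the figures from this one capture; the variable and function names are illustrative and not part of three.js.

```javascript
// Per-surface GPU time in milliseconds, copied from the trace above.
const surfaceTimesMs = {
	4: 0.48, 5: 0.44, 6: 0.74, 7: 0.30,
	8: 0.67, 9: 0.80, 10: 2.00, 11: 1.42
};

// Surfaces whose origin is unknown according to the comment above.
const unexplainedSurfaces = [ 4, 5, 8, 9 ];

// Sum the time of the given surface ids.
function sumTimes( times, ids ) {

	return ids.reduce( ( total, id ) => total + times[ id ], 0 );

}

const allSurfaces = Object.keys( surfaceTimesMs ).map( Number );
const extraMs = sumTimes( surfaceTimesMs, unexplainedSurfaces );
const totalMs = sumTimes( surfaceTimesMs, allSurfaces );
const share = ( extraMs / totalMs ) * 100;

console.log( `${ extraMs.toFixed( 2 ) } ms of ${ totalMs.toFixed( 2 ) } ms (${ share.toFixed( 0 ) }%)` );
// prints "2.39 ms of 6.85 ms (35%)"
```

In this capture the four extra surfaces account for roughly a third of the listed GPU time, and each of them also pays for a StoreDepthStencil stage.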

@cabanier cabanier force-pushed the frameBufferTarget branch from 3e64d3c to 77adfd8 on May 21, 2025 21:28
@cabanier cabanier force-pushed the frameBufferTarget branch from 77adfd8 to 5e9e053 on May 21, 2025 22:02
@cabanier
Contributor Author

cabanier commented May 21, 2025

@Mugen87 I was able to fix the slowdown. Setting the renderer size also changed the pixel backing store of the canvas and that made it very slow. I also added code to discard depth to gain some performance.

I also added a simple test file to validate WebXR without using multiview. That path was broken.

@cabanier cabanier marked this pull request as ready for review May 22, 2025 03:54
@cabanier
Contributor Author

The Windows failure also happens without my changes. See #31145

@@ -1250,6 +1252,7 @@ class Renderer {
frameBufferTarget.scissor.multiplyScalar( this._pixelRatio );
frameBufferTarget.scissorTest = this._scissorTest;
frameBufferTarget.multiview = outputRenderTarget !== null ? outputRenderTarget.multiview : false;
+ frameBufferTarget.resolveDepthBuffer = outputRenderTarget !== null ? outputRenderTarget.resolveDepthBuffer : true;
Contributor Author

This indicates whether we need to resolve the depth buffer. I think this needs to be updated for devices that prefer to receive depth (such as AVP), but I have no way to test this.

Collaborator

What does AVP stand for?

Contributor Author

Apple Vision Pro.
It works without a depth buffer but it has better reprojection if one is provided.

@@ -1380,7 +1383,7 @@ class Renderer {
renderContext.viewportValue.height >>= activeMipmapLevel;
renderContext.viewportValue.minDepth = minDepth;
renderContext.viewportValue.maxDepth = maxDepth;
- renderContext.viewport = renderContext.viewportValue.equals( _screen ) === false;
+ renderContext.viewport = renderContext.viewportValue.equals( _screen ) === false || this._forceViewPort;
Contributor Author

The viewport and screen are both set by setSize(), so we need to catch the case where we want the viewport to be set explicitly.

Collaborator

Do you really need _forceViewPort? Since you are not changing the drawing buffer when resizing the renderer during XRManager.renderLayers(), the configured viewport (which uses the layer's width and height) should be different than the drawing buffer size and thus set RenderContext.viewport to true.

RenderContext.viewport only indicates the viewport update should use the value from the render context (and not the drawing buffer size).

Contributor Author

drawingBufferSize is calculated as follows:
return target.set( this._width * this._pixelRatio, this._height * this._pixelRatio ).floor();

so it also uses width and height. Maybe this is wrong? I was hesitant to change it since that would be a big change.
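
The point can be illustrated with a simplified model. The sketch below is not the three.js Renderer; `SizeModel`, `getViewportSize()` and `needsExplicitViewport()` are hypothetical names, and it assumes that the default viewport and the drawing buffer size are both derived from the same logical width and height. Under that assumption the two always match after a resize, so the comparison alone never requests an explicit viewport.

```javascript
// Simplified, hypothetical model of the renderer's size bookkeeping.
// It is not the actual three.js Renderer class.
class SizeModel {

	constructor( width, height, pixelRatio = 1 ) {

		this._width = width;
		this._height = height;
		this._pixelRatio = pixelRatio;

	}

	// Assumption: setSize() updates the logical size that everything else derives from.
	setSize( width, height ) {

		this._width = width;
		this._height = height;

	}

	// Same formula as the one quoted above.
	getDrawingBufferSize() {

		return {
			width: Math.floor( this._width * this._pixelRatio ),
			height: Math.floor( this._height * this._pixelRatio )
		};

	}

	// Assumption: the default viewport covers the full logical size, scaled the same way.
	getViewportSize() {

		return {
			width: Math.floor( this._width * this._pixelRatio ),
			height: Math.floor( this._height * this._pixelRatio )
		};

	}

}

// Mirrors the shape of the check in the diff: request an explicit viewport
// when it differs from the screen, or when the caller forces it.
function needsExplicitViewport( model, forceViewPort = false ) {

	const screen = model.getDrawingBufferSize();
	const viewport = model.getViewportSize();
	const differs = viewport.width !== screen.width || viewport.height !== screen.height;

	return differs || forceViewPort;

}

const model = new SizeModel( 800, 600 );
model.setSize( 1680, 1760 ); // e.g. resizing to a layer's dimensions

console.log( needsExplicitViewport( model ) ); // false
console.log( needsExplicitViewport( model, true ) ); // true
```

In this model the forced flag is the only way to get an explicit viewport after a resize, which is the situation the comment describes.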

Collaborator

@Mugen87 Mugen87 May 22, 2025

The implementation of getDrawingBufferSize() is correct, imo. I had to read the code multiple times to understand why you need _forceViewPort.

I think you can remove it when the following lines:

state.viewport( 0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight );

state.viewport( 0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight );

are refactored to:

const { width, height } = this.getDrawingBufferSize( _drawingBufferSize );
state.viewport( 0, 0, width, height );

If you don't update the DOM element when setting the size for layers, gl.drawingBufferWidth and gl.drawingBufferHeight are incorrect. Using getDrawingBufferSize() should report the correct values.

this._renderer.setOutputRenderTarget( layer.renderTarget );
this._renderer.setRenderTarget( null );

renderer.setOutputRenderTarget( layer.renderTarget );
Contributor Author

:-\

const { frameBufferTarget, quad } = this._frameBufferTargets.get( layer.renderTarget ) || { frameBufferTarget: null, quad: null };
if ( ! frameBufferTarget ) {

renderer._quad = new QuadMesh( new NodeMaterial() );
Contributor Author

Needed because otherwise there's confusion between multiview and regular rendering.

@@ -904,7 +935,8 @@ class XRManager extends EventDispatcher {
const projectionlayerInit = {
colorFormat: gl.RGBA8,
depthFormat: glDepthFormat,
- scaleFactor: this._framebufferScaleFactor
+ scaleFactor: this._framebufferScaleFactor,
+ clearOnAccess: false
Contributor Author

Setting clearOnAccess to false gives a small benefit because it skips the step where the browser explicitly clears the buffer upon framebuffer attach.
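
For context, a minimal sketch of how such an init object might be handed to the WebXR Layers API follows. The helper names `buildProjectionLayerInit()` and `createProjectionLayer()` are hypothetical and not part of XRManager; the sketch assumes a browser that implements the WebXR Layers module and an active session created with the layers feature.

```javascript
// Hypothetical helper: builds the init dictionary shown in the diff above.
function buildProjectionLayerInit( gl, depthFormat, scaleFactor ) {

	return {
		colorFormat: gl.RGBA8,
		depthFormat: depthFormat,
		scaleFactor: scaleFactor,
		// Ask the browser not to clear the layer's buffers when they are attached.
		clearOnAccess: false
	};

}

// Hypothetical helper: creates the layer and makes it the session's only layer.
// Requires a browser with WebXR Layers support; it is only defined here, not called.
function createProjectionLayer( session, gl, init ) {

	const binding = new XRWebGLBinding( session, gl );
	const layer = binding.createProjectionLayer( init );

	session.updateRenderState( { layers: [ layer ] } );

	return layer;

}
```

Because the browser no longer clears the buffers in this configuration, the sketch assumes the application clears or fully overwrites them itself every frame.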


const fb = renderTargetContextData.framebuffers[ renderContext.getCacheKey() ];
state.bindFramebuffer( gl.DRAW_FRAMEBUFFER, fb );
gl.invalidateFramebuffer( gl.DRAW_FRAMEBUFFER, renderTargetContextData.depthInvalidationArray );
Contributor Author

This discards the depth buffer before we do the tone mapping steps. Without this, the depth is flushed to main memory, which adds overhead.
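
A minimal sketch of the same idea, outside of the renderer, is shown below. `buildDepthInvalidationArray()` and `discardDepth()` are hypothetical helpers; the sketch assumes a WebGL 2 context and binds the framebuffer directly on `gl` instead of going through the renderer's state cache.

```javascript
// Hypothetical helper: pick the attachment points to invalidate
// based on which depth/stencil buffers the framebuffer has.
function buildDepthInvalidationArray( gl, hasDepth, hasStencil ) {

	if ( hasDepth && hasStencil ) return [ gl.DEPTH_STENCIL_ATTACHMENT ];
	if ( hasDepth ) return [ gl.DEPTH_ATTACHMENT ];
	if ( hasStencil ) return [ gl.STENCIL_ATTACHMENT ];

	return [];

}

// Hypothetical helper: tell the driver that the depth contents are no longer needed,
// so a tiled GPU does not have to store them back to main memory.
function discardDepth( gl, framebuffer, attachments ) {

	if ( attachments.length === 0 ) return;

	gl.bindFramebuffer( gl.DRAW_FRAMEBUFFER, framebuffer );
	gl.invalidateFramebuffer( gl.DRAW_FRAMEBUFFER, attachments );

}
```

The call belongs after the scene pass has finished drawing into the framebuffer and before another framebuffer is bound for the next pass. It should be skipped whenever the depth values are still needed afterwards, for example when the depth buffer has to be resolved.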

@cabanier cabanier mentioned this pull request May 22, 2025
@Mugen87 Mugen87 changed the title from "optimize WebXR render path" to "WebGPURenderer: Optimize WebXR render path." on May 22, 2025
for ( const layer of this._layers ) {

layer.renderTarget.isXRRenderTarget = this._session !== null;
layer.renderTarget.hasExternalTextures = layer.renderTarget.isXRRenderTarget;
renderer.setSize( layer.renderTarget.width, layer.renderTarget.height, false, false );
Collaborator

I think you can move this line into the if block below. setSize() only matters if rendering into the default framebuffer. When rendering into a render target defined via setRenderTarget(), the dimensions from the render target are used.

Collaborator

@Mugen87 Mugen87 May 22, 2025

I wonder if we should change the renderer so setOutputRenderTarget() behaves like setRenderTarget(). Meaning, when an output render target is defined, the dimensions of the internal framebuffer target are derived from it. Right now, we only do this for depth:

frameBufferTarget.setSize( width, height, outputRenderTarget !== null ? outputRenderTarget.depth : 1 );

However, keeping the dimensions of the internal framebuffer target in sync with the output render target could simplify things by making certain setSize() calls obsolete.

We use setOutputRenderTarget() only in the context of XR right now (to define a different output target than the default framebuffer), so if you think that design would be helpful, maybe we can give it a shot.

Contributor Author

> I think you can move this line into the if block below.

done

Contributor Author

> I wonder if we should change the renderer so setOutputRenderTarget() behaves like setRenderTarget(). Meaning, when an output render target is defined, the dimensions of the internal framebuffer target are derived from it.

Maybe we can try that in a follow-up PR? I'm a bit worried that some code might look at the renderer's size, so if we don't update it, that code might make the wrong decisions.
