-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[AUDIO_WORKLET] Optimised output buffer copy #24891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
5a70474
to
99f4c12
Compare
e19d4fd
to
348932f
Compare
2313240
to
748c167
Compare
4cba21c
to
998ac6b
Compare
This takes into account multiple multi-speaker outputs (the spec labels 18).
b57d103
to
5186686
Compare
It's not possible to merge the output data copy into a single |
This adds an interactive test to force growing the heap during playback: ``` test/runner interactive.test_audio_worklet_memory_growth ``` Tested with `interactive64` and `interactive_2gb` (for `interactive64_4gb` the heap is already at the browser's max in testing so can't be grown). It works by alloc'ing and leaking 2/3 of the current size until it can no longer do so. Emscripten regrows its wasm memory in the process, invalidating any data views (see #24891). **Edit: test can now grow from both the main and audio thread.**
Built on #24931 (it touches the same file, but a rebase after merge will fix this). This adds hard-panned audio files to test that the left and right channels don't get flipped with any changes to the audio worklet code (relevant for #24891, which changes how the copies are performed). ``` test/runner interactive.test_audio_worklet_hard_pans ``` The bass track is hard-left (with its right muted), drums are right.
This is done, |
var outputViewsNeeded = 0; | ||
for (entry of outputList) { | ||
outputViewsNeeded += entry.length; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above loop multiplies by * this.bytesPerChannel;
outside the loop, though the loop before that multiplies inside the loop. A small micro-opt would be to merge to multiply once outside. (though looks good to me either way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look over this. The most compact would probably be to take the incoming sizes (but I work on the assumption that the browser will do something wrong at some point in the future and throw the maths off).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look and reused the stackMemoryData
var (committed). I'm not sure it improves the readability, whereas for outputViewsNeeded
it made sense. I'm not sure there are enough channels to warrant it.
src/audio_worklet.js
Outdated
this.bytesPerChannel = this.samplesPerChannel * {{{ getNativeTypeSize('float') }}}; | ||
|
||
// Creates the output views (see createOutputViews() docs) | ||
this.maxBuffers = Math.min(((wwParams.stackSize - /*minimum alloc*/ 16) / this.bytesPerChannel) | 0, /*sensible limit*/ 64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't ever use this.maxBuffers
beyond the call to this.createOutputViews()
below?
One way to recoup some of the code size increase would be to avoid storing this.xxx
members that aren't needed outsided this scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, we need to know the number of buffers when we call again to this.createOutputViews();
.
Maybe then this could do
this.outputViews = new Array(Math.min(((wwParams.stackSize - /*minimum alloc*/ 16) / this.bytesPerChannel) | 0, /*sensible limit*/ 64));
and instead of
this.outputViews.unshift(
inside createOutputViews()
, the function would always assign to this.outputViews[i] =
, reusing the array with fixed pre-initialized length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has merit and will result in a saving. Assertions can be against the array length.
I'll try to take a look today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I factored out tracking the length. Update incoming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the logic looks solid now.
I had a couple of minor comments, but those are cosmetic. LGTM either way, this looks good to land to me.
Thanks! |
One thing that comes to mind that may be worth probing: The TLS section is allocated at the end of the stack, (it should be exclusive of the stack size though). So you may try to place a global variable This will verify that the allocation from stack top won't stomp on TLS variables. |
[snip] I can add this as a separate PR, internals like this were one of my worries when starting this last year (most of those worries went away once it was working, but I'm still missing insights like this). |
Yeah that sounds good. |
Changes in as per @juj's suggestions. |
#25024 tests this. |
A reworking of #22753, which "improves the copy back from the audio worklet's heap to JS by 7-12x depending on the browser." From the previous description:
Since we pass in the stack for the worklet from the caller's heap, its address doesn't change. And since the render quantum size doesn't change after the audio worklet creation, the stack positions for the audio buffers do not change either. This optimisation adds one-time subarray views and replaces the float-by-float copy with a simple
set()
per channel (per output).The existing interactive tests (written for the original PR) can be run for comparison:
These test various input/output arrangements as well as parameters (parameters are interesting because, depending on the browser, the sizes change as the params move from static to varying).
The original benchmark of the extracted copy is still valid:
https://wip.numfum.com/cw/2024-10-29/index.html
This is tested with 32- and 64-bit wasm (which required a reordering of how structs and data were stored to avoid alignment issues).
Some explanations:
WasmAudioWorkletProcessor
constructorprocess()
call are split into aligned struct data (see the comments) and audio/param dataASSERTIONS
are used to ensure everything fits and correctly alignsFuture improvements: the output views are sequential, so instead of of being individual views covering each channel the views could cover one to however-many-views needed, with a singleset()
being enough for all outputs.