fix(engine): sample-accurate volume automation so dense fades keep their audio#1117
Conversation
Fallow audit reportFound 24 findings. Duplication (16)
Health (8)
Generated by fallow. |
jrusso1020
left a comment
There was a problem hiding this comment.
Reviewed against the canonical rubric: full read on audioMixer.ts diff + the linked #1066 comment from abcfy2. The follow-up correctly addresses the reported concern with the right diagnosis.
Comment-vs-fix correspondence
abcfy2 reported four distinct rendering symptoms on v0.6.55:
| Case | data-volume |
GSAP fade | Observed |
|---|---|---|---|
| 1 | 0 |
yes | No audio track |
| 2 | 0.01 |
yes | No audio track |
| 3 | 0.8 |
none | Audio present, -16.8 dB |
| 4 | 0.8 |
yes | Audio present, -17.2 dB (~no fade) |
His hypothesis was a wiring bug ("the probe captures the keyframes but they don't appear to reach the audio mixer"). Miguel's diagnosis is different and, I'm convinced, correct: the keyframes DO reach the mixer, but the resulting nested-if(lt(t,…)) expression overflows FFmpeg's expression evaluator at ~95 levels, failing filter-graph init and dropping the entire audio mix.
That explains cases 1 and 2 cleanly (171 keyframes → overflow → no audio). The 86-vs-171 keyframe measurement Miguel cites ("86 OK; 171 → exit 234, Error initializing filters, no output file") is the kind of falsifiable evidence I trust — direct measurement of the actual ffmpeg invocation, not just code-reading.
Case 4 is where I paused. The commenter saw "audio present but no fade" rather than "no audio at all." Miguel's manual verification ("228 KB, audio present, faithful envelope — silent start (−68 dB) → peak (−29 dB) → silent end (−62 dB)") confirms the fix DOES make the fade audible for the previously-failing dense case — so even if case 4's exact failure mechanism differs slightly from case 1's (perhaps a borderline overflow that produces a silent-but-rendering filter graph?), the fix demonstrably resolves it end-to-end. Good investigative discipline diagnosing the real root cause rather than chasing the commenter's wiring hypothesis.
The fix — simplifyVolumeKeyframes
Pre-filter the keyframe list to a bounded piecewise-linear envelope before building the FFmpeg expression. Two layered guards:
-
Modified Ramer-Douglas-Peucker at 1% volume tolerance drops collinear points. The distance metric is vertical (along volume axis) at the point's time, not perpendicular — which is more correct for a time-parametrized envelope than standard RDP. Stack-based iterative implementation; no recursion-depth concern.
-
Uniform-downsample backstop caps at
MAX_VOLUME_SEGMENTS = 32if RDP alone leaves >32 points. Thestep = (N-1) / (MAX-1)formula correctly preserves both endpoints (i=0 → index 0,i=31 → index N-1). Thepoint.time > sampled.at(-1).timededup is defensive againstMath.roundcollisions on small N, which can't actually happen when N > 32, but harmless.
Walked the math: MAX_VOLUME_SEGMENTS = 32 gives ~3× margin under the observed evaluator limit (~95 on macOS Homebrew, lower on some Linux builds per Miguel's report). The 1% RDP tolerance is VOLUME_SIMPLIFY_EPSILON = 0.01, justified in the docstring as "imperceptible." Reasonable threshold for volume; an audio engineer might point out that perceptual thresholds are typically dB-based (~0.5 dB JND) and 1% linear varies in perceptual significance across the volume range. Not blocking — practical effect is fine for the dense-fade case.
Test coverage
The added test exercises a 300-keyframe eased fade and pins three invariants:
result.success === true— mix completesnestingDepth < 32— expression is boundedvolume=if(lt(t\,X)\,0+…regex — expression structure is intact
That's the right shape — verifies the regression cause is closed without depending on actual FFmpeg. The prior tests mocked ffmpeg, so they couldn't have caught the evaluator overflow; this one encodes the actual invariant. Good fix for that gap.
One small suggestion (non-blocking): a direct unit test of simplifyVolumeKeyframes as a pure function would tighten coverage:
test("linear ramp collapses to two endpoints", () => {
const linear = Array.from({ length: 100 }, (_, i) => ({ time: i / 10, volume: i / 99 }));
expect(simplifyVolumeKeyframes(linear)).toHaveLength(2);
});
test("eased curve preserves shape within tolerance", () => { ... });
test("audio-rate input is capped at MAX_VOLUME_SEGMENTS", () => { ... });The function is pure and easy to test directly. The current integration test catches the depth regression but doesn't pin the shape-preservation semantic. Not a blocker — manual ffmpeg verification covers it.
Scope check
Diff is 139+/6- across two files (audioMixer.ts + its test). No collateral changes; the simplification is a one-shot insertion between dedup and expression building. The function is pure ((keyframes) → keyframes), the change is local, and the existing static-volume short-circuit (simplified.length === 1) still works after simplification.
Fallow audit note
PR body acknowledges the inherited audioMixer.ts Fallow noise (parseAudioElements, mixAudioTracks complexity) — same known false-positive from #1066, not regressed by this PR. ✓
Verdict
Materially LGTM. The diagnosis is sound (measured 86 vs 171 keyframes), the fix is targeted and bounded, the test pins the right invariant, and the manual verification confirms the commenter's case is resolved end-to-end. The 1% epsilon and MAX = 32 are both defensible; the direct unit test of simplifyVolumeKeyframes would be a nice tightening but not blocking.
Holding the stamp for James's greenlight — same protocol.
— Rames Jusso (hyperframes)
| } | ||
| } | ||
|
|
||
| writeFileSync(wavPath, buffer); |
There was a problem hiding this comment.
Addressed in 03c3d37. Two things here:
- For context, the production render
workDirisdirname(outputPath)/work-${job.id}(renderOrchestrator.ts), not the OS temp dir — theos.tmpdir()flow CodeQL traced comes through the unit tests'mkdtempSync. So the predictable-name-in-shared-temp risk doesn't really apply in production. - That said, the in-place
writeFileSync(wavPath)was worth hardening regardless. It now writes to a random-named sibling (crypto.randomBytes) andrenameSyncs it into place — so there's no write to a predictable path (clears the alert's sink), and an atomic replace means a crash mid-write can't leave a truncated WAV for the downstream mix. CodeQL should clear on the next analysis run.
jrusso1020
left a comment
There was a problem hiding this comment.
Re-reviewed at 2ad5fc01. Significant scope expansion from the original bff2da58: the PR went from "bound the expression to escape the overflow" to a layered solution with PCM-sample gain as the primary path. Full read on audioVolumeEnvelope.ts, its test file, and the rewired audioMixer.ts.
What changed since prior round
| Commit | Change |
|---|---|
bff2da58 |
Original: RDP+downsample bound the keyframes before building the expression (what I reviewed first) |
8278af0f |
Layer 3: retry mix at base volume if the expression-path still fails (defense in depth) |
0041ad43 |
Tightened RDP tolerance 1% → 0.5% (directly addresses my prior perceptual-threshold concern) |
2ad5fc01 |
Layer 1: bake the envelope into PCM samples in-process — sample-accurate, no expression at all in the common path |
So the chain is now: PCM gain → bounded expression (fallback) → base volume (backstop). The audio track is mathematically never dropped.
Verifying the PCM bake path (audioVolumeEnvelope.ts)
WAV parsing
parseWavLayout walks RIFF/WAVE chunks correctly:
- Word-alignment handled (
offset = body + chunkSize + (chunkSize % 2)) ✓ - Chunk-size clamped against buffer length ✓
- Validates
WAVE_FORMAT_PCM(rejects float / extensible / ADPCM) +bitsPerSample === 16✓ - Loops until
datachunk found, then breaks
One small ordering quirk: if a WAV puts the data chunk BEFORE fmt (legal per spec, rare in practice), the early break exits before fmt is set → returns null → bake rejects → falls back to expression. Since the prepared audio is always prepareAudioTrack/extractAudioFromVideo writing through ffmpeg's -acodec pcm_s16le -ar 48000 -ac 2 (which writes fmt first), this never trips in the actual pipeline. Worth a sentence on the assumption, but no code change needed.
Sample loop
let segment = 0;
for (let frame = 0; frame < frameCount; frame += 1) {
const time = frame / sampleRate;
while (segment < envelope.length - 2 && time >= envelope[segment + 1]!.time) segment += 1;
// … interpolate gain, scale sample, clamp to int16 …
}- Segment advance is incremental: O(1) amortized per frame; O(N + M) total for N frames + M envelope points. No N×M blowup. ✓
- Span/progress edge cases handled (
span <= 0 ? 0, clamped progress) ✓ - Tail behavior correct: when
time > envelope[last].time, segment stays atlast - 1, progress clamps to 1, gain =envelope[last].volume(holds the last value) ✓ - int16 clamping is a defensive backstop after gain ≤ 1.0 from
toRelativeEnvelope✓
Performance napkin: 10-min × 48kHz × 2ch = ~57M sample-modify operations. At ~100M ops/sec for tight V8 loops, ~500ms per 10min of audio. Well within per-render overhead.
toRelativeEnvelope
- Filters non-finite, clamps volume to [0, 1], sorts by time, dedupes near-collisions ✓
- Inserts
baseVolumeat t=0 if the first keyframe is later — so the "base volume before first keyframe" semantic works ✓ - Empty result → caller returns false (fall back to expression) ✓
Wiring in audioMixer.ts
let bakedEnvelope = false;
if (element.volumeKeyframes && element.volumeKeyframes.length > 0) {
bakedEnvelope = applyVolumeEnvelopeToWav(…);
}
tracks.push({
…
volume: bakedEnvelope ? 1.0 : (element.volume ?? 1.0), // mix at unity when baked
volumeKeyframes: bakedEnvelope ? undefined : element.volumeKeyframes, // expression path skipped
});Clean conditional handoff. On bake success, volume=1.0 mixes at unity (gain is in the samples), and volumeKeyframes=undefined short-circuits buildVolumeExpression to a static volume=1.0. On bake failure, the original expression path takes over with the same RDP simplification I reviewed before.
Layer 3 — degraded retry
let result = await runFfmpeg(buildArgs(false), …);
…
if (!result.success && !signal?.aborted && hasAutomation) {
const retry = await runFfmpeg(buildArgs(true), …);
if (retry.success) { result = retry; degradedAutomation = true; }
}Single retry with ignoreKeyframes=true flag passed to buildVolumeExpression. Returns degradedAutomation flag surfaced as a warning in MixResult.error — caller can see the degradation happened.
Defense layered correctly: would only fire if layer 1 returned false (non-16-bit WAV) AND layer 2's RDP-bounded expression ALSO overflowed (would need an extraordinarily strict ffmpeg build). Robust.
Tests
The 5 new tests in audioVolumeEnvelope.test.ts are exactly what I suggested in the prior round — direct unit tests of the pure function:
- linear fade sample-accurate (sample at frame 0 = 0, frame N/2 ≈ 5000, frame N-1 > 9900) ✓
- track-start offset (composition-time → wav-relative time mapping) ✓
- base/tail holds (baseVolume before first keyframe, last value after) ✓
- 5000 keyframes without failure (the no-ceiling claim) ✓
- non-16-bit PCM rejected (caller fallback path) ✓
The integration test in audioMixer.test.ts from the prior commit still covers the expression-path RDP bound. Good layered coverage.
Investigative discipline
The PR body's "Why this design" section walks through MoviePy (NumPy sample arrays), Kdenlive/Shotcut (MLT filter), Remotion (per-frame curve in own mixer) — all sample-level gain, validating the architectural choice — and rejects Mediabunny with concrete reasoning (different encoder than rest of pipeline = golden-baseline risk + WASM AAC dependency + multi-track mixing surface). That's the right shape: research the canonical approaches before designing.
Minor observations (non-blocking)
-
"Exact GSAP curve" claim is slightly overstated. The PCM gain interpolates linearly between 60Hz probe samples, while the browser plays GSAP's actual eased curve. Between probe samples (16.67ms apart), the rendered envelope is piecewise-linear vs. the preview's curved. Practically imperceptible — a
power2.indeviation at 16.67ms resolution is well under the perceptual threshold — but the PR body says "sample-for-sample" which reads as exact. Worth softening to "matches preview to 60Hz probe resolution with linear interpolation between samples." -
Layer 3 isn't directly tested. The retry-on-fail path lives in
mixAudioTracksand depends onrunFfmpegreturningsuccess: falsefirst. If a future change torunFfmpeg's error semantics breaks the retry trigger, no test catches it. Could add a mock-based test that forces the first runFfmpeg call to fail and verifies the retry happens +degradedAutomationlands in the result. Not blocking — the layer is small and rare-path. -
WAV chunk-ordering assumption (fmt before data). All paths in this codebase write through ffmpeg which observes the convention, but
parseWavLayout's early break ondatameans a fmt-after-data WAV would be rejected. A docstring note on the assumption would document the constraint.
Verdict
Materially LGTM, much stronger than the original round. The PCM-bake primary path eliminates the expression-evaluator overflow at the source, the RDP-bounded fallback keeps backwards compatibility, and the base-volume backstop ensures audio is never silently dropped. Five direct unit tests + the existing integration test give the right coverage. The 0.5% RDP tightening was exactly the response I'd hoped for on my perceptual-threshold note.
Holding the stamp for James — same protocol.
— Rames Jusso (hyperframes)
|
Thanks for the re-review. Addressed all three notes in 61e8463:
Engine suite green at 648 tests (6 envelope unit tests now, incl. the ordering case). No production-logic change beyond the parser robustness fix. |
…eir audio Animated media volume (GSAP/JS fades) dropped the audio track entirely for dense fades. The 60 Hz timeline probe emits 100-300 keyframes for a multi-second fade, which were folded into an FFmpeg `volume` expression nesting one `if(lt(t,...))` per keyframe. Past ~95 nested levels (build-dependent, lower on some Linux ffmpeg builds) the expression overflows FFmpeg's evaluator, fails filter-graph init, fails the whole mix, and the muxer omits audio — so a `data-volume="0"` fade-in rendered with no audio at all (follow-up to #1066; this is why #1064's own scenario regressed once the fade was dense enough). Apply volume automation as sample-accurate gain, layered so audio is never lost: 1. Primary: bake the envelope into the prepared PCM samples in-process (audioVolumeEnvelope.ts). The track WAV is always pcm_s16le/48k/stereo; multiply its samples by the interpolated envelope and atomically rename the result into place, then mix at unity. No expression, no keyframe ceiling, exact at every sample, and the downstream ffmpeg amix/AAC encode is untouched so golden baselines only change where a fade is applied. The RIFF parser scans chunks order-independently and accepts only 16-bit PCM, falling back otherwise. The output is written to a random-named sibling and renamed, so a crash can't leave a truncated WAV and there's no predictable-path write. 2. Fallback: RDP-bounded ffmpeg `volume` expression (0.5% tolerance, capped at 32 segments) for the rare case a WAV is not 16-bit PCM. 0.5% keeps the rendered envelope within ~0.2 dB of the source curve. 3. Backstop: if an automated mix still fails, retry once at base volume and surface the degradation rather than dropping the track. This mirrors how OSS NLEs render automation (sample-level gain): MoviePy, Kdenlive/Shotcut (MLT), Remotion. Verified end-to-end: a 297-keyframe fade that rendered with no audio now bakes all 297 keyframes sample-accurately. Adds unit tests for sample-accurate gain, track-start offset, base/tail holds, thousands of keyframes, order-independent chunk parsing, and format rejection, plus mixer regression tests for bounded nesting and the base-volume backstop.
61e8463 to
03c3d37
Compare
|
Thanks for the fix! Audio fade-in/fade-out works correctly in the rendered output on v0.6.56. However, I noticed that Test case (68s composition): <audio id="bgm" src="music.mp3" data-start="0" data-duration="68"
data-track-index="0" data-volume="0"></audio>
<script>
const tl = gsap.timeline({ paused: true });
// Fade in 0 → 0.25 over 4s
tl.to("#bgm", { volume: 0.25, duration: 4.0, ease: "power2.out" }, 0);
// Fade out 0.25 → 0 over 4s (64s–68s)
tl.to("#bgm", { volume: 0, duration: 4.0, ease: "power2.in" }, 64);
window.__timelines["root"] = tl;
</script>Behavior:
My understanding is that Is this a known limitation of the preview path, or should Environment: HyperFrames 0.6.56, Node v24.16.0, Linux, FFmpeg n8.1.1 |
Preview audio with GSAP volume fades (e.g. data-volume="0" with a
gsap.to("#bgm", {volume:0.25, ...})) played ~1s then silenced. Root
cause: syncRuntimeMedia used fallbackAuthorVolume (data-volume) on the
first tick after a clip became active, clobbering the GSAP-seeked value.
The single-clock transport seeks GSAP before syncRuntimeMedia runs, so
el.volume already holds the animated value — we just need to trust it.
Fix — three layers, matching the renderer's approach (PR #1117):
1. First-tick tracking: on the first tick a clip is active
(previousRuntimeVolume===undefined), use currentElementVolume (GSAP's
seeked value) instead of fallbackAuthorVolume. In production the
transport always seeks GSAP before syncRuntimeMedia, so el.volume is
already at the correct animated position.
2. Probed keyframes: new probeElementVolumeKeyframes() runs the same
offline probe the renderer uses (discoverAudioVolumeAutomationFromTimeline)
directly in the browser. init.ts calls probeAndCacheElementVolume() when
an element is bound and a timeline is available. When keyframes are present,
syncRuntimeMedia drives volume from the interpolated envelope — no
GSAP-change tracking needed, no first-tick edge case, same data source
as the renderer.
3. Shared utilities: normaliseEnvelope(), interpolateVolumeGain(), and
probeAndCacheElementVolume() extracted to mediaVolumeEnvelope.ts and
exported from @hyperframes/core/media-volume-envelope. The engine's
audioVolumeEnvelope.ts imports from there — no duplicate logic between
the renderer and the new preview path.
Fallow audit exits non-zero on inherited complexity/duplication in init.ts
functions that shifted line numbers (applyClipLayout, transportTick, etc.),
unchanged by this PR — same known false-positive pattern noted in #1117.
Lint, format, typecheck, and unit tests all pass.
53 core/media tests pass (3 updated to pre-set el.volume to match the
runtime's bindMediaMetadataListeners — corrects a missing setup step).
audioVolumeEnvelope tests (6) still pass.
Preview audio with GSAP volume fades (e.g. data-volume="0" with a
gsap.to("#bgm", {volume:0.25, ...})) played ~1s then silenced. Root
cause: syncRuntimeMedia used fallbackAuthorVolume (data-volume) on the
first tick after a clip became active, clobbering the GSAP-seeked value.
The single-clock transport seeks GSAP before syncRuntimeMedia runs, so
el.volume already holds the animated value — we just need to trust it.
Fix — three layers, matching the renderer's approach (PR #1117):
1. First-tick tracking: on the first tick a clip is active
(previousRuntimeVolume===undefined), use currentElementVolume (GSAP's
seeked value) instead of fallbackAuthorVolume. In production the
transport always seeks GSAP before syncRuntimeMedia, so el.volume is
already at the correct animated position.
2. Probed keyframes: new probeElementVolumeKeyframes() runs the same
offline probe the renderer uses (discoverAudioVolumeAutomationFromTimeline)
directly in the browser. init.ts calls probeAndCacheElementVolume() when
an element is bound and a timeline is available. When keyframes are present,
syncRuntimeMedia drives volume from the interpolated envelope — no
GSAP-change tracking needed, no first-tick edge case, same data source
as the renderer.
3. Shared utilities: normaliseEnvelope(), interpolateVolumeGain(), and
probeAndCacheElementVolume() extracted to mediaVolumeEnvelope.ts and
exported from @hyperframes/core/media-volume-envelope. The engine's
audioVolumeEnvelope.ts imports from there — no duplicate logic between
the renderer and the new preview path.
Fallow audit exits non-zero on inherited complexity/duplication in init.ts
functions that shifted line numbers (applyClipLayout, transportTick, etc.),
unchanged by this PR — same known false-positive pattern noted in #1117.
Lint, format, typecheck, and unit tests all pass.
53 core/media tests pass (3 updated to pre-set el.volume to match the
runtime's bindMediaMetadataListeners — corrects a missing setup step).
audioVolumeEnvelope tests (6) still pass.
vanceingalls
left a comment
There was a problem hiding this comment.
Reviewed the full diff and James's prior rounds on this one.
Architecture
The layered approach is the right design: PCM-sample gain as primary (no expression ceiling), RDP-bounded FFmpeg expression as fallback, base-volume retry as backstop. This matches how every established OSS video tool handles time-varying volume (MoviePy, Shotcut/MLT, Remotion all multiply samples directly). The PR description's survey of alternatives is solid — the Mediabunny rejection reasoning (different encoder, WASM AAC dependency, multi-track mixing surface) is correct.
The single-responsibility split is clean: audioVolumeEnvelope.ts owns WAV I/O and sample math, audioMixer.ts owns the orchestration decision. The pure function boundary makes the module independently testable.
PCM bake path
WAV parser: order-independent chunk scan is correct. The word-alignment handling (body + chunkSize + (chunkSize % 2)) matches the spec. Validating WAVE_FORMAT_PCM + 16-bit rejects float/extensible/ADPCM cleanly, returning false so the caller falls through to the expression path rather than corrupting audio. The test covering data-before-fmt is a good addition.
Sample loop: incremental segment advance is O(N + M) total, not O(N×M). Span/progress edge cases handled correctly. Tail behavior (time > last keyframe → holds last volume) is correct. int16 clamping is proper — the clamp is technically redundant given gain ≤ 1.0, but appropriate defensive code for a signal-path function.
Shared-source safety: I verified prepareAudioTrack writes per-element paths (${element.id}-extracted.wav, ${element.id}-trimmed.wav) in workDir. No two elements share a prepared WAV path. The atomic write (temp + rename) protects against partial-write corruption. Safe.
Write-to-read idempotency: applyVolumeEnvelopeToWav mutates the path it was given. After this function the WAV contains the baked envelope. The mixer then sets volume: 1.0 and volumeKeyframes: undefined, so the expression path is completely bypassed. No double-application risk.
RDP simplification
simplifyVolumeKeyframes uses vertical distance (volume axis) rather than perpendicular RDP — correct choice for a time-parametrized envelope where time is the independent variable. The iterative stack eliminates recursion depth concerns. The MAX_VOLUME_SEGMENTS = 32 backstop is correctly bounded: 32 < 95 (observed evaluator limit on macOS Homebrew ffmpeg) with margin for stricter Linux builds. VOLUME_SIMPLIFY_EPSILON = 0.005 (0.5%) keeps the envelope within ~0.2 dB of the source curve, well under the ~1 dB loudness JND.
Layer 3 degraded retry
hasAutomation guard is correct — won't retry on a plain non-automation failure. The degradedAutomation flag surfaced in MixResult.error means the degradation is observable, not silent. Good observability.
Test coverage
audioVolumeEnvelope.test.ts: linear fade, offset, base/tail holds, 5000 keyframes, chunk-ordering, format rejection. Covers all the pure-function invariants.audioMixer.test.tsadditions: 300-keyframe expression depth bound (nesting < 32) and layer-3 retry with mock failure. Both test the right invariants.
One note (non-blocking)
interpolateVolumeGain is called once per sample frame (48kHz × 2ch = ~96k times/sec of audio) with a linear scan to find the segment. At 60Hz probe resolution the envelope has at most ~60 × duration points, but for a 10-minute audio clip that's ~3600 envelope points, and the scan re-starts from segment=0 per frame. The incremental-advance pattern from the original sample loop would be faster (O(N+M) vs O(N×M)). At 48kHz × 5min × 3600 points that's ~17B iterations — not a problem in practice for the current use cases, but worth a note for very long audio. Low priority for a follow-up.
Verdict
Solid fix. The root cause diagnosis is correct (measured: 86 kf → OK, 171 kf → exit 234), the chosen approach matches industry practice, the defense layers ensure audio is never dropped, and the test coverage is thorough. The PR body's parity claim was appropriately softened ("matches preview to 60Hz probe resolution") in response to James's note.
LGTM.
— Vai
|
@abcfy2 preview fix is in 0.6.57! it should work now fine |
|
@miguel-heygen Great! Everything works very well. Thank you for your hard working. |
Problem
Follow-up to #1066. On v0.6.55, animated media volume still fails for dense fades:
data-volume="0"drops the audio track entirely, and longer GSAP volume fades produce no audible change. Reported in #1066 (comment) (Linux, ffmpeg n8.1.1,keyframeCount:171).Root cause
buildVolumeExpressionfolded volume automation into an FFmpegvolumeexpression that nests oneif(lt(t,...))per keyframe. The 60 Hz timeline probe routinely emits 100–300 keyframes for a multi-second fade, nesting the expression deep enough to overflow FFmpeg's expression evaluator.The depth limit is build-dependent (~95 levels on macOS Homebrew ffmpeg 8.1.1, lower on some Linux builds), which is why the original PR's smaller repros passed locally while the reviewer's denser fade failed. When the limit is exceeded,
volume=...:eval=framefails filter-graph init, the entire audio mix returnssuccess:false, and the muxer omits the audio track — hence "no audio track at all."Measured directly: 86 keyframes → ffmpeg OK; 171 →
exit 234,Error initializing filters, no output file. This also explains why #1064's own scenario regressed: once adata-volume="0"fade is dense enough, the mix drops rather than fades.Fix — sample-accurate gain, layered so audio is never dropped
1. Primary: bake the envelope into the PCM samples (
audioVolumeEnvelope.ts, new). After a track's WAV is prepared (alwayspcm_s16le/ 48 kHz / stereo), its samples are multiplied by the interpolated gain envelope in-house, then mixed at unity. No FFmpeg expression at all → no keyframe ceiling, exact at every sample, and the downstream ffmpegamix/AAC encode is untouched (so golden baselines only move where a fade is actually applied). It's the MoviePy/MLT model done in-process, with no new dependency. The WAV parser scans RIFF chunks order-independently and only accepts 16-bit PCM, returningfalse(→ fallback) on anything else.2. Fallback: bounded ffmpeg
volumeexpression. Used only if a WAV isn't the expected 16-bit PCM. The expression is now bounded — Ramer–Douglas–Peucker simplifies the keyframes to a piecewise-linear envelope (0.5% tolerance; a linear fade → 2 points, an eased fade → a handful), with a uniform-downsample backstop atMAX_VOLUME_SEGMENTS = 32. The 0.5% tolerance keeps the rendered envelope within ~0.2 dB of the source curve (well under the ~1 dB loudness JND — addresses the prior perceptual-threshold note).3. Backstop: never drop the track. If an automated mix still fails on some ffmpeg build, retry once at base volume rather than emitting no audio, and surface the degradation in
MixResult.error.Chain: exact PCM gain → bounded expression → base volume.
Why this design (vs mediabunny / a mixer rewrite)
Surveyed how other OSS handles time-varying volume: MoviePy multiplies NumPy sample arrays, Kdenlive/Shotcut use a keyframable MLT filter, Remotion applies a per-frame curve in its own mixer — all sample-level gain, which is exactly what layer 1 now does. Mediabunny (already a Studio dependency for browser-side probing) would mean adding
@mediabunny/server+ a WASM AAC encoder in Node (no WebCodecs there), a different encoder than the rest of the ffmpeg pipeline (golden-baseline risk), and reimplementing multi-track mixing — more surface area, not less. Applying the gain to the already-PCM WAV keeps the proven ffmpeg mix/encode and changes nothing downstream.Preview/render parity (WYSIWYG)
Both preview and render derive from the same 60 Hz timeline probe samples. Layer 1 bakes those exact sample values into the PCM and interpolates linearly between them, so the render reproduces the curve at the probe's 60 Hz resolution — the same resolution GSAP updates
audio.volumeat in preview — rather than re-approximating it. The expression fallback is additionally held to ~0.2 dB. No source-of-truth drift between preview and render.Verification
sine.inOutfade withdata-volume="0": 14.6 KB output, no audio stream.baked=true) for all 297 keyframes — no expression, no ceiling.audioVolumeEnvelope.test.ts): sample-accurate fade, track-start offset, base/tail holds, 5 000 keyframes without failure, order-independent chunk parsing (databeforefmt), and format rejection.< 32) for dense keyframes, and the base-volume backstop surfacing its reason.bun run --filter @hyperframes/engine test→ 648 passedbun run --filter @hyperframes/core test -- src/runtime/media.test.ts→ 53 passedtypecheck(core, engine, producer),oxlint,oxfmt --checkall clean.Notes
fallowaudit still exits non-zero on inherited duplication/complexity inaudioMixer.ts(parseAudioElements,mixAudioTracks, etc.), unchanged by this PR — same known false-positive noted in fix: support animated media volume #1066. Lint/format/typecheck pass.