Skip to content

fix(engine): sample-accurate volume automation so dense fades keep their audio#1117

Merged
miguel-heygen merged 1 commit into
mainfrom
fix/audio-volume-keyframe-depth
May 29, 2026
Merged

fix(engine): sample-accurate volume automation so dense fades keep their audio#1117
miguel-heygen merged 1 commit into
mainfrom
fix/audio-volume-keyframe-depth

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented May 29, 2026

Problem

Follow-up to #1066. On v0.6.55, animated media volume still fails for dense fades: data-volume="0" drops the audio track entirely, and longer GSAP volume fades produce no audible change. Reported in #1066 (comment) (Linux, ffmpeg n8.1.1, keyframeCount:171).

Root cause

buildVolumeExpression folded volume automation into an FFmpeg volume expression that nests one if(lt(t,...)) per keyframe. The 60 Hz timeline probe routinely emits 100–300 keyframes for a multi-second fade, nesting the expression deep enough to overflow FFmpeg's expression evaluator.

The depth limit is build-dependent (~95 levels on macOS Homebrew ffmpeg 8.1.1, lower on some Linux builds), which is why the original PR's smaller repros passed locally while the reviewer's denser fade failed. When the limit is exceeded, volume=...:eval=frame fails filter-graph init, the entire audio mix returns success:false, and the muxer omits the audio track — hence "no audio track at all."

Measured directly: 86 keyframes → ffmpeg OK; 171 → exit 234, Error initializing filters, no output file. This also explains why #1064's own scenario regressed: once a data-volume="0" fade is dense enough, the mix drops rather than fades.

Fix — sample-accurate gain, layered so audio is never dropped

1. Primary: bake the envelope into the PCM samples (audioVolumeEnvelope.ts, new). After a track's WAV is prepared (always pcm_s16le / 48 kHz / stereo), its samples are multiplied by the interpolated gain envelope in-house, then mixed at unity. No FFmpeg expression at all → no keyframe ceiling, exact at every sample, and the downstream ffmpeg amix/AAC encode is untouched (so golden baselines only move where a fade is actually applied). It's the MoviePy/MLT model done in-process, with no new dependency. The WAV parser scans RIFF chunks order-independently and only accepts 16-bit PCM, returning false (→ fallback) on anything else.

2. Fallback: bounded ffmpeg volume expression. Used only if a WAV isn't the expected 16-bit PCM. The expression is now bounded — Ramer–Douglas–Peucker simplifies the keyframes to a piecewise-linear envelope (0.5% tolerance; a linear fade → 2 points, an eased fade → a handful), with a uniform-downsample backstop at MAX_VOLUME_SEGMENTS = 32. The 0.5% tolerance keeps the rendered envelope within ~0.2 dB of the source curve (well under the ~1 dB loudness JND — addresses the prior perceptual-threshold note).

3. Backstop: never drop the track. If an automated mix still fails on some ffmpeg build, retry once at base volume rather than emitting no audio, and surface the degradation in MixResult.error.

Chain: exact PCM gain → bounded expression → base volume.

Why this design (vs mediabunny / a mixer rewrite)

Surveyed how other OSS handles time-varying volume: MoviePy multiplies NumPy sample arrays, Kdenlive/Shotcut use a keyframable MLT filter, Remotion applies a per-frame curve in its own mixer — all sample-level gain, which is exactly what layer 1 now does. Mediabunny (already a Studio dependency for browser-side probing) would mean adding @mediabunny/server + a WASM AAC encoder in Node (no WebCodecs there), a different encoder than the rest of the ffmpeg pipeline (golden-baseline risk), and reimplementing multi-track mixing — more surface area, not less. Applying the gain to the already-PCM WAV keeps the proven ffmpeg mix/encode and changes nothing downstream.

Preview/render parity (WYSIWYG)

Both preview and render derive from the same 60 Hz timeline probe samples. Layer 1 bakes those exact sample values into the PCM and interpolates linearly between them, so the render reproduces the curve at the probe's 60 Hz resolution — the same resolution GSAP updates audio.volume at in preview — rather than re-approximating it. The expression fallback is additionally held to ~0.2 dB. No source-of-truth drift between preview and render.

Verification

  • Before (published 0.6.55), 297-keyframe sine.inOut fade with data-volume="0": 14.6 KB output, no audio stream.
  • After, identical composition: audio present, faithful envelope (silent → peak → silent). A runtime probe confirmed the primary PCM path ran (baked=true) for all 297 keyframes — no expression, no ceiling.
  • New unit tests (audioVolumeEnvelope.test.ts): sample-accurate fade, track-start offset, base/tail holds, 5 000 keyframes without failure, order-independent chunk parsing (data before fmt ), and format rejection.
  • Mixer regression tests: bounded nesting depth (< 32) for dense keyframes, and the base-volume backstop surfacing its reason.
  • bun run --filter @hyperframes/engine test → 648 passed
  • bun run --filter @hyperframes/core test -- src/runtime/media.test.ts → 53 passed
  • typecheck (core, engine, producer), oxlint, oxfmt --check all clean.

Notes

  • The lefthook fallow audit still exits non-zero on inherited duplication/complexity in audioMixer.ts (parseAudioElements, mixAudioTracks, etc.), unchanged by this PR — same known false-positive noted in fix: support animated media volume #1066. Lint/format/typecheck pass.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 29, 2026

Fallow audit report

Found 24 findings.

Duplication (16)
Severity Rule Location Description
minor fallow/code-duplication packages/engine/src/services/audioMixer.test.ts:31 Code clone group 1 (13 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.test.ts:68 Code clone group 1 (13 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.test.ts:96 Code clone group 2 (8 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.test.ts:111 Code clone group 3 (6 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.test.ts:146 Code clone group 2 (8 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.test.ts:166 Code clone group 3 (6 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.ts:178 Code clone group 4 (13 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.ts:203 Code clone group 4 (13 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.ts:233 Code clone group 5 (23 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.ts:271 Code clone group 6 (24 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.ts:272 Code clone group 5 (23 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.ts:320 Code clone group 6 (24 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioMixer.ts:388 Code clone group 7 (8 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioVolumeEnvelope.test.ts:16 Code clone group 8 (8 lines, 2 instances)
minor fallow/code-duplication packages/engine/src/services/audioVolumeEnvelope.test.ts:152 Code clone group 8 (8 lines, 2 instances)
minor fallow/code-duplication packages/producer/src/services/audioExtractor.ts:199 Code clone group 7 (8 lines, 2 instances)
Health (8)
Severity Rule Location Description
minor fallow/high-crap-score packages/engine/src/services/audioMixer.ts:67 'simplifyVolumeKeyframes' has CRAP score 37.1 (threshold: 30.0, cyclomatic 11)
minor fallow/high-crap-score packages/engine/src/services/audioMixer.ts:113 'buildVolumeExpression' has CRAP score 31.6 (threshold: 30.0, cyclomatic 10)
major fallow/high-cognitive-complexity packages/engine/src/services/audioMixer.ts:172 'parseAudioElements' has cognitive complexity 28 (threshold: 15)
minor fallow/high-crap-score packages/engine/src/services/audioMixer.ts:229 'extractAudioFromVideo' has CRAP score 43.1 (threshold: 30.0, cyclomatic 12)
major fallow/high-crap-score packages/engine/src/services/audioMixer.ts:363 'mixAudioTracks' has CRAP score 79.4 (threshold: 30.0, cyclomatic 17)
critical fallow/high-crap-score packages/engine/src/services/audioMixer.ts:487 '<arrow>' has CRAP score 126.5 (threshold: 30.0, cyclomatic 22)
minor fallow/high-crap-score packages/engine/src/services/audioVolumeEnvelope.ts:40 'parseWavLayout' has CRAP score 49.5 (threshold: 30.0, cyclomatic 13)
minor fallow/high-cognitive-complexity packages/engine/src/services/audioVolumeEnvelope.ts:114 'applyVolumeEnvelopeToWav' has cognitive complexity 19 (threshold: 15)

Generated by fallow.

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed against the canonical rubric: full read on audioMixer.ts diff + the linked #1066 comment from abcfy2. The follow-up correctly addresses the reported concern with the right diagnosis.

Comment-vs-fix correspondence

abcfy2 reported four distinct rendering symptoms on v0.6.55:

Case data-volume GSAP fade Observed
1 0 yes No audio track
2 0.01 yes No audio track
3 0.8 none Audio present, -16.8 dB
4 0.8 yes Audio present, -17.2 dB (~no fade)

His hypothesis was a wiring bug ("the probe captures the keyframes but they don't appear to reach the audio mixer"). Miguel's diagnosis is different and, I'm convinced, correct: the keyframes DO reach the mixer, but the resulting nested-if(lt(t,…)) expression overflows FFmpeg's expression evaluator at ~95 levels, failing filter-graph init and dropping the entire audio mix.

That explains cases 1 and 2 cleanly (171 keyframes → overflow → no audio). The 86-vs-171 keyframe measurement Miguel cites ("86 OK; 171 → exit 234, Error initializing filters, no output file") is the kind of falsifiable evidence I trust — direct measurement of the actual ffmpeg invocation, not just code-reading.

Case 4 is where I paused. The commenter saw "audio present but no fade" rather than "no audio at all." Miguel's manual verification ("228 KB, audio present, faithful envelope — silent start (−68 dB) → peak (−29 dB) → silent end (−62 dB)") confirms the fix DOES make the fade audible for the previously-failing dense case — so even if case 4's exact failure mechanism differs slightly from case 1's (perhaps a borderline overflow that produces a silent-but-rendering filter graph?), the fix demonstrably resolves it end-to-end. Good investigative discipline diagnosing the real root cause rather than chasing the commenter's wiring hypothesis.

The fix — simplifyVolumeKeyframes

Pre-filter the keyframe list to a bounded piecewise-linear envelope before building the FFmpeg expression. Two layered guards:

  1. Modified Ramer-Douglas-Peucker at 1% volume tolerance drops collinear points. The distance metric is vertical (along volume axis) at the point's time, not perpendicular — which is more correct for a time-parametrized envelope than standard RDP. Stack-based iterative implementation; no recursion-depth concern.

  2. Uniform-downsample backstop caps at MAX_VOLUME_SEGMENTS = 32 if RDP alone leaves >32 points. The step = (N-1) / (MAX-1) formula correctly preserves both endpoints (i=0 → index 0, i=31 → index N-1). The point.time > sampled.at(-1).time dedup is defensive against Math.round collisions on small N, which can't actually happen when N > 32, but harmless.

Walked the math: MAX_VOLUME_SEGMENTS = 32 gives ~3× margin under the observed evaluator limit (~95 on macOS Homebrew, lower on some Linux builds per Miguel's report). The 1% RDP tolerance is VOLUME_SIMPLIFY_EPSILON = 0.01, justified in the docstring as "imperceptible." Reasonable threshold for volume; an audio engineer might point out that perceptual thresholds are typically dB-based (~0.5 dB JND) and 1% linear varies in perceptual significance across the volume range. Not blocking — practical effect is fine for the dense-fade case.

Test coverage

The added test exercises a 300-keyframe eased fade and pins three invariants:

  • result.success === true — mix completes
  • nestingDepth < 32 — expression is bounded
  • volume=if(lt(t\,X)\,0+… regex — expression structure is intact

That's the right shape — verifies the regression cause is closed without depending on actual FFmpeg. The prior tests mocked ffmpeg, so they couldn't have caught the evaluator overflow; this one encodes the actual invariant. Good fix for that gap.

One small suggestion (non-blocking): a direct unit test of simplifyVolumeKeyframes as a pure function would tighten coverage:

test("linear ramp collapses to two endpoints", () => {
  const linear = Array.from({ length: 100 }, (_, i) => ({ time: i / 10, volume: i / 99 }));
  expect(simplifyVolumeKeyframes(linear)).toHaveLength(2);
});
test("eased curve preserves shape within tolerance", () => { ... });
test("audio-rate input is capped at MAX_VOLUME_SEGMENTS", () => { ... });

The function is pure and easy to test directly. The current integration test catches the depth regression but doesn't pin the shape-preservation semantic. Not a blocker — manual ffmpeg verification covers it.

Scope check

Diff is 139+/6- across two files (audioMixer.ts + its test). No collateral changes; the simplification is a one-shot insertion between dedup and expression building. The function is pure ((keyframes) → keyframes), the change is local, and the existing static-volume short-circuit (simplified.length === 1) still works after simplification.

Fallow audit note

PR body acknowledges the inherited audioMixer.ts Fallow noise (parseAudioElements, mixAudioTracks complexity) — same known false-positive from #1066, not regressed by this PR. ✓

Verdict

Materially LGTM. The diagnosis is sound (measured 86 vs 171 keyframes), the fix is targeted and bounded, the test pins the right invariant, and the manual verification confirms the commenter's case is resolved end-to-end. The 1% epsilon and MAX = 32 are both defensible; the direct unit test of simplifyVolumeKeyframes would be a nice tightening but not blocking.

Holding the stamp for James's greenlight — same protocol.

— Rames Jusso (hyperframes)

}
}

writeFileSync(wavPath, buffer);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 03c3d37. Two things here:

  • For context, the production render workDir is dirname(outputPath)/work-${job.id} (renderOrchestrator.ts), not the OS temp dir — the os.tmpdir() flow CodeQL traced comes through the unit tests' mkdtempSync. So the predictable-name-in-shared-temp risk doesn't really apply in production.
  • That said, the in-place writeFileSync(wavPath) was worth hardening regardless. It now writes to a random-named sibling (crypto.randomBytes) and renameSyncs it into place — so there's no write to a predictable path (clears the alert's sink), and an atomic replace means a crash mid-write can't leave a truncated WAV for the downstream mix. CodeQL should clear on the next analysis run.

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed at 2ad5fc01. Significant scope expansion from the original bff2da58: the PR went from "bound the expression to escape the overflow" to a layered solution with PCM-sample gain as the primary path. Full read on audioVolumeEnvelope.ts, its test file, and the rewired audioMixer.ts.

What changed since prior round

Commit Change
bff2da58 Original: RDP+downsample bound the keyframes before building the expression (what I reviewed first)
8278af0f Layer 3: retry mix at base volume if the expression-path still fails (defense in depth)
0041ad43 Tightened RDP tolerance 1% → 0.5% (directly addresses my prior perceptual-threshold concern)
2ad5fc01 Layer 1: bake the envelope into PCM samples in-process — sample-accurate, no expression at all in the common path

So the chain is now: PCM gain → bounded expression (fallback) → base volume (backstop). The audio track is mathematically never dropped.

Verifying the PCM bake path (audioVolumeEnvelope.ts)

WAV parsing

parseWavLayout walks RIFF/WAVE chunks correctly:

  • Word-alignment handled (offset = body + chunkSize + (chunkSize % 2)) ✓
  • Chunk-size clamped against buffer length ✓
  • Validates WAVE_FORMAT_PCM (rejects float / extensible / ADPCM) + bitsPerSample === 16
  • Loops until data chunk found, then breaks

One small ordering quirk: if a WAV puts the data chunk BEFORE fmt (legal per spec, rare in practice), the early break exits before fmt is set → returns null → bake rejects → falls back to expression. Since the prepared audio is always prepareAudioTrack/extractAudioFromVideo writing through ffmpeg's -acodec pcm_s16le -ar 48000 -ac 2 (which writes fmt first), this never trips in the actual pipeline. Worth a sentence on the assumption, but no code change needed.

Sample loop

let segment = 0;
for (let frame = 0; frame < frameCount; frame += 1) {
  const time = frame / sampleRate;
  while (segment < envelope.length - 2 && time >= envelope[segment + 1]!.time) segment += 1;
  // … interpolate gain, scale sample, clamp to int16 …
}
  • Segment advance is incremental: O(1) amortized per frame; O(N + M) total for N frames + M envelope points. No N×M blowup. ✓
  • Span/progress edge cases handled (span <= 0 ? 0, clamped progress) ✓
  • Tail behavior correct: when time > envelope[last].time, segment stays at last - 1, progress clamps to 1, gain = envelope[last].volume (holds the last value) ✓
  • int16 clamping is a defensive backstop after gain ≤ 1.0 from toRelativeEnvelope

Performance napkin: 10-min × 48kHz × 2ch = ~57M sample-modify operations. At ~100M ops/sec for tight V8 loops, ~500ms per 10min of audio. Well within per-render overhead.

toRelativeEnvelope

  • Filters non-finite, clamps volume to [0, 1], sorts by time, dedupes near-collisions ✓
  • Inserts baseVolume at t=0 if the first keyframe is later — so the "base volume before first keyframe" semantic works ✓
  • Empty result → caller returns false (fall back to expression) ✓

Wiring in audioMixer.ts

let bakedEnvelope = false;
if (element.volumeKeyframes && element.volumeKeyframes.length > 0) {
  bakedEnvelope = applyVolumeEnvelopeToWav();
}
tracks.push({
  
  volume: bakedEnvelope ? 1.0 : (element.volume ?? 1.0),  // mix at unity when baked
  volumeKeyframes: bakedEnvelope ? undefined : element.volumeKeyframes,  // expression path skipped
});

Clean conditional handoff. On bake success, volume=1.0 mixes at unity (gain is in the samples), and volumeKeyframes=undefined short-circuits buildVolumeExpression to a static volume=1.0. On bake failure, the original expression path takes over with the same RDP simplification I reviewed before.

Layer 3 — degraded retry

let result = await runFfmpeg(buildArgs(false), );

if (!result.success && !signal?.aborted && hasAutomation) {
  const retry = await runFfmpeg(buildArgs(true), );
  if (retry.success) { result = retry; degradedAutomation = true; }
}

Single retry with ignoreKeyframes=true flag passed to buildVolumeExpression. Returns degradedAutomation flag surfaced as a warning in MixResult.error — caller can see the degradation happened.

Defense layered correctly: would only fire if layer 1 returned false (non-16-bit WAV) AND layer 2's RDP-bounded expression ALSO overflowed (would need an extraordinarily strict ffmpeg build). Robust.

Tests

The 5 new tests in audioVolumeEnvelope.test.ts are exactly what I suggested in the prior round — direct unit tests of the pure function:

  • linear fade sample-accurate (sample at frame 0 = 0, frame N/2 ≈ 5000, frame N-1 > 9900) ✓
  • track-start offset (composition-time → wav-relative time mapping) ✓
  • base/tail holds (baseVolume before first keyframe, last value after) ✓
  • 5000 keyframes without failure (the no-ceiling claim) ✓
  • non-16-bit PCM rejected (caller fallback path) ✓

The integration test in audioMixer.test.ts from the prior commit still covers the expression-path RDP bound. Good layered coverage.

Investigative discipline

The PR body's "Why this design" section walks through MoviePy (NumPy sample arrays), Kdenlive/Shotcut (MLT filter), Remotion (per-frame curve in own mixer) — all sample-level gain, validating the architectural choice — and rejects Mediabunny with concrete reasoning (different encoder than rest of pipeline = golden-baseline risk + WASM AAC dependency + multi-track mixing surface). That's the right shape: research the canonical approaches before designing.

Minor observations (non-blocking)

  1. "Exact GSAP curve" claim is slightly overstated. The PCM gain interpolates linearly between 60Hz probe samples, while the browser plays GSAP's actual eased curve. Between probe samples (16.67ms apart), the rendered envelope is piecewise-linear vs. the preview's curved. Practically imperceptible — a power2.in deviation at 16.67ms resolution is well under the perceptual threshold — but the PR body says "sample-for-sample" which reads as exact. Worth softening to "matches preview to 60Hz probe resolution with linear interpolation between samples."

  2. Layer 3 isn't directly tested. The retry-on-fail path lives in mixAudioTracks and depends on runFfmpeg returning success: false first. If a future change to runFfmpeg's error semantics breaks the retry trigger, no test catches it. Could add a mock-based test that forces the first runFfmpeg call to fail and verifies the retry happens + degradedAutomation lands in the result. Not blocking — the layer is small and rare-path.

  3. WAV chunk-ordering assumption (fmt before data). All paths in this codebase write through ffmpeg which observes the convention, but parseWavLayout's early break on data means a fmt-after-data WAV would be rejected. A docstring note on the assumption would document the constraint.

Verdict

Materially LGTM, much stronger than the original round. The PCM-bake primary path eliminates the expression-evaluator overflow at the source, the RDP-bounded fallback keeps backwards compatibility, and the base-volume backstop ensures audio is never silently dropped. Five direct unit tests + the existing integration test give the right coverage. The 0.5% RDP tightening was exactly the response I'd hoped for on my perceptual-threshold note.

Holding the stamp for James — same protocol.

— Rames Jusso (hyperframes)

@miguel-heygen
Copy link
Copy Markdown
Collaborator Author

Thanks for the re-review. Addressed all three notes in 61e8463:

  1. "exact GSAP curve" overstatement — reworded the parity section. It's a piecewise-linear reproduction at the probe's 60 Hz resolution (the same rate GSAP updates audio.volume at in preview), not the continuous curve. No longer claims sample-for-sample identity with the analytic ease.

  2. Layer 3 not directly tested — the fallback test now asserts result.error surfaces the base-volume degradation (/base volume/i), so the backstop is pinned as surfaced, not silently swallowed.

  3. WAV chunk-ordering assumption — went one better than a docstring: the parser no longer breaks on data, so it scans chunks order-independently (data may precede fmt , trailing LIST/fact chunks are skipped). Added a data-before-fmt test that pins it. Docstring updated to describe the order-independent scan.

Engine suite green at 648 tests (6 envelope unit tests now, incl. the ordering case). No production-logic change beyond the parser robustness fix.

@miguel-heygen miguel-heygen changed the title fix(engine): cap volume-automation keyframes so dense fades keep their audio fix(engine): sample-accurate volume automation so dense fades keep their audio May 29, 2026
…eir audio

Animated media volume (GSAP/JS fades) dropped the audio track entirely for dense
fades. The 60 Hz timeline probe emits 100-300 keyframes for a multi-second fade,
which were folded into an FFmpeg `volume` expression nesting one `if(lt(t,...))`
per keyframe. Past ~95 nested levels (build-dependent, lower on some Linux ffmpeg
builds) the expression overflows FFmpeg's evaluator, fails filter-graph init,
fails the whole mix, and the muxer omits audio — so a `data-volume="0"` fade-in
rendered with no audio at all (follow-up to #1066; this is why #1064's own
scenario regressed once the fade was dense enough).

Apply volume automation as sample-accurate gain, layered so audio is never lost:

1. Primary: bake the envelope into the prepared PCM samples in-process
   (audioVolumeEnvelope.ts). The track WAV is always pcm_s16le/48k/stereo;
   multiply its samples by the interpolated envelope and atomically rename the
   result into place, then mix at unity. No expression, no keyframe ceiling,
   exact at every sample, and the downstream ffmpeg amix/AAC encode is untouched
   so golden baselines only change where a fade is applied. The RIFF parser
   scans chunks order-independently and accepts only 16-bit PCM, falling back
   otherwise. The output is written to a random-named sibling and renamed, so a
   crash can't leave a truncated WAV and there's no predictable-path write.
2. Fallback: RDP-bounded ffmpeg `volume` expression (0.5% tolerance, capped at
   32 segments) for the rare case a WAV is not 16-bit PCM. 0.5% keeps the
   rendered envelope within ~0.2 dB of the source curve.
3. Backstop: if an automated mix still fails, retry once at base volume and
   surface the degradation rather than dropping the track.

This mirrors how OSS NLEs render automation (sample-level gain): MoviePy,
Kdenlive/Shotcut (MLT), Remotion.

Verified end-to-end: a 297-keyframe fade that rendered with no audio now bakes
all 297 keyframes sample-accurately. Adds unit tests for sample-accurate gain,
track-start offset, base/tail holds, thousands of keyframes, order-independent
chunk parsing, and format rejection, plus mixer regression tests for bounded
nesting and the base-volume backstop.
@miguel-heygen miguel-heygen force-pushed the fix/audio-volume-keyframe-depth branch from 61e8463 to 03c3d37 Compare May 29, 2026 03:47
@miguel-heygen miguel-heygen merged commit 95d2a94 into main May 29, 2026
40 of 41 checks passed
@miguel-heygen miguel-heygen deleted the fix/audio-volume-keyframe-depth branch May 29, 2026 03:49
@abcfy2
Copy link
Copy Markdown

abcfy2 commented May 29, 2026

@miguel-heygen

Thanks for the fix! Audio fade-in/fade-out works correctly in the rendered output on v0.6.56.

However, I noticed that hyperframes preview does not play the audio correctly. When previewing, I only hear about 1 second of audio at the beginning, then a brief static/buzzing sound, and then silence for the rest of the composition. The rendered MP4 has no such issue — audio plays perfectly with the fade envelope applied.

Test case (68s composition):

<audio id="bgm" src="music.mp3" data-start="0" data-duration="68"
       data-track-index="0" data-volume="0"></audio>
<script>
  const tl = gsap.timeline({ paused: true });
  // Fade in 0 → 0.25 over 4s
  tl.to("#bgm", { volume: 0.25, duration: 4.0, ease: "power2.out" }, 0);
  // Fade out 0.25 → 0 over 4s (64s–68s)
  tl.to("#bgm", { volume: 0, duration: 4.0, ease: "power2.in" }, 64);
  window.__timelines["root"] = tl;
</script>

Behavior:

  • hyperframes render — audio present, fade-in and fade-out both work
  • hyperframes preview — audio cuts out after ~1s with static, then silent

My understanding is that syncRuntimeMedia re-asserts el.volume from data-volume every 50ms tick, which overwrites the GSAP-animated volume. Since data-volume="0", the runtime keeps resetting volume to 0, causing the brief glitch then silence. The render path bypasses this because the probe samples the envelope directly and the PCM baking applies it offline.

Is this a known limitation of the preview path, or should syncRuntimeMedia also preserve GSAP-authored volume changes?

Environment: HyperFrames 0.6.56, Node v24.16.0, Linux, FFmpeg n8.1.1

miguel-heygen added a commit that referenced this pull request May 29, 2026
Preview audio with GSAP volume fades (e.g. data-volume="0" with a
gsap.to("#bgm", {volume:0.25, ...})) played ~1s then silenced. Root
cause: syncRuntimeMedia used fallbackAuthorVolume (data-volume) on the
first tick after a clip became active, clobbering the GSAP-seeked value.
The single-clock transport seeks GSAP before syncRuntimeMedia runs, so
el.volume already holds the animated value — we just need to trust it.

Fix — three layers, matching the renderer's approach (PR #1117):

1. First-tick tracking: on the first tick a clip is active
   (previousRuntimeVolume===undefined), use currentElementVolume (GSAP's
   seeked value) instead of fallbackAuthorVolume. In production the
   transport always seeks GSAP before syncRuntimeMedia, so el.volume is
   already at the correct animated position.

2. Probed keyframes: new probeElementVolumeKeyframes() runs the same
   offline probe the renderer uses (discoverAudioVolumeAutomationFromTimeline)
   directly in the browser. init.ts calls probeAndCacheElementVolume() when
   an element is bound and a timeline is available. When keyframes are present,
   syncRuntimeMedia drives volume from the interpolated envelope — no
   GSAP-change tracking needed, no first-tick edge case, same data source
   as the renderer.

3. Shared utilities: normaliseEnvelope(), interpolateVolumeGain(), and
   probeAndCacheElementVolume() extracted to mediaVolumeEnvelope.ts and
   exported from @hyperframes/core/media-volume-envelope. The engine's
   audioVolumeEnvelope.ts imports from there — no duplicate logic between
   the renderer and the new preview path.

Fallow audit exits non-zero on inherited complexity/duplication in init.ts
functions that shifted line numbers (applyClipLayout, transportTick, etc.),
unchanged by this PR — same known false-positive pattern noted in #1117.
Lint, format, typecheck, and unit tests all pass.

53 core/media tests pass (3 updated to pre-set el.volume to match the
runtime's bindMediaMetadataListeners — corrects a missing setup step).
audioVolumeEnvelope tests (6) still pass.
miguel-heygen added a commit that referenced this pull request May 29, 2026
Preview audio with GSAP volume fades (e.g. data-volume="0" with a
gsap.to("#bgm", {volume:0.25, ...})) played ~1s then silenced. Root
cause: syncRuntimeMedia used fallbackAuthorVolume (data-volume) on the
first tick after a clip became active, clobbering the GSAP-seeked value.
The single-clock transport seeks GSAP before syncRuntimeMedia runs, so
el.volume already holds the animated value — we just need to trust it.

Fix — three layers, matching the renderer's approach (PR #1117):

1. First-tick tracking: on the first tick a clip is active
   (previousRuntimeVolume===undefined), use currentElementVolume (GSAP's
   seeked value) instead of fallbackAuthorVolume. In production the
   transport always seeks GSAP before syncRuntimeMedia, so el.volume is
   already at the correct animated position.

2. Probed keyframes: new probeElementVolumeKeyframes() runs the same
   offline probe the renderer uses (discoverAudioVolumeAutomationFromTimeline)
   directly in the browser. init.ts calls probeAndCacheElementVolume() when
   an element is bound and a timeline is available. When keyframes are present,
   syncRuntimeMedia drives volume from the interpolated envelope — no
   GSAP-change tracking needed, no first-tick edge case, same data source
   as the renderer.

3. Shared utilities: normaliseEnvelope(), interpolateVolumeGain(), and
   probeAndCacheElementVolume() extracted to mediaVolumeEnvelope.ts and
   exported from @hyperframes/core/media-volume-envelope. The engine's
   audioVolumeEnvelope.ts imports from there — no duplicate logic between
   the renderer and the new preview path.

Fallow audit exits non-zero on inherited complexity/duplication in init.ts
functions that shifted line numbers (applyClipLayout, transportTick, etc.),
unchanged by this PR — same known false-positive pattern noted in #1117.
Lint, format, typecheck, and unit tests all pass.

53 core/media tests pass (3 updated to pre-set el.volume to match the
runtime's bindMediaMetadataListeners — corrects a missing setup step).
audioVolumeEnvelope tests (6) still pass.
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full diff and James's prior rounds on this one.

Architecture

The layered approach is the right design: PCM-sample gain as primary (no expression ceiling), RDP-bounded FFmpeg expression as fallback, base-volume retry as backstop. This matches how every established OSS video tool handles time-varying volume (MoviePy, Shotcut/MLT, Remotion all multiply samples directly). The PR description's survey of alternatives is solid — the Mediabunny rejection reasoning (different encoder, WASM AAC dependency, multi-track mixing surface) is correct.

The single-responsibility split is clean: audioVolumeEnvelope.ts owns WAV I/O and sample math, audioMixer.ts owns the orchestration decision. The pure function boundary makes the module independently testable.

PCM bake path

WAV parser: order-independent chunk scan is correct. The word-alignment handling (body + chunkSize + (chunkSize % 2)) matches the spec. Validating WAVE_FORMAT_PCM + 16-bit rejects float/extensible/ADPCM cleanly, returning false so the caller falls through to the expression path rather than corrupting audio. The test covering data-before-fmt is a good addition.

Sample loop: incremental segment advance is O(N + M) total, not O(N×M). Span/progress edge cases handled correctly. Tail behavior (time > last keyframe → holds last volume) is correct. int16 clamping is proper — the clamp is technically redundant given gain ≤ 1.0, but appropriate defensive code for a signal-path function.

Shared-source safety: I verified prepareAudioTrack writes per-element paths (${element.id}-extracted.wav, ${element.id}-trimmed.wav) in workDir. No two elements share a prepared WAV path. The atomic write (temp + rename) protects against partial-write corruption. Safe.

Write-to-read idempotency: applyVolumeEnvelopeToWav mutates the path it was given. After this function the WAV contains the baked envelope. The mixer then sets volume: 1.0 and volumeKeyframes: undefined, so the expression path is completely bypassed. No double-application risk.

RDP simplification

simplifyVolumeKeyframes uses vertical distance (volume axis) rather than perpendicular RDP — correct choice for a time-parametrized envelope where time is the independent variable. The iterative stack eliminates recursion depth concerns. The MAX_VOLUME_SEGMENTS = 32 backstop is correctly bounded: 32 < 95 (observed evaluator limit on macOS Homebrew ffmpeg) with margin for stricter Linux builds. VOLUME_SIMPLIFY_EPSILON = 0.005 (0.5%) keeps the envelope within ~0.2 dB of the source curve, well under the ~1 dB loudness JND.

Layer 3 degraded retry

hasAutomation guard is correct — won't retry on a plain non-automation failure. The degradedAutomation flag surfaced in MixResult.error means the degradation is observable, not silent. Good observability.

Test coverage

  • audioVolumeEnvelope.test.ts: linear fade, offset, base/tail holds, 5000 keyframes, chunk-ordering, format rejection. Covers all the pure-function invariants.
  • audioMixer.test.ts additions: 300-keyframe expression depth bound (nesting < 32) and layer-3 retry with mock failure. Both test the right invariants.

One note (non-blocking)

interpolateVolumeGain is called once per sample frame (48kHz × 2ch = ~96k times/sec of audio) with a linear scan to find the segment. At 60Hz probe resolution the envelope has at most ~60 × duration points, but for a 10-minute audio clip that's ~3600 envelope points, and the scan re-starts from segment=0 per frame. The incremental-advance pattern from the original sample loop would be faster (O(N+M) vs O(N×M)). At 48kHz × 5min × 3600 points that's ~17B iterations — not a problem in practice for the current use cases, but worth a note for very long audio. Low priority for a follow-up.

Verdict

Solid fix. The root cause diagnosis is correct (measured: 86 kf → OK, 171 kf → exit 234), the chosen approach matches industry practice, the defense layers ensure audio is never dropped, and the test coverage is thorough. The PR body's parity claim was appropriately softened ("matches preview to 60Hz probe resolution") in response to James's note.

LGTM.

— Vai

@miguel-heygen
Copy link
Copy Markdown
Collaborator Author

@abcfy2 preview fix is in 0.6.57! it should work now fine

@abcfy2
Copy link
Copy Markdown

abcfy2 commented May 29, 2026

@miguel-heygen Great! Everything works very well. Thank you for your hard working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants