Add 24-bit integer and 32-bit floating-point PCM .wav input/output support#56
Add 24-bit integer and 32-bit floating-point PCM .wav input/output support#56mvdirty wants to merge 11 commits intoServeurpersoCom:masterfrom
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 47 minutes and 53 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThis pull request introduces a generalized WAV format selection mechanism across the codebase. New enums and APIs in Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI Parser
participant Validator as Format Validator
participant Writer as Audio Writer
participant Encoder as WAV Encoder
participant File as Output File
CLI->>Validator: Parse --output or --wav-format
Validator->>Validator: Validate format string
alt Format Valid
Validator-->>CLI: Return AudioFileFormat/WavFormat
CLI->>Writer: Call audio_write() with format
Writer->>Writer: Check should_normalize_audio()
alt Normalization Required
Writer->>Writer: Normalize audio
end
Writer->>Encoder: Call audio_encode_wav(format)
Encoder->>Encoder: Dispatch on format (S16/S24/F32)
alt Format == S16 or S24
Encoder->>Encoder: Clamp [-1,1]
Encoder->>Encoder: Quantize to PCM
else Format == F32
Encoder->>Encoder: Sanitize NaN/Inf
Encoder->>Encoder: Emit IEEE-754
end
Encoder->>File: Write WAV with format header
else Format Invalid
Validator-->>CLI: Error
CLI->>CLI: Exit with status 1
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/audio-io.h`:
- Around line 591-623: The two header-defined functions
parse_optional_wav_format and should_normalize_wav_audio are non-inline
definitions included in multiple translation units causing ODR violations; mark
both functions as inline in the header (i.e., add the inline specifier to the
declarations/definitions of parse_optional_wav_format(const char*, WavFormat&)
and should_normalize_wav_audio(WavFormat)) so the linker will allow identical
definitions across TUs.
- Around line 382-400: The functions wav_clamp1, wav_quantize_s16,
wav_quantize_s24 and wav_write_f32le must first coerce NaN and infinities to
0.0f to match the documented contract; add a small sanitization helper (e.g.
float sanitize_f32(float x)) that returns 0.0f for !std::isfinite(x) and
otherwise returns x, then call it from wav_clamp1 (or make wav_clamp1 call
sanitize_f32 internally) so NaN/Inf are not left unchanged, call sanitize_f32
before quantization in wav_quantize_s16 and wav_quantize_s24 to avoid undefined
conversion, and call sanitize_f32 in wav_write_f32le before memcpy so no raw
NaN/Inf bits are written. Ensure you include <cmath> and use std::isfinite for
the check.
In `@src/wav.h`:
- Around line 107-143: The code only handles 24-bit as extensible PCM and 32-bit
float only as legacy tag 3; update the conditional checks to also accept classic
PCM24 (handle when audio_format == 1 && bits_per_sample == 24 using the existing
PCM24 branch logic) and extensible float32 (handle when audio_format == 0xfffe
&& extensible_subformat == 3 && bits_per_sample == 32 using the existing float32
branch logic). Modify the if/else-if conditions around the PCM24 block
(currently testing audio_format == 0xfffe && bits_per_sample == 24 &&
extensible_subformat == 1) and the float32 block (currently audio_format == 3 &&
bits_per_sample == 32) to include the alternative combinations, and reuse
wav_read_s24le, wav_read_f32le, n_samples, n_channels, data, pos, data_bytes and
audio allocation/assignment code so pointer arithmetic and malloc size remain
identical.
In `@tools/neural-codec.cpp`:
- Around line 319-321: Update the help string that documents the "--wav-format"
option in tools/neural-codec.cpp to explicitly state that "--wav-format" is only
applicable when using "--decode" (i.e., it controls the WAV output format for
decode mode only), so the CLI help matches the runtime validation performed for
"--wav-format" (which rejects the option unless "--decode" is set). Reference
the "--wav-format" and "--decode" flags when editing the usage text so users see
the restriction up front.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8903395c-c1c7-4119-b91b-a9a8a01ec678
📒 Files selected for processing (8)
README.mddocs/ARCHITECTURE.mdsrc/audio-io.hsrc/wav.htools/ace-server.cpptools/ace-synth.cpptools/mp3-codec.cpptools/neural-codec.cpp
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/audio-io.h (1)
616-628: Consider adding an unreachable hint to silence compiler warnings.Some compilers may warn that control reaches end of non-void function even though all enum cases are covered and
std::terminate()is called. Adding a compiler-specific unreachable hint or restructuring with a default case could silence these warnings, though this is a minor style consideration.Optional: Add unreachable annotation
inline bool should_normalize_wav_audio(WavFormat wav_format) { switch (wav_format) { case WAV_FORMAT_PCM_S16: return true; case WAV_FORMAT_PCM_S24: return true; case WAV_FORMAT_IEEE_F32: return false; + default: + break; } assert(false && "unsupported wav_format"); std::terminate(); +#if defined(__GNUC__) || defined(__clang__) + __builtin_unreachable(); +#elif defined(_MSC_VER) + __assume(0); +#endif }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/audio-io.h` around lines 616 - 628, The switch in should_normalize_wav_audio covers all WavFormat cases but some compilers still warn about control reaching end of a non-void function; update the function to provide an explicit unreachable hint after the assert/terminate (or use a default case that asserts) to silence warnings. Locate should_normalize_wav_audio and either add a default: path that calls assert(false && "unsupported wav_format") and then a platform hint such as __builtin_unreachable() (or ::_assume(false) on MSVC) or, after the existing std::terminate(), append a compiler unreachable intrinsic so the compiler knows the code path cannot return; keep the existing enum case logic for WAV_FORMAT_PCM_S16, WAV_FORMAT_PCM_S24, and WAV_FORMAT_IEEE_F32.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/audio-io.h`:
- Around line 616-628: The switch in should_normalize_wav_audio covers all
WavFormat cases but some compilers still warn about control reaching end of a
non-void function; update the function to provide an explicit unreachable hint
after the assert/terminate (or use a default case that asserts) to silence
warnings. Locate should_normalize_wav_audio and either add a default: path that
calls assert(false && "unsupported wav_format") and then a platform hint such as
__builtin_unreachable() (or ::_assume(false) on MSVC) or, after the existing
std::terminate(), append a compiler unreachable intrinsic so the compiler knows
the code path cannot return; keep the existing enum case logic for
WAV_FORMAT_PCM_S16, WAV_FORMAT_PCM_S24, and WAV_FORMAT_IEEE_F32.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2a51e17a-2cb3-48af-af1b-8e3b73f7a48b
📒 Files selected for processing (5)
docs/ARCHITECTURE.mdsrc/audio-io.htools/ace-synth.cpptools/mp3-codec.cpptools/neural-codec.cpp
✅ Files skipped from review due to trivial changes (2)
- docs/ARCHITECTURE.md
- tools/mp3-codec.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/ace-synth.cpp
|
Hey, nice work on the audio encoding side. The endian-safe helpers, WAVE_FORMAT_EXTENSIBLE for 24-bit, the RIFF padding fix, wav_sanitize/clamp, and the quantization math are all solid. I want to merge the core audio code. However I'd like to simplify the format selection architecture. Instead of --wav + --wav-format as two separate flags with cross-validation in every binary, I'm thinking of a single --output flag: This removes all the "ERROR: --wav-format requires usage of --wav" validation, and on the server side it becomes a per-request URL parameter ?output=fp32 instead of a global --wav-format flag, which is better because different clients may want different formats. Could you rebase on current master and rework the CLI/server plumbing around this? The audio-io.h and wav.h core is good to go as is, it's just the argument parsing and server integration that needs the redesign. |
d3a213a to
0c2e497
Compare
...
I have:
I have yet to:
|
|
Looks like I've made a couple of dumb errors (one in code, and another that is isolated my local ace-synth test script setup.) I have fixed the code and my scripts and am re-testing now. |
|
Since MP3 is the default and most widely used format, there's no need to ensure backward compatibility with the |
|
You can also remove the long descriptions regarding binary help formats to maintain the original short style (no indentation in help); The names of types you used are clean and self-explanatory. For long description we'll add them to the documentation :) Afterwards, I'll do a review with notes in the code, and it will be ready for merge. It's a truly clean WAV format support :) |
|
Testing of the updates is now complete, via both |
…pport to ace-server, ace-synth, mp3-codec, and neural-codec', with automatic disabling of normalization and peak clipping when outputting 32-bit floating point .wav audio.
(Also kill a bit of new-but-newly-dead code)
29c0eb2 to
677dd00
Compare
|
Addressed latest conflict and force-pushed. |
|
I will now address the removal of |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tools/ace-synth.cpp (1)
106-110: Consider dropping the legacy--wavalias.
--output wavalready covers this case, and keeping--wavleaves a second parser path plus an extra precedence rule to maintain. Since backward compatibility was explicitly called out as unnecessary here, removing the alias would simplify both the parser and the help text.Also applies to: 137-141
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/ace-synth.cpp` around lines 106 - 110, The command-line parser in tools/ace-synth.cpp contains a legacy alias branch that handles "--wav" (seen in the argv parsing block comparing argv[i] to "--wav") which duplicates functionality already covered by "--output wav"; remove the entire conditional branch that checks for "--wav" (and the related duplicate handling around lines handling the same flag at 137-141) so parsing always uses the "--output" path, and update any help/usage text to drop the "--wav" alias reference; keep handling of audio_file_format via AUDIO_FILE_FORMAT_WAV_S16 only when parsing "--output" values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/wav.h`:
- Around line 89-110: The malloc() result for the audio buffer is not checked,
so if allocation fails subsequent writes to audio (in the PCM16, PCM24 and
float32 branches where audio is assigned) will dereference NULL; after each
allocation (the assignments to audio in the branches handling bits_per_sample ==
16, == 24 and float32), check that audio != NULL and on failure perform the
function's existing error handling/cleanup path (e.g., free any allocated state,
set n_samples to 0 or an error code, and return or propagate the error) to avoid
a crash; update the branches that set n_samples and audio (the blocks using
audio = (float *) malloc(...), p = data + pos, and the loops that write audio[])
to bail out if malloc returned NULL.
In `@tools/neural-codec.cpp`:
- Around line 397-405: The decode path currently allows MP3 output because
audio_write() selects MP3 when output_path ends with ".mp3", so --decode (-mode
== 1) with -o out.mp3 will produce MP3 and ignore --wav-format; to fix, enforce
WAV-only output in decode mode by validating/normalizing output_path (or
rejecting non-.wav values) when mode == 1: update the checks around mode,
wav_format_str, parse_wav_format and the code that sets/uses output_path before
calling audio_write() (referencing mode, wav_format_str, parse_wav_format,
output_path, and audio_write) so that decode mode either forces a .wav extension
or errors out for non-WAV outputs, and mirror the same change at the other
occurrence noted (lines ~497-498).
---
Nitpick comments:
In `@tools/ace-synth.cpp`:
- Around line 106-110: The command-line parser in tools/ace-synth.cpp contains a
legacy alias branch that handles "--wav" (seen in the argv parsing block
comparing argv[i] to "--wav") which duplicates functionality already covered by
"--output wav"; remove the entire conditional branch that checks for "--wav"
(and the related duplicate handling around lines handling the same flag at
137-141) so parsing always uses the "--output" path, and update any help/usage
text to drop the "--wav" alias reference; keep handling of audio_file_format via
AUDIO_FILE_FORMAT_WAV_S16 only when parsing "--output" values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79f32bb5-b8ef-489a-936a-9e60f034d58e
📒 Files selected for processing (8)
docs/ARCHITECTURE.mdexamples/lego.shsrc/audio-io.hsrc/wav.htools/ace-server.cpptools/ace-synth.cpptools/mp3-codec.cpptools/neural-codec.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/ARCHITECTURE.md
- tools/ace-server.cpp
| n_samples = (int) (data_bytes / ((size_t) n_channels * 2)); | ||
| audio = (float *) malloc((size_t) n_samples * 2 * sizeof(float)); | ||
| const short * pcm = (const short *) (data + pos); | ||
| const uint8_t * p = data + pos; | ||
|
|
||
| for (int t = 0; t < n_samples; t++) { | ||
| if (n_channels == 1) { | ||
| float s = (float) pcm[t] / 32768.0f; | ||
| audio[t * 2 + 0] = s; | ||
| audio[t * 2 + 1] = s; | ||
| int16_t s = (int16_t) wav_read_u16le(p + t * 2); | ||
| float f = (float) s / 32768.0f; | ||
| audio[t * 2 + 0] = f; | ||
| audio[t * 2 + 1] = f; | ||
| } else { | ||
| audio[t * 2 + 0] = (float) pcm[t * n_channels + 0] / 32768.0f; | ||
| audio[t * 2 + 1] = (float) pcm[t * n_channels + 1] / 32768.0f; | ||
| const uint8_t * frame = p + (size_t) t * n_channels * 2; | ||
| int16_t l = (int16_t) wav_read_u16le(frame + 0); | ||
| int16_t r = (int16_t) wav_read_u16le(frame + 2); | ||
| audio[t * 2 + 0] = (float) l / 32768.0f; | ||
| audio[t * 2 + 1] = (float) r / 32768.0f; | ||
| } | ||
| } | ||
| } else if (audio_format == 0xfffe && bits_per_sample == 24 && extensible_subformat == 1) { | ||
| n_samples = (int) (data_bytes / ((size_t) n_channels * 3)); | ||
| audio = (float *) malloc((size_t) n_samples * 2 * sizeof(float)); | ||
| const uint8_t * p = data + pos; |
There was a problem hiding this comment.
Check malloc() before the first sample write.
Each decode branch stores into audio immediately after allocation. If malloc() fails, the first write at Line 97 / Line 116 / Line 134 dereferences NULL and turns a recoverable read failure into a crash.
Suggested guard
n_samples = (int) (data_bytes / ((size_t) n_channels * 2));
audio = (float *) malloc((size_t) n_samples * 2 * sizeof(float));
+ if (!audio) {
+ fprintf(stderr, "[WAV] Out of memory while decoding\n");
+ return NULL;
+ }
const uint8_t * p = data + pos;Apply the same check after the PCM24 and float32 allocations as well.
Also applies to: 127-129
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/wav.h` around lines 89 - 110, The malloc() result for the audio buffer is
not checked, so if allocation fails subsequent writes to audio (in the PCM16,
PCM24 and float32 branches where audio is assigned) will dereference NULL; after
each allocation (the assignments to audio in the branches handling
bits_per_sample == 16, == 24 and float32), check that audio != NULL and on
failure perform the function's existing error handling/cleanup path (e.g., free
any allocated state, set n_samples to 0 or an error code, and return or
propagate the error) to avoid a crash; update the branches that set n_samples
and audio (the blocks using audio = (float *) malloc(...), p = data + pos, and
the loops that write audio[]) to bail out if malloc returned NULL.
|
I have addressed the feedback provided by @ServeurpersoCom |
Based on core audio work from PR #56. wav.h: endian-safe reader with 24-bit WAVE_FORMAT_EXTENSIBLE support, odd-chunk padding, byte-level reads instead of memcpy+cast. audio-io.h: three WAV encoders (s16, s24, IEEE f32) with endian-safe byte writes, NaN/Inf sanitization, and per-format clamping. Single WavFormat enum, all public APIs default to WAV_S16 for backwards compatibility. WAV_F32 skips normalization to preserve full range. No CLI changes yet (--format plumbing is next). Co-authored-by: mvdirty <[email protected]>
|
I redid the piping and the nitpicks myself. It's now merged. |
This PR adds 24-bit integer and 32-bit floating-point PCM .wav input/output support.
Input support is added to all .wav-file input execution paths.
Output support is added to ace-server, ace-synth, mp3-codec, and neural-codec.
When outputting 32-bit floating-point .wav audio, normalization and peak clipping are automatically disabled in output execution paths which normally include normalization and peak clipping. Currently, these:
Testing has been performed primarily via ace-server, with input/output tests using all supported input and output formats. Secondary testing was performed via ace-understand and ace-synth, using various scripts in the examples folder which were edited to apply --wav and --wav-format appropriately.
Summary by CodeRabbit
Release Notes
New Features
--output <format>option to ace-synth supporting mp3, wav, wav16, wav24, and wav32 output formats.--wav-formatoption to neural-codec and mp3-codec for WAV output format selection.Documentation