The SDL_Mixer API Redesign Megathread #662
One thought, after I posted this: a callback to edit audio before it is mixed only works if the audio isn't changing size. A thing adding reverb, or adjusting the pitch by resampling, will not be able to do it with this method. Something to consider.
As a note which I'm not sure belongs here or somewhere else: what about positional audio? Stereo panning and full-on surround sound? I would think this could go alongside the volume functions outlined above. Perhaps:

```c
// panning going from -1.0f to 1.0f for left to right, for example?
void Mix_SetPanning(Mix_Source *src, float panning);

// this would allow setting the position of the audio relative to the listener
void Mix_SetPosition(Mix_Source *src, float x, float y, float z);
```

As I said, I'm not sure if this goes here, so feel free to ignore or delete if that's the case.
Some thoughts after reading this:
This sounds good, but it seems to be missing an option to decode from an IOStream on the fly (unless that's what the Load* functions do, but from their naming I'd expect them to decode the whole stream into memory first). I had an idea before that mixer could essentially just read from IOStreams (or similar), and then each individual stream could control on its own how to cache and decode audio, etc. With this you could play while decoding on the fly or from predecoded in-memory data with the same interface, and it'd allow for mixing in procedurally generated audio, tracker output, etc.
We let them plug an SDL_AudioStream into a Mix_Source, so they can have procedural and streaming stuff, etc.
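For what it's worth, here is a minimal sketch of what that escape hatch enables. The SDL3 stream calls are real; `Mix_SetSourceAudioStream()` at the end is a hypothetical name for the "plug a stream into a Mix_Source" hook, since the thread never pins down that function:

```c
/* Minimal sketch: procedural audio through an SDL_AudioStream.
   SDL_CreateAudioStream()/SDL_PutAudioStreamData() are real SDL3 calls;
   Mix_SetSourceAudioStream() is a hypothetical name for the hook above. */
#include <SDL3/SDL.h>

static void queue_sine(SDL_AudioStream *stream, float freq, int frames)
{
    static float phase = 0.0f;
    float samples[1024];

    while (frames > 0) {
        const int n = (frames < 1024) ? frames : 1024;
        for (int i = 0; i < n; i++) {
            samples[i] = SDL_sinf(phase);
            phase += 2.0f * SDL_PI_F * freq / 48000.0f;
            if (phase > 2.0f * SDL_PI_F) {
                phase -= 2.0f * SDL_PI_F;
            }
        }
        SDL_PutAudioStreamData(stream, samples, n * (int) sizeof (float));
        frames -= n;
    }
}

/* const SDL_AudioSpec spec = { SDL_AUDIO_F32, 1, 48000 };
   SDL_AudioStream *stream = SDL_CreateAudioStream(&spec, &spec);
   queue_sine(stream, 440.0f, 48000);
   Mix_SetSourceAudioStream(source, stream);  // hypothetical binding */
```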
"Source" was stolen from OpenAL (although I didn't steal "buffer" for Mix_Audio), but also things that use "source" vs "sink". I don't love the name, and it can definitely change. Everything in audio programming is words that mean 3 different things. :) The first draft of this first draft had no distinction, between Mix_Audio and Mix_Source--it was just Mix_Audio, create it and tell it to play--but you get into a situation where you want to play the laser gun sound in your shmup game a million times in a row, and don't want to create it each time, so separating loading of data from playing of sounds is important.
It's attractive in theory, but in practice it means you have to pass around a context to everything...which is acceptable...but this is going to depend on the SDL3 audio device to do a bunch of heavy lifting: mixing, audiostream management, a worker thread. So I decided to keep it simple: a single mixer on a single output device.
They would call Mix_SetSourceMix() to set up a callback. They get their hands on the data from the source right before it is to be mixed in, and can modify it. They could add reverb, etc, here. There's still some question about changing the size of the data, but maybe we can find a way to have the callback carry on past the end of the buffer to write the end of the reverb into what would otherwise be silence. It's an open question still.
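To make that concrete, here is a hypothetical shape for the hook. The thread never settles the `Mix_SetSourceMix()` signature, so everything below is an assumption:

```c
/* Hypothetical shape for the Mix_SetSourceMix() hook: the callback sees
   the source's float samples right before they're mixed in and edits
   them in place. The exact signature is an assumption. */
static void duck_gain(void *userdata, Mix_Source *src,
                      float *samples, int num_samples)
{
    const float gain = *(const float *) userdata;
    (void) src;
    for (int i = 0; i < num_samples; i++) {
        samples[i] *= gain;  /* in-place edit; can't change the size (yet) */
    }
}

/* static float gain = 0.5f;
   Mix_SetSourceMix(src, duck_gain, &gain);  // hypothetical signature */
```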
There's a "this playing thing has finished" callback, and we can make sure that starting a new sound during that callback means it'll mix in with no gap.
Yeah, that's fine with me, we can do that. Some audio formats might be difficult to seek well, but even there, if it seeks at all, we can get as close to the desired point as necessary and just decode and throw away data until we hit the right sample.
My thinking (which is not expressed in the OP up there) is that "loading" a thing generally just makes sure it's in memory and we have metadata published and other state ready to go, but the actual decoding happens on playback. This allows compressed things to not eat a ton of RAM but also be ready to go from RAM (instead of disk or whatever) for multiple sources at once. But we should probably have an option to predecode these things, for cases where eating the RAM upfront makes sense but compression in the on-disk install is still desirable.
There's also the need to decode the data in advance because some formats are expensive to decode and can't be done just in time to feed the audio device. I'm operating under the assumption that for the most part games want the minimum possible latency, so they'll be feeding the output small chunks at a high rate.
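Both points suggest a per-load choice. A sketch of what that knob could look like, with a hypothetical boolean `predecode` parameter (not the settled API):

```c
/* Hypothetical per-load decode policy for the tradeoff above; the
   boolean `predecode` parameter is an assumption, not settled API. */

/* decode at playback time; only the compressed bytes stay in RAM: */
Mix_Audio *music = Mix_LoadAudio("music.ogg", false);

/* pay the RAM cost upfront for cheap, low-latency playback: */
Mix_Audio *hit = Mix_LoadAudio("hit.ogg", true);
```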
Hello, I'm a Factorio developer responsible for the sound part of the engine (and the author of the blog post mentioned above). Having just Mix_Audio objects sounds great. Channels being dynamic instead of a pool sounds like a good choice, although personally I don't feel strongly one way or the other.

What I really missed for Factorio was built-in support for changing the playback speed of already-playing sounds. It couldn't be done using an effect, since the output size changes, so I customized SDL_Mixer to do a resampling pass just before the effects pass on each channel. If that won't be built in, I would strongly suggest adding built-in support for changing playback speed. Having the ability to plug in an SDL_AudioStream is a way to work around that, but it means more work for the user. For example, in Factorio it's not unheard of to want a sound to start at a random position (so it needs to be seekable), loop, and change its playback speed often (or even all the time).

Having positional audio built in would be a nice-to-have, but it's not a big deal if it's not there for the sake of keeping the library simple.

Last thing, related to effects: I find it really useful to have a way to apply one effect to a large number of channels but not all of them; think applying an effect to all environment sounds while leaving GUI sounds, alerts, and music unaffected.

Regarding sample-perfect mixing, the ability to plug in an SDL_AudioStream solves this, and I don't think the "channel finished playing" callback needs to do anything new/special about it. I absolutely agree with having an option to seek by sample; more often than not I find using milliseconds annoying.

I went through all my custom changes to SDL_Mixer and I think that covers it all. Thank you for all the work on SDL!
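For what it's worth, the SDL_AudioStream escape hatch can already cover live speed changes in SDL3 itself. `SDL_SetAudioStreamFrequencyRatio()` is a real SDL3 call; how a stream attaches to the new mixer objects is still up to the redesign:

```c
#include <SDL3/SDL.h>

/* Change the speed of an already-playing stream; 1.0f is normal speed.
   This resamples on the fly, so pitch shifts with it: 2.0f doubles the
   speed (and pitch), 0.5f halves it. Real SDL3 API. */
static void set_playback_speed(SDL_AudioStream *stream, float speed)
{
    SDL_SetAudioStreamFrequencyRatio(stream, speed);
}
```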
Thank you for chiming in! Your blog post was one of the inspirations for wanting to do the redesign and your feedback is really appreciated. :)
I hope it didn't come across as complaining about the library. Overall I'm really happy with it; I very much appreciate the simplicity, how understandable it is, and consequently how easy it is to customize. I'm glad to provide my point of view, clarify my points, or contribute directly (although I'm not sure my code would be good enough). I just remembered that the ability to synchronize starting multiple sounds would be great, and then realized that it would be possible in the proposed API redesign with Mix_PlayTag(). Cool stuff! So disregard my previous note about the tag system :)
I know that this kind of question is annoying, but do you have an approximate timeline for this rework? I am working on upgrading Factorio to SDL3 and SDL_Mixer is the largest pain point. I have started working on porting our customizations to the existing SDL3 mixer, but if this redesign will be implemented soon then that work may not be needed.
@icculus is planning on working on this after the next sdl2-compat update, which will be done soon. |
One idea to improve the applicability and utility of registering an effect: pass the callback a buffer that contains more audio samples from earlier in the mix than the current mix actually needs, i.e. include the last mix buffer prefixed to the beginning of the current mix buffer. This would allow for effects like reverb that don't last longer than the length of a buffer. Perhaps the application could even request an even longer history for longer delay/reverb effects via a multiplier, like "szHistory" or something the application can specify.

Another idea for variable playback speed is to include, alongside Get/SetSourcePlaybackPosition(), some functions for setting a multiplier for the speed of the mix, where 1.0 is unscaled and directly mixed, while any other non-zero value sets how the Mix_Source is upsampled or downsampled. i.e. a SetMixSourcePlaybackSpeed() of 0.5 will mix the source at half speed, upsampling a half-sized buffer to the mix buffer's size.

I might have some more thoughts in the coming days. I'm excited about this! :D
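A sketch of how an effect could use that history prefix, assuming the buffer layout the commenter describes (a `history`-frame prefix of already-mixed audio in front of the current slice); everything here is hypothetical:

```c
/* Echo effect reading backwards into the proposed history prefix.
   Layout assumed from the suggestion above: buf[0..history) is audio
   already mixed last time, buf[history..history+frames) is the slice
   being mixed now. All names are hypothetical. */
static void echo_effect(float *buf, int history, int frames)
{
    float *cur = buf + history;              /* current mix slice */
    for (int i = 0; i < frames; i++) {
        cur[i] += 0.35f * cur[i - history];  /* attenuated, delayed tap */
    }
}
```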
If I could make a suggestion, it would be to use a more intuitive API object naming convention. Perhaps it's time to introduce the concept of tracks, just like in typical audio editing software, e.g. Audacity.

So we would have an audio device as the sound output. The device can support a certain number of channels: mono, stereo, etc. Each track is a target for placing sound data and supports as many buffers as channels are allocated (unless the track was intentionally created with fewer channels, e.g. a mono track for a stereo device). For example, if the device was created to render stereo sound, each track has two target buffers (for the left and right channels). The user can create as many tracks as the number of sounds they want to mix simultaneously.

In short, as proposed by the OP, the device would be called […]. It has been suggested that the sound target be called a "track". It was proposed to call the sound source […].

I think that such nomenclature will be as intuitive as possible and will make it easier to use the mixer functionality, especially for those who are not specialists in the field of sound mixing but more or less understand what it consists of.
Locally I had changed "Source" to "Input" but "Track" is pretty good. I'll think on it. |
One quick thought re: my specific use case: I notice in the new API there's no equivalent of Mix_QuickLoad_RAW.

Maybe an alternative, safer API for that use case might allow passing the raw array along with its bits, sample rate, and number of channels, along with the length? Essentially: "here's an array of raw PCM audio, and here are its specs". There would be a slight performance cost of course, but it would make it more flexible to use while also reducing the likelihood of problems due to mismatches between the audio spec and the generated audio buffers. Let me know what you think!
Just for the record, I second this. I assumed that we'd still have the ability to pass raw PCM data and it just wasn't included in the thread's tentative outline of the new API, but on the off chance that it was intentionally being omitted - it shan't be! |
I'm working on a personal project which requires pretty accurate audio latency measurements. To my dismay, […]. Will more precise reading of the stream position be possible in the redesign?

[video attachment: 2025-03-20T23.53.59.716320602-06.00.mp4]
Okay, so this is not ready yet, but I've pushed what I've got to https://github.com/icculus/SDL_remixer for now. Eventually, that repository will go away and we'll drop the files from it into SDL_mixer's main, wire it up to the existing CMake, etc. Right now it's not usable (or even compilable, since I changed a bunch of stuff right before pushing that). I'm about to rough in a very basic .WAV backend just for proof-of-concept purposes, since moving the existing decoders over will take a little more time and I'd like to have something that makes noise sooner than later.
To catch up on the conversation here: I have to read back over this whole thread, as I'm sure there's a ton of feedback I haven't addressed yet; I promise I'm not ignoring you!
Latest in revision control has:

```c
// Load raw PCM data to a Mix_Audio from an IOStream.
extern SDL_DECLSPEC Mix_Audio * SDLCALL Mix_LoadRawAudio_IO(SDL_IOStream *io, const SDL_AudioSpec *spec, bool closeio);

// Load raw PCM data to a Mix_Audio. If free_when_done==true, the data will be
// SDL_free()'d when the Mix_Audio is destroyed. Otherwise, it's never freed by SDL_mixer.
extern SDL_DECLSPEC Mix_Audio * SDLCALL Mix_LoadRawAudio(const void *data, size_t datalen, const SDL_AudioSpec *spec, bool free_when_done);
```
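A usage sketch for those two, assuming app-generated mono float PCM at 48kHz; `generate_samples()` is a hypothetical placeholder:

```c
/* Feeding app-generated PCM to the raw loader quoted above.
   generate_samples() is a hypothetical placeholder. */
const SDL_AudioSpec spec = { SDL_AUDIO_F32, 1, 48000 };
size_t num_bytes = 0;
float *pcm = generate_samples(&num_bytes);  /* allocated with SDL_malloc() */

/* free_when_done==true: SDL_mixer will SDL_free() the buffer later. */
Mix_Audio *audio = Mix_LoadRawAudio(pcm, num_bytes, &spec, true);
```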
There's been a ton more work on this. We have WAV, Ogg, MP3, and AIFF decoders hooked up, lots of metadata improvements, and, of course, a working mixer. I've started using the issue tracker as a TODO list, to give you an idea of where we are: https://github.com/icculus/SDL_remixer/issues This is at the point where people can start experimenting with it, although there are a few genuinely janky pieces and there will be changes as feedback comes in.
Also: do we want duplicate decoders? Is it worth having both a libvorbis decoder and an stb_vorbis decoder? Likewise for dr_mp3/mpg123, dr_flac/libFLAC, and Timidity/FluidSynth. |
Yes!!! I'm excited. Thank you for all of the work you're doing. I started a project 6-7 months ago using SDL3 after using SDL2 since it was released, and SDL before that for several years. I was counting on SDL_mixer 3.0 coming around when the project was ready for audio to be added in, and if not, I would've written my own mixing on top of SDL_audio as a last resort.
While I do agree that it seems somewhat redundant to have different libraries available for decoding, I do think that it might be best to allow developers to pick and choose which exact ones they want to employ in their wares. That's my two cents. Maybe just make sure the documentation explains that different audio types have different numbers of supported decoders available that they can choose from, so they don't feel like they're just staring at a big list of decoders they don't understand the purpose of. Classify them by the audio type(s) they support. Cheers! :]
Here's my first pitch at a redesigned SDL_mixer API. I took the same approach as SDL_net, where all of it goes in the trash and we start from scratch.
The idea is this: [initial proposal hidden; see the EDIT below]
This API spec is extremely terse, so don't be afraid to ask questions. Obviously nothing is locked down at this point, so feedback is definitely welcome.
EDIT: This initial proposal is out of date, so I've hidden it until you click through.