The SDL_Mixer API Redesign Megathread #662

Open
icculus opened this issue Jan 30, 2025 · 24 comments

@icculus
Collaborator

icculus commented Jan 30, 2025

Here's my first pitch at a redesigned SDL_mixer API. I took the same approach as SDL_net, where all of it goes in the trash and we start from scratch.

The idea is this:

  • You no longer have a separation between "chunks" and "music".
  • Channels are replaced by "Mix_Source" objects. They are dynamic, you don't have a set block of channels.
  • You load sound data into "Mix_Audio" objects, which can be used on multiple Mix_Sources at once.
  • Most of the functionality of the existing library is still here, possibly in a better form. Gone is the "effects" API, because all of that code sucked, and was generally limited to stereo audio anyhow. In its place, there are hooks to edit audio as it is about to be mixed, where any effect could be implemented, and the ability to wire up an SDL_AudioStream to a Mix_Source, so you can put any dynamic audio into the mixer that you want, as just another source to be mixed. The post-mix callback remains, too, which can be used for what the current library calls "posteffects."

This API spec is extremely terse, so don't be afraid to ask questions. Obviously nothing is locked down at this point, so feedback is definitely welcome.

EDIT: This initial proposal is out of date; the original text is kept below for reference.
// there is no separate "init" function. You open the audio device (presumably just the default audio device) and go.

bool Mix_OpenAudio(SDL_AudioDeviceID devid);  // this will call SDL_Init(SDL_INIT_AUDIO), open audio device.
void Mix_CloseAudio(void);  // this will call SDL_QuitSubSystem(SDL_INIT_AUDIO).


int Mix_GetNumAudioDecoders(void);
const char *Mix_GetAudioDecoder(int index);  // "WAV", "MP3", etc.

bool Mix_QuerySpec(SDL_AudioSpec *spec);   // what the device is actually expecting.

// there is no difference between sounds and "music" now. They're all Mix_Audio objects.
Mix_Audio *Mix_LoadAudio_IO(SDL_IOStream *src, bool closeio);
Mix_Audio *Mix_LoadAudio(const char *path);
Mix_Audio *Mix_LoadAudioWithProperties(SDL_PropertiesID props);  // lets you specify things like "here's a path to MIDI instrument files outside of this file", etc.

SDL_PropertiesID Mix_GetAudioProperties(Mix_Audio *audio);  // we can store audio format-specific metadata in here (artist/album/etc info...)

void Mix_DestroyAudio(Mix_Audio *audio);  // reference-counted; if this is playing, it will be _actually_ destroyed when no longer in use.


// Sources are your "channels" but they aren't static anymore. Just make as
// many as you like and destroy them as you like. If you want the old
// semantics, just make as many as you would have allocated "channels" and put
// them in an array somewhere.

Mix_Source *Mix_CreateSource(void);
void Mix_DestroySource(Mix_Source *src);  // will halt playback, if playing. Won't call Finished callback, though. We assume you know.

bool Mix_SetSourceAudio(Mix_Source *src, Mix_Audio *audio);  // Source will replace current audio with new one. If currently playing, will start playing new audio immediately.
bool Mix_SetSourceAudioStream(Mix_Source *src, SDL_AudioStream *stream);  // insert anything you like into the mix. procedural audio, VoIP, data right from a microphone, etc. Will pull from AudioStream as needed instead of a Mix_Audio.

bool Mix_TagSource(Mix_Source *src, const char *tag);  // add an arbitrary tag to a Mix_Source. You can group audio this way. A Mix_Source can have multiple tags.
void Mix_UntagSource(Mix_Source *src, const char *tag);  // remove an arbitrary tag from a Mix_Source.

bool Mix_SetSourcePlaybackPosition(Mix_Source *src, Uint64 position);  // set source playback position to X milliseconds in. Must be fed from a Mix_Audio that can seek; other limitations apply.
Uint64 Mix_GetSourcePlaybackPosition(Mix_Source *src);  // milliseconds of audio that have been played from the start of this Mix_Source.


// operations that deal with actual mixing/playback...

// play a Mix_Source.
// if (fireAndForget) the Mix_Source is destroyed if it halts for any reason.
// if (maxTicks >= 0), it halts/loops after X milliseconds of playback.
// if (loops >= 0), it loops this many times then halts (so 0==play once, 1==play twice). if < 0, loop forever.
// if (fadeIn > 0), it fades in from silence over X milliseconds. If looping, only first iteration fades in.
bool Mix_PlaySource(Mix_Source *src, bool fireAndForget, Sint64 maxTicks, int loops, Sint64 fadeIn);
bool Mix_PlayTag(const char *tag, bool fireAndForget, Sint64 maxTicks, int loops, Sint64 fadeIn);  // play everything with this tag.

// halt playing audio. If (fadeOut > 0), fade out over X milliseconds before halting. if <= 0, halt immediately.
bool Mix_HaltSource(Mix_Source *src, Sint64 fadeOut);  // halt a playing Mix_Source. No-op if not playing.
bool Mix_HaltAllSources(Sint64 fadeOut);  // halt anything that's playing.
bool Mix_HaltTag(const char *tag, Sint64 fadeOut);  // halt all playing Mix_Sources with a matching tag.

// Pausing is not halting (so no finished callback, fire-and-forget sources don't destruct, resuming doesn't rewind audio to start).
bool Mix_PauseSource(Mix_Source *src);  // pause a playing Mix_Source. No-op if not playing.
bool Mix_PauseAllSources(void);  // pause anything that's playing.
bool Mix_PauseTag(const char *tag);  // pause all playing Mix_Sources with a matching tag.

// Resuming is the opposite of pausing. You can't resume a source that isn't paused.
bool Mix_ResumeSource(Mix_Source *src);  // resume a paused Mix_Source. No-op if not paused.
bool Mix_ResumeAllSources(void);  // resume anything that's paused.
bool Mix_ResumeTag(const char *tag);  // resume all paused Mix_Sources with a matching tag.

bool Mix_Playing(Mix_Source *src);  // true if source is playing.
bool Mix_Paused(Mix_Source *src);  // true if source is paused.

bool Mix_SetFinishedCallback(Mix_Source *src, Mix_SourceFinishedCallback cb, void *userdata);  // if set, is called when a src halts for any reason except destruction.


// volume control...

void Mix_SetMasterGain(float gain);  // one knob that adjusts all playing sounds. Modulates with per-Mix_Source gain.
float Mix_GetMasterGain(void);

void Mix_SetGain(Mix_Source *src, float gain);  // Change gain for this one Mix_Source.
float Mix_GetGain(Mix_Source *src);
void Mix_SetTagGain(const char *tag, float gain);  // Change gain for all Mix_Sources with this tag.


// hooks...

void Mix_SetPostMix(SDL_AudioPostmixCallback mix_func, void *userdata);  // just calls the standard SDL postmix callback.
void Mix_SetSourceMix(Mix_Source *src, Mix_SourceMixCallback cb, void *userdata);  // is called as data is to be mixed, so you can view (and edit) the source's data. Always in float32 format!
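
To make the shape of this concrete, here's a minimal usage sketch against the draft as written. It's illustrative only: the header path and SDL_AUDIO_DEVICE_DEFAULT_PLAYBACK are assumptions borrowed from SDL3 conventions, and the Mix_* calls are just the proposal above.

#include <SDL3/SDL.h>
#include <SDL3_mixer/SDL_mixer.h>  // assumed header path

int main(void)
{
    if (!Mix_OpenAudio(SDL_AUDIO_DEVICE_DEFAULT_PLAYBACK)) {
        return 1;
    }

    Mix_Audio *laser = Mix_LoadAudio("laser.wav");  // load the data once...
    Mix_Source *src = Mix_CreateSource();
    Mix_SetSourceAudio(src, laser);                 // ...play it on any number of sources.
    Mix_TagSource(src, "sfx");

    Mix_PlaySource(src, false, -1, 0, 0);  // play once, no time limit, no fade-in.
    SDL_Delay(2000);                       // let it finish.

    Mix_DestroySource(src);
    Mix_DestroyAudio(laser);
    Mix_CloseAudio();
    return 0;
}
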
@icculus
Collaborator Author

icculus commented Jan 30, 2025

One thought after I posted this: a callback to edit audio before it is mixed only works if the audio isn't changing size. Anything that adds reverb, or adjusts pitch by resampling, won't be able to work with this method.

Something to consider.

@bXi

bXi commented Jan 30, 2025

A note that I'm not sure belongs here or somewhere else:

What about positional audio? Stereo panning and full-on surround sound? I would think this could go alongside the volume functions outlined above.

Perhaps:

// panning going from -1.0f (left) to 1.0f (right), for example?
void Mix_SetPanning(Mix_Source *src, float panning);

// this would allow setting the position of the audio relative to the listener
void Mix_SetPosition(Mix_Source *src, float x, float y, float z);
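
For illustration only (not part of the proposed API): one common way to implement such a panning knob is a constant-power pan law, mapping panning in [-1, 1] to per-channel gains.

// Map panning in [-1, 1] to constant-power left/right gains.
static void pan_to_gains(float panning, float *left, float *right)
{
    const float angle = (panning + 1.0f) * (SDL_PI_F / 4.0f);  // 0 .. pi/2
    *left = SDL_cosf(angle);
    *right = SDL_sinf(angle);
}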

As I said I'm not sure if this goes here so feel free to ignore or delete if that's the case.

@slouken
Collaborator

slouken commented Jan 30, 2025

Some thoughts after reading this:

  • Is Mix_Audio a public interface? It seems like it would be useful to allow people to create custom audio sources (e.g. procedural audio track, custom audio format, etc.) and be able to feed data into the mixer.
  • Mix_Source is confusing naming, I would have thought just by reading that it would be the audio chunk, the source of audio data. Reading it through some more I see why you do that, but maybe it makes sense to eliminate the distinction between Mix_Audio and Mix_Source? e.g. Mix_CreateIOSource(), Mix_CreateCallbackSource(), Mix_CreateAudioStreamSource(), etc. and then you can directly set gain and playback position on the source? That may complicate the interface that someone would have to implement for a custom source, so I'm not sure it's a good idea, but...
  • Does it make sense to have the mixer entirely separate from the audio subsystem, and it can output an audio stream? That way you could have it output to disk, bound to an audio device, or usable in a library that creates a procedural soundscape from multiple sources that then gets mixed into the game audio output?
  • How would an effects library interact with this? People will definitely still want to have reverb, positional sound, etc., and if it's not implemented in the core library we should make it easy to add.
  • One thing that came up in the Factorio blog post was the desire to have sample perfect mixing, so attenuation happens across the entire buffer and starting a new sound when an old sound stops happens without any audio gap. That should be a goal of the new API.
  • Does it make sense to add the ability to query and set the source position by sample instead of milliseconds? Even at 48kHz a millisecond is 48 samples and if you're doing seamless audio you can't set the precise sample position using those units. Even if you use higher resolution like microseconds, a microsecond is 0.048 of a sample, so you can't get a specific sample offset using microseconds in the general case.
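
To illustrate that last point with numbers: a position expressed in milliseconds can only land on multiples of freq/1000 samples, so most individual sample offsets are unreachable.

// At 48000 Hz, milliseconds address only every 48th sample:
// ms_to_sample(20) == 960 and ms_to_sample(21) == 1008, so sample 1000
// cannot be expressed; hence the request for sample-based seeking.
static Uint64 ms_to_sample(Uint64 ms)
{
    return (ms * 48000) / 1000;
}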

@maia-s

maia-s commented Jan 30, 2025

This sounds good, but it seems to be missing an option to decode from an IOStream on the fly (unless that's what the Load* functions do, but from their naming I'd expect them to decode the whole stream into memory first).

I had an idea before that the mixer could essentially just read from IOStreams (or similar), and each individual stream could control on its own how to cache and decode audio, etc. With this you could play while decoding on the fly, or from predecoded memory, with the same interface, and it'd allow for mixing in procedurally generated audio, tracker output, etc.

@icculus
Collaborator Author

icculus commented Jan 31, 2025

Is Mix_Audio a public interface? It seems like it would be useful to allow people to create custom audio sources (e.g. procedural audio track, custom audio format, etc.) and be able to feed data into the mixer.

We let them plug an SDL_AudioStream in to a Mix_Source, so they can have procedural and streaming stuff, etc.
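
For instance (a sketch; generate_more_audio() is a hypothetical app-side generator, the rest is standard SDL3 plus the draft API):

SDL_AudioSpec spec = { SDL_AUDIO_F32, 2, 48000 };
SDL_AudioStream *stream = SDL_CreateAudioStream(&spec, &spec);

float samples[4096];
generate_more_audio(samples, SDL_arraysize(samples));       // app-provided generator.
SDL_PutAudioStreamData(stream, samples, sizeof (samples));

Mix_Source *src = Mix_CreateSource();
Mix_SetSourceAudioStream(src, stream);  // the mixer pulls from the stream as needed.
Mix_PlaySource(src, false, -1, 0, 0);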

Mix_Source is confusing naming, I would have thought just by reading that it would be the audio chunk, the source of audio data. Reading it through some more I see why you do that, but maybe it makes sense to eliminate the distinction between Mix_Audio and Mix_Source? e.g. Mix_CreateIOSource(), Mix_CreateCallbackSource(),

"Source" was stolen from OpenAL (although I didn't steal "buffer" for Mix_Audio), but also things that use "source" vs "sink". I don't love the name, and it can definitely change. Everything in audio programming is words that mean 3 different things. :)

The first draft of this first draft had no distinction between Mix_Audio and Mix_Source; it was just Mix_Audio: create it and tell it to play. But you get into a situation where you want to play the laser gun sound in your shmup a million times in a row and don't want to recreate it each time, so separating the loading of data from the playing of sounds is important.

Does it make sense to have the mixer entirely separate from the audio subsystem, and it can output an audio stream? That way you could have it output to disk, bound to an audio device, or usable in a library that creates a procedural soundscape from multiple sources that then gets mixed into the game audio output?

It's attractive to do so in theory, but in practice it means you have to pass around a context to everything...which is acceptable...but this is going to depend on the SDL3 audio device to do a bunch of heavy lifting: mixing, audiostream management, a worker thread. So I decided to keep it simple: single mixer on a single output device.

How would an effects library interact with this? People will definitely still want to have reverb, positional sound, etc., and if it's not implemented in the core library we should make it easy to add.

They would call Mix_SetSourceMix() to set up a callback. They get their hands on the data from the source right before it is to be mixed in, and can modify it. They could add reverb, etc, here. There's still some question about changing the size of the data, but maybe we can find a way to have the callback carry on past the end of the buffer to write the end of the reverb into what would otherwise be silence. It's an open question still.
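
As a sketch of what an app-side effect might look like (the Mix_SourceMixCallback signature below is an assumption; the draft doesn't pin it down):

// A trivial one-pole lowpass applied in a source-mix hook (mono for brevity).
static void my_lowpass(void *userdata, Mix_Source *src, float *samples, int num_samples)
{
    float *state = (float *) userdata;  // one float of filter state.
    for (int i = 0; i < num_samples; i++) {
        *state += 0.2f * (samples[i] - *state);
        samples[i] = *state;
    }
}

// ...
static float lowpass_state = 0.0f;
Mix_SetSourceMix(src, my_lowpass, &lowpass_state);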

One thing that came up in the Factorio blog post was the desire to have sample perfect mixing, so attenuation happens across the entire buffer and starting a new sound when an old sound stops happens without any audio gap. That should be a goal of the new API.

There's a "this playing thing has finished" callback, and we can make sure that starting a new sound during that callback means it'll mix in with no gap.
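
For example (the finished-callback signature is assumed for illustration, and next_song is a Mix_Audio you've already loaded), gapless chaining could look like:

// When 'src' halts, immediately point it at the next song and restart it.
// Because this runs in the finished callback, the new audio lands in the
// same mix pass with no gap.
static void on_finished(void *userdata, Mix_Source *src)
{
    Mix_SetSourceAudio(src, (Mix_Audio *) userdata);  // userdata == next_song.
    Mix_PlaySource(src, false, -1, 0, 0);
}

// ...
Mix_SetFinishedCallback(src, on_finished, next_song);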

Does it make sense to add the ability to query and set the source position by sample instead of milliseconds? Even at 48kHz a millisecond is 48 samples and if you're doing seamless audio you can't set the precise sample position using those units. Even if you use higher resolution like microseconds, a microsecond is 0.048 of a sample, so you can't get a specific sample offset using microseconds in the general case.

Yeah, that's fine with me, we can do that. Some audio formats might be difficult to seek well, but even there, if it seeks at all, we can get as close to the desired point as necessary and just decode and throw away data until we hit the right sample.
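
The rough shape of that, where Decoder, coarse_seek(), and decode_frames() stand in for hypothetical decoder internals:

// Seek as close to 'target' as the format allows, then decode and discard
// the remainder to land on the exact sample frame.
static bool seek_to_sample(Decoder *dec, Uint64 target)
{
    Uint64 pos = coarse_seek(dec, target);  // lands at or before target.
    float discard[512];
    while (pos < target) {
        const int n = (int) SDL_min(512, target - pos);
        if (decode_frames(dec, discard, n) != n) {
            return false;  // decode error before reaching target.
        }
        pos += n;
    }
    return true;
}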

@icculus
Collaborator Author

icculus commented Jan 31, 2025

This sounds good, but it seems to be missing an option to decode from an IOStream on the fly (unless that's what the Load* functions do, but by their naming I'd expect them to decode the whole stream into memory first)

My thinking (which is not expressed in the OP up there) is that "loading" a thing generally just makes sure it's in memory and we have metadata published and other state ready to go, but the actual decoding happens on playback. This allows compressed things to not eat a ton of RAM but also be ready to go from RAM (instead of disk or whatever) for multiple sources at once.

But we should probably have an option to predecode these things, for cases where eating the RAM upfront makes sense but compression in the on-disk install is also desirable.

@slouken
Collaborator

slouken commented Jan 31, 2025

But we should probably have an option to predecode these things, for cases where eating the RAM upfront makes sense but compression in the on-disk install is also desirable.

There's also the need to decode the data in advance because some formats are expensive to decode and can't be decoded just in time to feed the audio device. I'm operating under the assumption that, for the most part, games want the minimum possible latency and so will be feeding the output small chunks at a high rate.

@Donione

Donione commented Feb 11, 2025

Hello, I'm a Factorio developer responsible for the sound part of the engine (and the author of the blog post mentioned above).
I'll share my thoughts about SDL_Mixer API redesign from that point of view.

Having just Mix_Audio objects sounds great.
For Factorio I ended up customizing SDL_Mixer to allow multiple "music" channels, treating them basically the same as "chunks"; the difference is how they are loaded/decoded. "Music" would live in memory or on disk and be decoded on the fly in pieces, while "chunks" would be loaded and decoded whole and then played.
Having just one object and the ability to control when the decode happens would make things simpler.

Channels being dynamic instead of a pool sounds like a good choice, although personally I don't feel strongly one way or the other.

What I really missed for Factorio was built-in support for changing the playback speed of already-playing sounds. It couldn't be done using an effect since the output size changes, so I customized SDL_Mixer to do a resampling pass just before the effects pass on each channel.
If the Mix_SetSourceMix callback returns how many samples it actually output and the mixer handles that well, then I don't see a problem with leaving playback speed changes, positional audio, reverb and such to the user.
(Note: I imagine that when the callback outputs less data than the input had, we could just mix in more input data (and call the callback again) until there is enough; when more data is output, the mixer would need to store the extra for the next mixing pass.)

If this will not be the case, I would strongly suggest having built-in support for changing playback speed.
(Edit: actually, even if it is the case, having it built-in would be so nice)

Having the ability to plug in an SDL_AudioStream is a way to work around that, but it means more work for the user. For example, in Factorio it's not unheard of to want a sound that starts at a random position (so it needs to be seekable), loops, and changes its playback speed often (or even all the time).

Having positional audio built-in would be a nice to have but it's not a big deal if it's not there for the sake of keeping the library simple.

Last thing, related to effects: I find it really useful to have a way to apply one effect to a large number of channels but not all of them; think applying an effect to all environment sounds while leaving GUI sounds, alerts, and music unaffected.
For that, in Factorio I added a "mid-mix effect", practically a copy-paste of the post-mix effect, only applied somewhere in the middle of the channels instead of at the end. I could have attached the same effect to each of those channels, but that seems like a waste with 100+ sounds.

Regarding sample-perfect mixing, the ability to plug in an SDL_AudioStream solves this, and I don't think the "channel finished playing" callback needs to do anything new/special about it.

I absolutely agree with having an option to seek by sample; more often than not I find using milliseconds annoying.
Although where it really mattered was the procedural stuff, and there I ended up doing my own custom thing anyway. So seeking by sample would be a nice-to-have.

I've never found the tag (group) system useful; for simple sound playback it seems like a needless complication, and it doesn't go far enough to enable some more complicated things (when I started moving Factorio to SDL_Mixer, I hoped I could use the group system to emulate, to some extent, a mixer hierarchy). But that's just my personal experience.
Edit: I just realized the usefulness for Mix_PlayTag()

I went through all my custom changes to SDL_Mixer and I think that covers it all.

Thank you for all the work on SDL!

@slouken
Collaborator

slouken commented Feb 11, 2025

Hello, I'm a Factorio developer responsible for the sound part of the engine (and the author of the blog post mentioned above).

Thank you for chiming in! Your blog post was one of the inspirations for wanting to do the redesign and your feedback is really appreciated. :)

@Donione

Donione commented Feb 12, 2025

Your blog post was one of the inspirations for wanting to do the redesign and your feedback is really appreciated. :)

I hope it didn't come across as complaining about the library. Overall I'm really happy with it, I very much appreciate the simplicity, how understandable it is and consequently how easy it is to customize.

I'm glad to provide my point of view, clarify my points or contribute directly (although I'm not sure my code would be good enough).


I've just remembered that the ability to synchronize the start of playback of multiple sounds would be great, and then realized that it would be possible in the proposed API redesign with Mix_PlayTag(). Cool stuff! So disregard my previous note about the tag system :)

@raiguard

raiguard commented Feb 21, 2025

I know that this kind of question is annoying, but do you have an approximate timeline for this rework? I am working on upgrading Factorio to SDL3 and SDL_Mixer is the largest pain point. I have started working on porting our customizations to the existing SDL3 mixer, but if this redesign will be implemented soon then that work may not be needed.

@slouken
Collaborator

slouken commented Feb 21, 2025

do you have an approximate timeline for this rework?

@icculus is planning on working on this after the next sdl2-compat update, which will be done soon.

@DEF7

DEF7 commented Mar 4, 2025

One idea to improve the applicability and utility of registering an effect: pass the effect a buffer that contains more audio samples from earlier in the mix than the actual mix needs, i.e. include the last mix buffer prefixed to the beginning of the current mix buffer, and hand that to the registered effect callback. This would allow for effects like reverb, as long as they don't last longer than the length of a buffer. Perhaps the application could specify a multiplier, something like "szHistory", to request an even longer buffer for longer delay/reverb effects.
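
A sketch of that history-prefix idea (BUF_LEN, prev_buf, cur_buf, and effect_callback are all hypothetical names):

// Hand the effect the previous mix buffer glued in front of the current one
// (szHistory == 1 here), then roll the history forward for the next pass.
float scratch[2 * BUF_LEN];
SDL_memcpy(scratch, prev_buf, BUF_LEN * sizeof (float));           // history half.
SDL_memcpy(scratch + BUF_LEN, cur_buf, BUF_LEN * sizeof (float));  // current half.
effect_callback(userdata, scratch, 2 * BUF_LEN);  // reverb can read back into the history.
SDL_memcpy(prev_buf, cur_buf, BUF_LEN * sizeof (float));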

Another idea, for variable playback speed: alongside Mix_Get/SetSourcePlaybackPosition(), include functions for setting a speed multiplier for the mix, where 1.0 is unscaled, direct mixing and any other non-zero value up- or downsamples the Mix_Source accordingly. i.e. a SetMixSourcePlaybackSpeed() of 0.5 would mix the source at half speed, upsampling a half-sized buffer to the mix buffer's size.

I might have some more thoughts in the coming days. I'm excited about this! :D

@flowCRANE

If I could make a suggestion, it would be to use a more intuitive API object naming convention. Perhaps it's time to introduce the concept of tracks, just like in typical audio editing software, e.g. Audacity.

So we would have an audio device as the sound output. The device can support a certain number of channels: mono, stereo, etc. Each track is a target for placing sound data and supports as many buffers as there are allocated channels (unless the track was intentionally created with fewer channels, e.g. a mono track for a stereo device). For example, if the device was created to render stereo sound, each track has two target buffers (for the left and right channels). The user can create as many tracks as the number of sounds they want to mix simultaneously.


In short, as proposed by the OP, the device would be called Audio, so the device opener could be called Mix_OpenAudio, and all functions related to this root device would contain this word, e.g. Mix_QuitAudio, Mix_GetAudioDecoder, Mix_GetAudioSpec, etc.

It has been suggested that the sound target be called Source, which is misleading, because the source is the sound itself (a sample buffer loaded from a file, for example). I suggest using the correct name, i.e. Track, as the target for placing sample data. So instead of Mix_CreateSource, there should be Mix_CreateTrack, as well as others like Mix_DestroyTrack, Mix_HaltTrack, Mix_PauseTrack, etc. The data type representing a sound track could be named Mix_Track.

It was proposed to call the sound source Audio (the same as the subsystem), where we have the function Mix_LoadAudio to load a sound from a file. It would be easier if a single sound or piece of music (for which there is no distinction) was simply called Sound. The data type representing a single sound could be named Mix_Sound, and the functions for creating and playing sounds could be named Mix_LoadSound, Mix_PlaySound, etc.


I think such nomenclature would be as intuitive as possible and would make the mixer functionality easier to use, especially for those who are not specialists in sound mixing but more or less understand what a mixer consists of.

@icculus
Collaborator Author

icculus commented Mar 13, 2025

Locally I had changed "Source" to "Input" but "Track" is pretty good. I'll think on it.

@a-hurst

a-hurst commented Mar 21, 2025

One quick thought re: my specific use case. I notice there's no equivalent of Mix_QuickLoad_RAW in the new API? I recognize why it might be desirable to remove it (potential for misuse/bugs given the assumptions it makes), but I currently use it for playing procedurally-generated sounds without having to encode the arrays of audio as WAV files or anything first.

Maybe an alternative, safer API for that use case could allow passing the raw array along with its bit depth, sample rate, channel count, and length? Essentially, "here's an array of raw PCM audio, and here are its specs". There would be a slight performance cost of course, but it would be more flexible while also reducing the likelihood of problems from mismatches between the audio spec and the generated audio buffers. Let me know what you think!

@DEF7

DEF7 commented Mar 21, 2025

procedurally-generated sounds

Just for the record, I second this. I assumed that we'd still have the ability to pass raw PCM data and it just wasn't included in the thread's tentative outline of the new API, but on the off chance that it was intentionally being omitted: it shan't be!

@raiguard

I'm working on a personal project which requires pretty accurate audio latency measurements. To my dismay, Mix_GetMusicPosition is not very accurate, which leads to inaccurate latency measurements as well.

Will more precise reading of the stream position be possible in the redesign?

(video attachment: 2025-03-20T23.53.59.716320602-06.00.mp4)

@icculus
Collaborator Author

icculus commented Mar 24, 2025

Okay, so this is not ready yet, but I've pushed what I've got to https://github.com/icculus/SDL_remixer for now.

Eventually, that repository will go away and we'll drop its files into SDL_mixer's main branch, wire it up to the existing CMake, etc.

Right now it's not usable (or even compilable, since I changed a bunch of stuff right before pushing). I'm about to rough in a very basic .WAV backend just for proof-of-concept purposes, since moving the existing decoders over will take a little more time and I'd like to have something that makes noise sooner rather than later.

@icculus
Collaborator Author

icculus commented Mar 24, 2025

To catch up on the conversation here:

  • I went with @flowCRANE's suggestion of using the term "Track," and I love it. However, I did not go with separate buffers for each track channel, since every other part of the system expects interleaved audio, and most apps using SDL_mixer are going to be dealing with files and not the specific data within them anyhow. Other names haven't changed yet (Mix_Audio, Mix_OpenAudio()), but they likely will.
  • My solution to the reverb problem was to let apps specify an amount of silence to play after a track completes. Once we're out of data, the track will generate that much silence before progressing to STOPPED and firing a finished callback. This has two benefits: if you're doing a mix callback and want to generate reverb, you can go past the end of the file's audio to generate the last bit of echo, and if you want a delay between two tracks, you don't have to have extra logic to manage this; just add silence and start the next track in a finished callback, which will happen after the silence is complete.
  • Most of the API deals with sample frames instead of time, so you can get sample-perfect fades, set a maximum number of frames to play, seek to a specific frame, etc. For those that want to work in milliseconds, there's a conversion function. This gets dicey when doing things that deal with multiple tracks at once (Mix_PlayTag, etc)...these have to work in milliseconds, because different tracks could have different sample rates.
  • Mix_QuickLoad_RAW is going away, but we'll likely allow loading of raw data through a Property, and maybe also an option to avoid copying the data this way, too, but I haven't thought it through yet.
  • I probably need an addition to SDL to lock the mixer. We avoided this because locks are generally held on SDL_AudioStreams instead of the mixer as a whole, but the only way to start/stop all your streams at the same time is to bind them with a single SDL_BindAudioStreams() call, and I'm discovering that this is a giant pain in the butt. It would be much nicer to have SDL_LockAudioDevice() again, and just bind each stream separately.
  • I'm trying to decide if it's nicer to not have a context handle that gets passed to every SDL_mixer call, or nicer to let an app have multiple mixers, possibly on different devices.
  • Variable playback speed was mentioned, and SDL_AudioStream supports this, so we could expose this to the app trivially, since each track eventually pushes data into a stream.
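
On that last bullet: since SDL_SetAudioStreamFrequencyRatio() already exists in SDL3, a hypothetical Mix_SetTrackPlaybackSpeed() could be little more than this (the track->output_stream field is an assumption about internals):

// speed > 1.0f plays faster (and higher-pitched), < 1.0f plays slower.
bool Mix_SetTrackPlaybackSpeed(Mix_Track *track, float speed)
{
    return SDL_SetAudioStreamFrequencyRatio(track->output_stream, speed);
}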

I have to read back over this whole thread, as I'm sure there's a ton of feedback I haven't addressed yet; I promise I'm not ignoring you!

@icculus
Collaborator Author

icculus commented Mar 31, 2025

Mix_QuickLoad_RAW is going away, but we'll likely allow loading of raw data through a Property, and maybe also an option to avoid copying the data this way, too, but I haven't thought it through yet.

Latest in revision control has:

// Load raw PCM data to a Mix_Audio from an IOStream.
extern SDL_DECLSPEC Mix_Audio * SDLCALL Mix_LoadRawAudio_IO(SDL_IOStream *io, const SDL_AudioSpec *spec, bool closeio);

// Load raw PCM data to a Mix_Audio. If free_when_done==true, will be SDL_free()'d when the Mix_Audio is destroyed. Otherwise, it's never free'd by SDL_mixer.
extern SDL_DECLSPEC Mix_Audio * SDLCALL Mix_LoadRawAudio(const void *data, size_t datalen, const SDL_AudioSpec *spec, bool free_when_done);
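
For the procedural case mentioned upthread, usage might look like this (the sine generator is just an example):

// Generate one second of a 440Hz sine wave and hand it to the mixer as raw
// PCM. With free_when_done==true, SDL_mixer takes ownership of the buffer.
SDL_AudioSpec spec = { SDL_AUDIO_F32, 1, 48000 };
float *pcm = (float *) SDL_malloc(48000 * sizeof (float));
for (int i = 0; i < 48000; i++) {
    pcm[i] = SDL_sinf(2.0f * SDL_PI_F * 440.0f * i / 48000.0f);
}
Mix_Audio *beep = Mix_LoadRawAudio(pcm, 48000 * sizeof (float), &spec, true);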

@icculus
Collaborator Author

icculus commented Apr 8, 2025

There's been a ton more work on this. We have WAV, Ogg, MP3, and AIFF decoders hooked up, lots of metadata improvements, and, of course, a working mixer.

I've started using the issue tracker as a TODO list, to give you an idea of where we are:

https://github.com/icculus/SDL_remixer/issues

This is at the point where people can start experimenting with it, although there are a few genuinely janky pieces and there will be changes as feedback comes in.

@icculus
Collaborator Author

icculus commented Apr 8, 2025

Also: do we want duplicate decoders? Is it worth having both a libvorbis decoder and an stb_vorbis decoder? Likewise for dr_mp3/mpg123, dr_flac/libFLAC, and Timidity/FluidSynth.

@DEF7

DEF7 commented Apr 8, 2025

There's been a ton more work on this.

Yes!!! I'm excited. Thank you for all of the work you're doing. I started a project 6-7 months ago using SDL3, after using SDL2 since it was released and SDL before that for several years. I was counting on SDL_mixer 3.0 coming around by the time the project was ready for audio to be added, and if not, I would've written my own mixing on top of SDL_audio as a last resort.

do we want duplicate decoders?

While I do agree that it seems somewhat redundant to have different libraries available for decoding the same formats, I do think it might be best to allow developers to pick and choose exactly which ones they want to employ in their wares. That's my two cents. Maybe just make sure the documentation explains that different audio types have different decoders available to choose from, so developers don't feel like they're staring at a big list of decoders they don't understand the purpose of. Classify them by the audio type(s) they support.


Cheers! :]

madebr mentioned this issue Apr 11, 2025