Suggestions and ideas for sound formats #10

@KurtWoloch

Description

This is not an issue pertaining to a specific part of Buzzer Studio, but rather to the general idea of storing music, speech and sound data in different ways to have the microcontroller at hand play them back, always keeping the microcontroller's limits in mind: its processing speed and the amount of RAM available.

Right now, there seem to be 4 different formats supported by Buzzer Studio with code for the microcontroller supporting playback of those formats:

  1. LPC speech
  2. ADPCM (good for speech and other sounds)
  3. Sound effects (given by the duration of each on/off cycle of the audio in microseconds)
  4. Monophonic or polyphonic music (getting divided into monophonic streams of squarewave notes, all at the same volume)

I've looked at the data format of 4., and I think it could be improved somewhat by formatting the data differently or using tables.

Over the years, I've analyzed different ways and formats of storing and playing back music, speech and other sounds, especially on memory-, bandwidth- and/or speed-limited devices, and I've also written a few converter programs in this field, which for the most part remained unreleased.

Let's start with the earliest one. I had the Bontempi Memoplay keyboard, which probably relied on a microcontroller as well, bit-banging the audio. It could also store the music you played, in a limited way: it only had 32 bytes of RAM (probably), of which 28 were used for storing the music, with each note taking up a single byte, probably divided into 5 bits for the pitch and 3 bits for the length. The length ranged from 1/8 to 8/8 of a second, and if you held a note for more than one second, another note of the same pitch was added. Since the keyboard only had 2 octaves, there were only 25 possible notes plus 1 "rest" key, so 5 bits were enough for the pitch value.
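Under that assumption, packing a note into a single byte might look like this (a sketch; the exact bit layout of the Memoplay is my guess, not documented anywhere):

```python
# Hypothetical Memoplay-style note byte: low 5 bits = pitch index
# (0-24 for the two octaves plus one value for a rest), high 3 bits
# = length, stored as 0-7 for 1/8 to 8/8 of a second.

def pack_note(pitch: int, eighths: int) -> int:
    assert 0 <= pitch < 32 and 1 <= eighths <= 8
    return ((eighths - 1) << 5) | pitch

def unpack_note(b: int) -> tuple:
    return b & 0x1F, ((b >> 5) & 0x07) + 1
```

With this layout, 28 bytes hold exactly 28 note events, which matches the capacity described above.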

Then came the Casio keyboards, some of which also had a barcode reader, and the format of the barcodes was analyzed and broken down here:
https://rnhart.net/articles/casio-barcode.htm
Basically, here you have up to three separate streams for note lengths, pitches and chords. The chords are given by a single byte each and last for a fixed time relative to the tempo of the song. The notes and lengths are also given as separate bytes, with the length measured in twelfths of a beat, so a length of 12 is one beat (relative to the tempo of the music). There are also special commands hidden in there.
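With lengths given in twelfths of a beat, converting a stored length byte to a duration in seconds for a given tempo is a one-liner (a sketch, assuming nothing beyond the twelfths-per-beat rule described above):

```python
def length_seconds(twelfths: int, bpm: float) -> float:
    """A length of 12 equals one beat; one beat lasts 60/bpm seconds."""
    return (twelfths / 12.0) * (60.0 / bpm)
```

For example, at 120 bpm a length byte of 12 comes out to half a second.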

Then Yamaha released its Playcard keyboard, reading the music data from a magnetic strip at the bottom of each card. I think each card could hold up to 128 notes, but some were double-sided, or at least you could read in both sides of a card in succession. I couldn't find any write-up on the format of this data, but I disassembled and analyzed the MSX cartridge Yamaha released together with a playcard reader to work out at least part of the format. They basically took what Casio did one step further by compressing the data. That is, there are multiple fixed tables in ROM giving, for instance, 32 possible note lengths, and the value stored on the playcard is a 5-bit value indexing the table. Not only that, it also contains special commands: one means the last value should be repeated, and others denote a repetition and the start of the part to be repeated. But that's not all: they also compressed those values so that not every one takes up 5 bits; the most common value has the lowest table index and takes up fewer bits, while less common ones take up more. I think they applied a similar compression to the note pitches.
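Giving the most common table indices the shortest bit patterns is essentially a prefix code. A minimal decoder sketch (the code table here is invented for illustration, not the actual Playcard assignment):

```python
# Assumed code table: shorter bit patterns for more common table indices.
# No code is a prefix of another, so decoding needs no lookahead.
CODES = {"0": 0, "10": 1, "110": 2, "111": 3}

def decode_indices(bits: str) -> list:
    """Consume a bit string and emit the table indices it encodes."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in CODES:
            out.append(CODES[cur])
            cur = ""
    return out
```

Decoding this way costs only a table lookup per bit, which is why it was feasible even on an early-80s keyboard CPU.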

While the playcards were basically ROMs you couldn't record onto yourself, some other keyboards did let you record your own music, like the Yamaha PSR-50 I have, which even lets you store the data on tape. Here you have a less fancy storage format, but since there was no write-up on that one either, I analyzed it as well. The storage on tape is more or less a dump of the data stored in RAM while you play and record, except that empty memory sections get skipped. There are three tracks (streams or parts): orchestra, solo and accompaniment. Since this keyboard also supports MIDI, the format is somewhat tied to the MIDI clock, which runs at 24 pulses per quarter note. A counter counts those pulses up to 191, then returns to 0 (after two whole notes). In the streams, one byte is set for the note you play (or, I think, 0 for a rest), and another byte stores the position of the clock counter at which the event occurs. Each time the clock rolls over, an additional byte with value 255 gets stored. The note byte can also hold special events, represented by values outside the normal MIDI note range, denoting an instrument change or some other settings change. Each stream was obviously monophonic; you could play the orchestra part of the keyboard polyphonically, but only storing monophonic data was supported. You could, however, turn on the "duet" or "trio" modes, which would automatically add notes below the ones currently played, fitting the chord just played in the accompaniment section.
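If that reading of the format is right, reconstructing absolute event times from the (note, clock position) pairs and the 255 rollover markers could look like this (a sketch based purely on the analysis above; the function name is mine):

```python
ROLLOVER = 192  # the clock counter runs 0..191 (24 MIDI pulses per quarter note)

def absolute_events(stream: list) -> list:
    """Turn [note, pos, note, pos, 255, note, pos, ...] into (note, absolute_time)."""
    base, out, i = 0, [], 0
    while i < len(stream):
        if stream[i] == 255:          # rollover marker: counter wrapped to 0
            base += ROLLOVER
            i += 1
        else:
            note, pos = stream[i], stream[i + 1]
            out.append((note, base + pos))
            i += 2
    return out
```

Since 255 lies outside the note range, the marker can never be mistaken for an event byte.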

And this then evolved to MIDI which you are probably familiar with. MIDI does support polyphonic playing in one stream (here called track), velocity (playing the sounds in different volumes) and several tracks running at the same time.

Then there are formats used mostly in early computers. The Video Brain computer, as one of the first, had a 2-bit DAC, and some of its cartridges had title music with a player for it. In this case, the music data was given as three bytes per note: the byte to be sent to the converter in "on" status (which affected the volume of the note), the pitch (the number of scanlines, 64 µs each, that each pulse should take), and the length (the number of video frames the note should play for, or a rest should last). A limitation in the code capped this at a maximum of 85 notes. The system somewhat supported this by having a scanline interrupt that interrupted the CPU at a previously set scanline.

The TI-99/4A had a 4-voice sound generator, subsequently also used in many other systems, with 3 tone voices and one noise voice. It also had a player for "sound lists" built into ROM which would run interrupt-driven if you turned it on and pointed it at the location of the sound data; consequently, most of its software used this to generate sound. The sound chip was basically driven by sending single- or double-byte commands to it over a single 8-bit port: 2-byte commands for setting the pitch of a channel, and 1-byte commands for setting its volume (and, in the case of the noise channel, its period and noise type). The sound lists, then, were divided into events consisting of a number of bytes sent to the chip: the data would first have a byte giving the number of bytes in this event, then the bytes to send, and then the number of video frames (= number of interrupts) that should pass until the next event. Basically only one sound list could play at a time, so 3-part music was done by stringing together the commands for all voices in the order they should be issued.
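A sound-list walker following that description is easy to sketch (the layout — count byte, chip bytes, frame delay — is as described above; the function name and return shape are mine):

```python
def parse_sound_list(data: bytes) -> list:
    """Split a TI-99-style sound list into (bytes_for_chip, frames_to_wait) events."""
    events, i = [], 0
    while i < len(data):
        n = data[i]                     # number of bytes to send in this event
        chunk = data[i + 1:i + 1 + n]   # the command bytes for the sound chip
        wait = data[i + 1 + n]          # video frames until the next event
        events.append((chunk, wait))
        i += n + 2
    return events
```

An interrupt-driven player would just count down `wait` once per frame and push the next chunk to the chip when it reaches zero.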

Then there were several player routines, editors and data formats for the Commodore 64 and its 3-voice SID chip. It probably started out with simpler ones that were just tables of notes and durations, all played with the same settings, but later musicians like Rob Hubbard and Martin Galway raised the bar considerably with their music for games like "Warhawk", "Highlander" and "Crazy Comets". Chris Huelsbeck made a similar music routine plus an editor for its music data, the "Sound Monitor", which was released as a type-in program by the German 64'er magazine in late 1986. Here the format relied on 3 tracks (similar to later trackers for other machines), subdivided into patterns of a fixed size (usually 64 bytes), each holding a list of notes intermingled with special commands and modifiers. For instance, you could tell a channel to quickly switch between 4 different notes to give the impression of a chord, and you could do vibrato, portamento or pulse EG (varying the pulse width of a square-wave note), all within the same format. Some special events in the note data indexed into a separate table you could fill, for instance a chord table giving the offsets of the additional notes making up a chord. The notes themselves were not given as the raw values pushed into the SID, but as a musical note value, like in MIDI, which would index a table with the actual SID values to push. This was also the software I used for many of my own songs, though I also analyzed one other music routine, the one used by David Dunn in "Chiller" (which basically plays a variation on "Thriller" by Michael Jackson). It uses a more event-driven system and even self-modifying code, jumping to a location given by the command byte which then acts on subsequent bytes in the music data. But that player didn't support as many things as Chris Huelsbeck's Sound Monitor.

Then, of course, came other trackers for other systems like the Amiga, where I used Soundtracker for several pieces after having started out with Sonix, and later moved to Music-X for doing music via MIDI. Soundtracker is similar to Sound Monitor, but the screen organization is different and some options are arranged differently: for instance, the chord data is now given directly in the pattern data rather than in a separate table, but there now is a table of samples to be used. One neat trick was that you could record a chord as a sample and then play it back as a single note, which sounded better than producing the chord by quickly alternating between different notes.

One pretty advanced method of sound generation at that time was employed by Namco in their arcade games like Galaxian and Pac-Man. Instead of simple square waves, they used a fixed table of several waveforms which, I think, were given as a series of 32 4-bit samples each, and you could switch the sound generator between different waveforms you wanted to play.

Then there were several pretty neat conversions, or at least conversion attempts. One of those converts audio data into data for the TI-99 sound chip mentioned above, with its 3 square waves, by quickly changing the values, which basically gives a stream of 3-voice square waves with volumes. This was, for instance, used in a homebrew version of "Uridium" for the Colecovision, as you can see here: https://www.youtube.com/watch?v=iK7TfeGIhJ4
I also made an attempt at such a converter, which converts sound data into data for a 3-voice polyphonic square-wave sound generator, with the settings changing on each video frame (that is, 50-60 times per second).

As a variant of this, I tried to make a converter that turns sound data into data for the Atari 2600's TIA sound chip, which only has 2 sound channels but more available waveforms, so the converter tries to find the most suitable waveform for each frame. Again, the values get changed on each video frame, and this time I also implemented playback of the data on the Atari 7800, which has the same sound hardware.

There are also converters which try to convert arbitrary sound data into MIDI files, playing only piano (or a different instrument you can set, but not varying between instruments). I used one of these on a recording of my parents singing and playing the guitar, and when I played the resulting MIDI file to my mother on my portable keyboard, she could at least recognize that it was a recording of her singing.

I also made a converter myself that attempts to convert MIDI files into sound lists for the TI-99. Although it does work, the project was eventually abandoned without reaching its full planned potential: the plan was to optimize the conversion by keeping only the strongest notes, discarding notes outside the playable range of the TI-99, and discarding the drum track, but I never got that working satisfactorily. There is one MIDI file which I specifically created for such a conversion in order to have it play on the TI-99 (as music for a conversion of the game "2048"): a 3-part version of the song "Inchworm", originally performed by Danny Kaye. I've attached this MIDI file to this post since it also works well with Buzzer Studio's MIDI converter.

inchworm.zip

My role model for those conversions were the "Frodigi" demos for the Commodore 64, which try to represent music by running free-running oscillators on the SID (in this case, all set to a triangle wave), as you can hear here: https://www.youtube.com/watch?v=rpH3wLKPHCU
Here the data is changed once per video frame as well (I assume), but further compressed by using different tables with entries for the most common cases, so that the necessary data rate drops to 6-8 bytes per frame (the SID normally takes up to 24 bytes for its settings).

Worth mentioning is also the way some arcade games (like Baby Pac-Man, Qbert and Joust) and pinball machines produced sound: they had a dedicated sound CPU driving a DAC, and that CPU had to write each sample value by hand. Still, some programmers achieved polyphonic music even on those devices (like the level intro tunes in Qbert), but it sounded a bit scratchy since they had to resort to a lower sampling rate due to the calculations involved.

Finally, current Yamaha keyboards actually play music not by mixing the output of 32 (or, on newer models, 48) channels together, but by alternating samples of those channels very quickly so that the impression is that they are all playing together.
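That channel-alternation trick amounts to time-multiplexing instead of mixing, roughly like this (an illustrative sketch of the principle, not Yamaha's actual implementation):

```python
def multiplex(channels: list) -> list:
    """Interleave one sample from each channel per tick instead of summing them.
    At a high enough output rate the ear averages the channels together."""
    out = []
    for frame in zip(*channels):  # one sample from every channel per frame
        out.extend(frame)
    return out
```

The output rate must be the per-channel rate times the channel count, which is why this trades speed for the cost of a mixing loop.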

One idea I had, but never followed through on, is that with today's technology you can separate vocals from songs; the vocal part could then be reproduced by encoding it in LPC, while the remaining music could be converted to a combination of square waves to accompany the singer. It should be possible to write a program that does this for the TI-99, but I've never seen anyone attempt it.

So, what can we learn from this for this microcontroller?

Our microcontroller is, in capabilities, roughly comparable to the 8-bit systems described: more advanced in some ways, lacking in others. It doesn't have the fancy SID chip of the C-64, nor even the simpler sound chip of the TI-99, but only a single pin you have to "bang" on and off by hand. But it's fast enough to "oversample" things like the Yamaha keyboards do, which means you can use PWM without the listener noticing that the pin is only ever set to 1 or 0. The 4 types of audio mentioned initially are variations of the techniques used to get music out of this hardware. But there are more techniques I can think of, and ways to compress the data...
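One way to see why a single fast pin can fake intermediate levels is a first-order sigma-delta loop, sketched here (a general illustration of the principle, not code from Buzzer Studio):

```python
def one_bit_stream(samples: list) -> list:
    """First-order sigma-delta: the 1-bit output's running average tracks
    the input samples (each expected in the range 0.0 to 1.0)."""
    acc, out = 0.0, []
    for s in samples:
        acc += s                # accumulate the desired level
        if acc >= 1.0:
            out.append(1)       # emit a high pulse and pay it off
            acc -= 1.0
        else:
            out.append(0)
    return out
```

The faster the pin can toggle relative to the audio bandwidth, the finer the effective resolution, which is exactly the oversampling argument above.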

First, the current MIDI converter puts out data in a format that's not very well compressed; in comparison, Yamaha's and Casio's formats, as described, are more compact. Maybe a table could be used to map abbreviated pitch and length values to the actual values needed, though I don't know whether accessing those tables would exceed the available computing budget.

It could also be attempted not to play each note at the same volume, but to allow a different volume value for each note. However, this will probably require more processing time, so the available polyphony might have to be reduced. There could also be a kind of ADSR curve which makes each note follow a pre-determined pattern of attack, decay, sustain and release, as you can define on the SID chip (and also on the Casio VL-1 keyboard, which I also had).

It could also be attempted to use short samples of different waveforms, like in the Namco chip, read from a table, but this would likely cost more processing time and reduce the available polyphony as well.

This sample idea could be expanded by using a table of samples and having each channel (if multiple channels are computationally feasible) play different samples from that table in succession. However, I don't know what kind of audio data would be represented well by that approach.

The music player could also support chords by alternating between different notes like on the C-64, with the music data giving the chord only once at its start instead of accounting for each single alternation. To that end, the MIDI converter could be amended so that a chord played in one channel gets converted to a chord using alternating notes on the microcontroller.

Finally, it could be attempted to support LPC playback along with additional musical notes generated by a different technique, like square waves or samples.
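For the table idea mentioned above, one concrete option is to precompute half-periods in timer ticks per note, so the player only indexes a table instead of computing frequencies (a sketch; the 1 MHz tick rate and the MIDI-note range are my assumptions):

```python
TICKS_PER_SEC = 1_000_000  # assumed timer resolution

def half_period_ticks(midi_note: int) -> int:
    """Ticks the pin stays high (or low) for one square-wave half cycle."""
    freq = 440.0 * 2 ** ((midi_note - 69) / 12)   # equal temperament, A4 = 440 Hz
    return round(TICKS_PER_SEC / (2 * freq))

# Precomputed once; the player then does a cheap table lookup per note.
HALF_PERIODS = [half_period_ticks(n) for n in range(60, 73)]  # C4..C5
```

At 2 bytes per entry the whole musical range fits in well under 256 bytes of table, which should be negligible next to 16K of RAM.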
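The ADSR idea mentioned above could be a small precomputed envelope table the player steps through once per frame (a sketch; measuring the segments in frames and using linear ramps are my assumptions):

```python
def adsr_table(attack: int, decay: int, sustain: float,
               hold: int, release: int) -> list:
    """Per-frame volume factors: ramp 0->1, fall 1->sustain, hold, fall to 0."""
    env = [i / attack for i in range(1, attack + 1)]                      # attack
    env += [1.0 - (1.0 - sustain) * i / decay for i in range(1, decay + 1)]  # decay
    env += [sustain] * hold                                               # sustain
    env += [sustain * (1.0 - i / release) for i in range(1, release + 1)]    # release
    return env
```

Scaling a note's volume by the table entry is one multiply per frame (or, with quantized levels, just another lookup), so it may fit the processing budget better than computing the curve live.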
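The C-64-style chord trick described above is just cycling the chord tones once per frame; stored once, expanded at playback time (a sketch; the major-triad offsets are only an example of what a chord table entry might contain):

```python
def arpeggio(root: int, offsets: list, frames: int) -> list:
    """One note per video frame, cycling through the chord tones."""
    tones = [root + o for o in offsets]
    return [tones[i % len(tones)] for i in range(frames)]
```

The music data would then only need the root note plus a chord-table index, exactly as in Sound Monitor, instead of spelling out every alternation.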

In the end, it comes down to what's possible with the given 16K of RAM and the given processing speed, as well as the necessity to bang the port by hand.

These are only suggestions and improvements I could think of. Maybe this is inappropriate for an issue here, since it's a write-up rather than an issue, but I couldn't think of a better place to put it. If anyone can think of one, I'll gladly move it there.
