TinyBits is a compact binary serialization format designed for efficient encoding and decoding of structured data. The format supports various data types including integers, floating-point numbers, strings, arrays, maps, blobs, and special values like null, boolean, and IEEE floating-point special values.
This specification describes the TinyBits format as of April 2025.
The format has the following design goals:
- Compact representation of data
- Fast encoding and decoding
- Support for common data types
- String deduplication for memory efficiency
- Optimized floating-point encoding
TinyBits uses a tag-based encoding system where the first byte of each value contains a type tag that determines how to interpret the following bytes.
| Type | Description |
|---|---|
| Integer | Signed 64-bit integers |
| String | UTF-8 encoded strings |
| Array | Ordered sequence of values |
| Map | Collection of key-value pairs |
| Double | IEEE 754 64-bit floating-point |
| Compressed Float | Space-efficient floating-point representation |
| Boolean | True or false values |
| Null | Absence of a value |
| Special Float | NaN, +Infinity, -Infinity |
| Blob | Raw binary data |
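The first byte of each encoded value indicates its type:
- `0x80-0xFF`: Integer
- `0x40-0x5F`: String (inline)
- `0x60-0x7F`: String (reference)
- `0x20-0x2F`: Positive floating-point
- `0x30-0x3F`: Negative floating-point
- `0x2D`: NaN
- `0x3D`: Positive infinity
- `0x2E`: Negative infinity
- `0x3E`: Float16
- `0x2F`: Float32
- `0x3F`: Float64 (IEEE double)
- `0x10-0x1F`: Map
- `0x08-0x0F`: Array
- `0x04`: Extension (reserved)
- `0x03`: Blob
- `0x02`: Null
- `0x01`: True
- `0x00`: False

Within the two floating-point ranges, the tags whose low four bits are 13 to 15 (`0x2D`-`0x2F` and `0x3D`-`0x3F`) are the dedicated NaN, infinity, Float16, Float32, and Float64 tags listed above.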
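As a rough illustration of how a decoder might dispatch on the tag byte, the sketch below maps a byte to a type label using only the ranges listed above. The function name `classify_tag` and the returned labels are hypothetical and are not part of TinyBits; the exact-match tags are checked first because several of them sit inside the floating-point ranges.

```python
def classify_tag(tag: int) -> str:
    """Map a tag byte to an illustrative type label (labels are made up here)."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must be a single byte")
    # Single-value tags first: some of them lie inside 0x20-0x3F.
    exact = {
        0x00: "false", 0x01: "true", 0x02: "null", 0x03: "blob",
        0x04: "extension",
        0x2D: "nan", 0x3D: "+inf", 0x2E: "-inf",
        0x3E: "float16", 0x2F: "float32", 0x3F: "float64",
    }
    if tag in exact:
        return exact[tag]
    if tag >= 0x80:
        return "integer"
    if tag >= 0x60:
        return "string-ref"
    if tag >= 0x40:
        return "string"
    if tag >= 0x30:
        return "negative-float"
    if tag >= 0x20:
        return "positive-float"
    if tag >= 0x10:
        return "map"
    if tag >= 0x08:
        return "array"
    # 0x05-0x07 do not appear in the tag list.
    raise ValueError(f"unassigned tag 0x{tag:02X}")
```

For example, `classify_tag(0x85)` yields the integer label, while `classify_tag(0x2D)` yields the NaN label rather than a positive float.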
Integers use the high bit (`0x80`) as a type identifier:
- For integers 0-119: Encoded as `0x80 | value`
- For integers 120 and above: Encoded as `0xF8` followed by a varint encoding of `value - 120`
- For integers -1 to -6: Encoded as `0xF8 + |value|` (249-254)
- For integers below -6: Encoded as `0xFF` followed by a varint encoding of `-(value + 7)`
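The four integer cases can be sketched as a pair of helpers that split a value into a tag byte plus an optional varint payload, and join them back. This is a minimal sketch, not the library's code: the names `split_int` and `join_int` are hypothetical, the payload is left as a plain integer (the varint byte layout is a separate concern), and the single-byte negative tags are assumed to be 249-254 (`0xF9`-`0xFE`), matching the numeric range given in the list.

```python
from typing import Optional, Tuple


def split_int(value: int) -> Tuple[int, Optional[int]]:
    """Return (tag byte, varint payload or None) for a signed 64-bit integer."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit integer")
    if 0 <= value <= 119:
        return 0x80 | value, None          # 0x80..0xF7, value stored in the tag
    if value >= 120:
        return 0xF8, value - 120           # payload is the excess over 120
    if value >= -6:
        return 0xF8 + (-value), None       # assumed 0xF9..0xFE for -1..-6
    return 0xFF, -(value + 7)              # payload is 0 for -7, 1 for -8, ...


def join_int(tag: int, payload: Optional[int]) -> int:
    """Inverse of split_int."""
    if 0x80 <= tag <= 0xF7:
        return tag & 0x7F
    if tag == 0xF8:
        return payload + 120
    if 0xF9 <= tag <= 0xFE:
        return -(tag - 0xF8)
    if tag == 0xFF:
        return -payload - 7
    raise ValueError("not an integer tag")
```

Under these assumptions, values from -6 to 119 occupy a single byte, and every other value costs one tag byte plus its varint payload.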
Strings are encoded with two different methods:
- Inline String (`0x40-0x5F`):
  - For strings 0-30 bytes: `(0x40 | length)` followed by the string data
  - For strings 31+ bytes: `0x5F` followed by a varint encoding of `length - 31`, then the string data
- Reference String (`0x60-0x7F`):
  - For referencing previously encoded strings with ID 0-30: `(0x60 | id)`
  - For referencing previously encoded strings with ID 31+: `0x7F` followed by a varint encoding of `id - 31`
Arrays are encoded with the `0x08` tag:
- For arrays with 0-6 elements: `(0x08 | length)`
- For arrays with 7+ elements: `0x0F` followed by a varint encoding of `length - 7`
Maps are encoded with the `0x10` tag:
- For maps with 0-14 key-value pairs: `(0x10 | length)`
- For maps with 15+ key-value pairs: `0x1F` followed by a varint encoding of `length - 15`
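Inline strings, string references, arrays, and maps all follow the same pattern: a small count is packed into the tag byte, and a larger count uses the highest tag in the range followed by a varint of the excess. The sketch below captures that pattern in one table-driven helper. The names `LAYOUTS` and `header` are hypothetical, and the varint payload is returned as a plain integer rather than encoded bytes.

```python
from typing import Optional, Tuple

# kind -> (base tag, largest count that fits in the tag byte itself)
LAYOUTS = {
    "string": (0x40, 30),
    "ref":    (0x60, 30),
    "array":  (0x08, 6),
    "map":    (0x10, 14),
}


def header(kind: str, count: int) -> Tuple[int, Optional[int]]:
    """Return (tag byte, varint payload or None) for a length or ID."""
    if count < 0:
        raise ValueError("count must be non-negative")
    base, inline_max = LAYOUTS[kind]
    if count <= inline_max:
        return base | count, None
    # The escape tag is the last tag of the range: 0x5F, 0x7F, 0x0F, 0x1F.
    escape = base | (inline_max + 1)
    return escape, count - (inline_max + 1)
```

For instance, `header("array", 7)` gives the escape tag `0x0F` with payload 0, and `header("map", 20)` gives `0x1F` with payload 5.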
Floating-point values use two encoding methods:
- Raw IEEE-754 double (`0x3F`):
  - Encoded as `0x3F` followed by 8 bytes containing the IEEE 754 bit representation
- Compressed floating-point (`0x20-0x2F` for positive, `0x30-0x3F` for negative):
  - Format: `tag` followed by a varint
  - The lower 4 bits of the tag represent the number of decimal places
  - The varint represents the integer value of the scaled number
  - Example: 3.14 is represented as 314 with 2 decimal places, which gives the tag `0x22`
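To make the compressed form concrete, the following sketch reconstructs a float from a tag byte and the already-decoded varint value. It assumes the varint carries the unsigned magnitude, the high nibble of the tag carries the sign (`0x2_` positive, `0x3_` negative), and only 0 to 12 decimal places are valid. The function name `decode_compressed_float` is hypothetical.

```python
def decode_compressed_float(tag: int, scaled: int) -> float:
    """Rebuild a float from a compressed-float tag and its scaled integer."""
    places = tag & 0x0F          # low nibble: number of decimal places
    family = tag & 0xF0          # high nibble: 0x20 positive, 0x30 negative
    if family not in (0x20, 0x30) or places > 12:
        raise ValueError("not a compressed floating-point tag")
    magnitude = scaled / (10 ** places)
    return -magnitude if family == 0x30 else magnitude
```

With these assumptions, `decode_compressed_float(0x22, 314)` returns `3.14`, matching the example in the list.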
Special floating-point values each have a dedicated single-byte tag:
- NaN: Encoded as `0x2D`
- Positive Infinity: Encoded as `0x3D`
- Negative Infinity: Encoded as `0x2E`
Boolean and null values are also encoded as single bytes:
- True: Encoded as `0x01`
- False: Encoded as `0x00`
- Null: Encoded as `0x02`
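Putting the single-byte forms together, the sketch below encodes small nested values that need no varint at all. It is illustrative only: `encode_small` is a hypothetical name, string deduplication is ignored, and the code assumes that array elements and map key-value pairs are written directly after the container's tag byte in order, which the rules above imply but do not state outright. It also assumes the `0xF9`-`0xFE` (249-254) tags for -1 to -6.

```python
def encode_small(value) -> bytes:
    """Encode values whose TinyBits form needs no varint (sketch only)."""
    if value is None:
        return b"\x02"
    if value is True:
        return b"\x01"
    if value is False:
        return b"\x00"
    if isinstance(value, int) and 0 <= value <= 119:
        return bytes([0x80 | value])
    if isinstance(value, int) and -6 <= value <= -1:
        return bytes([0xF8 - value])            # assumed 0xF9..0xFE
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 30:
            return bytes([0x40 | len(raw)]) + raw
    if isinstance(value, list) and len(value) <= 6:
        # Assumption: elements follow the header in order.
        return bytes([0x08 | len(value)]) + b"".join(encode_small(v) for v in value)
    if isinstance(value, dict) and len(value) <= 14:
        # Assumption: each key is followed by its value.
        out = bytes([0x10 | len(value)])
        for key, item in value.items():
            out += encode_small(key) + encode_small(item)
        return out
    raise ValueError("value needs a varint-based form not covered by this sketch")
```

Under those assumptions, `{"a": [1, 2]}` becomes the six bytes `11 41 61 0A 81 82`: a one-pair map, the one-byte string `a`, a two-element array, and the integers 1 and 2.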
Blobs are encoded as:
- `0x03` (Blob tag)
- Varint encoding of the blob length
- Raw blob data
TinyBits uses a custom variable-length integer encoding based on the first byte value:
- For values 0-240: Encoded directly as a single byte
- For values 241-2287: Encoded as `241 + (value-241)/256` followed by `(value-241)%256`
- For values 2288-67823: Encoded as `249` followed by two bytes representing `(value-2288)/256` and `(value-2288)%256`
- For larger values (up to 64-bit):
  - `250`: 3-byte big-endian
  - `251`: 4-byte big-endian
  - `252`: 5-byte big-endian
  - `253`: 6-byte big-endian
  - `254`: 7-byte big-endian
  - `255`: 8-byte big-endian
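The varint rules translate fairly directly into code. The sketch below follows the formulas exactly as written above, with integer division for `/`; the function names `encode_varint` and `decode_varint` are chosen for illustration, and the choice of the shortest big-endian width for large values is an assumption. It has not been checked against the reference implementation.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer using the rules listed above."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in an unsigned 64-bit integer")
    if value <= 240:
        return bytes([value])
    if value <= 2287:
        rest = value - 241
        return bytes([241 + rest // 256, rest % 256])
    if value <= 67823:
        rest = value - 2288
        return bytes([249, rest // 256, rest % 256])
    # First bytes 250..255 announce a 3..8 byte big-endian payload.
    size = max(3, (value.bit_length() + 7) // 8)
    return bytes([247 + size]) + value.to_bytes(size, "big")


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position after the varint)."""
    first = data[pos]
    if first <= 240:
        size = 0
    elif first <= 248:
        size = 1
    elif first == 249:
        size = 2
    else:
        size = first - 247
    end = pos + 1 + size
    if end > len(data):
        raise ValueError("truncated varint")
    body = data[pos + 1:end]
    if first <= 240:
        return first, end
    if first <= 248:
        return 241 + 256 * (first - 241) + body[0], end
    if first == 249:
        return 2288 + 256 * body[0] + body[1], end
    return int.from_bytes(body, "big"), end
```

A round trip such as `decode_varint(encode_varint(67824))` returns the original value together with the number of bytes consumed, which lets a decoder continue reading from that offset.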
The TinyBits encoder maintains a hash table to deduplicate string values:
- Strings from 2 to 128 bytes in length can be deduplicated
- First occurrence of a string is encoded inline
- Subsequent occurrences use reference encoding
- The hash table uses a 32-bit hash based on string length and content
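The deduplication bookkeeping can be sketched with an ordinary dictionary standing in for the encoder's hash table. Several details here are assumptions rather than documented behavior: IDs are assigned sequentially from 0 in order of first appearance, and once 256 unique strings are stored, further new strings are simply written inline without being registered. The class name `StringTable` and its method are hypothetical.

```python
from typing import Dict, Optional


class StringTable:
    """Illustrative stand-in for the encoder's string deduplication table."""

    MIN_LEN = 2        # shortest string eligible for deduplication
    MAX_LEN = 128      # longest string eligible for deduplication
    MAX_ENTRIES = 256  # limit on unique strings

    def __init__(self) -> None:
        self._ids: Dict[bytes, int] = {}

    def lookup_or_add(self, data: bytes) -> Optional[int]:
        """Return the ID to reference, or None if the string goes inline."""
        if not (self.MIN_LEN <= len(data) <= self.MAX_LEN):
            return None                     # never deduplicated
        if data in self._ids:
            return self._ids[data]          # later occurrence: use a reference
        if len(self._ids) < self.MAX_ENTRIES:
            # Assumption: IDs are handed out sequentially from 0.
            self._ids[data] = len(self._ids)
        return None                         # first occurrence: encode inline
```

An encoder built on this would write an inline string whenever `lookup_or_add` returns `None` and a reference string otherwise.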
Floating-point values can be compressed when they have a relatively small number of decimal places:
- Threshold is 12 decimal places or fewer
- Values are multiplied by the appropriate power of 10
- The resulting integer is encoded as a varint
- The tag byte indicates the number of decimal places and the sign
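One way to decide whether a value qualifies is to try each decimal-place count from 0 to 12 and keep the first one that reproduces the value exactly. The sketch below does this and returns the tag plus the scaled integer. It is not the library's algorithm: the search strategy, the name `compress_float`, and the rule that the scaled magnitude must fit in 64 bits are assumptions.

```python
import math
from typing import Optional, Tuple


def compress_float(value: float) -> Optional[Tuple[int, int]]:
    """Return (tag, scaled integer) if value fits the compressed form, else None."""
    if math.isnan(value) or math.isinf(value):
        return None                          # special values have their own tags
    sign_tag = 0x30 if math.copysign(1.0, value) < 0 else 0x20
    magnitude = abs(value)
    for places in range(13):                 # 0..12 decimal places
        scaled = round(magnitude * 10 ** places)
        if scaled >= 1 << 64:
            return None                      # would not fit in a 64-bit varint
        if scaled / 10 ** places == magnitude:
            return sign_tag | places, scaled
    return None                              # fall back to another float encoding
```

For example, `compress_float(3.14)` gives `(0x22, 314)`, while `compress_float(1 / 3)` gives `None` because no count up to 12 decimal places reproduces the value.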
TinyBits supports optional features that can be enabled at encoder creation:
- `TB_FEATURE_STRING_DEDUPE` (`0x01`): Enable string deduplication
- `TB_FEATURE_COMPRESS_FLOATS` (`0x02`): Enable floating-point compression
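Because the two feature values are distinct bits, they can be combined with a bitwise OR. The sketch below mirrors them in a hypothetical Python `IntFlag`. The class `Feature` and the helper `describe` are illustrative only and do not correspond to any TinyBits API.

```python
from enum import IntFlag
from typing import List


class Feature(IntFlag):
    """Hypothetical mirror of the TinyBits feature bits."""
    STRING_DEDUPE = 0x01    # TB_FEATURE_STRING_DEDUPE
    COMPRESS_FLOATS = 0x02  # TB_FEATURE_COMPRESS_FLOATS


def describe(features: int) -> List[str]:
    """List the names of the feature bits set in a bitmask."""
    return [f.name for f in Feature if features & f]
```

Enabling both features corresponds to the mask `0x03`, and `describe(0x03)` lists both names.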
Implementation notes:
- The encoder grows its buffer dynamically as needed
- String deduplication is limited to 256 unique strings
- The maximum string length for deduplication is 128 bytes
- The encoder can be reset to reuse memory
- All multi-byte integer values are stored in big-endian format
For complete implementation details, refer to the TinyBits source code, including:
- `packer.h`: Functions for encoding values
- `unpacker.h`: Functions for decoding values
- `common.h`: Common utilities and constant definitions