---
id: formats
title: Serialization Formats
sidebar_label: Formats
---

ZIO Blocks Schema provides automatic codec derivation for multiple serialization formats. Once you have a Schema[A] for your data type, you can derive codecs for any supported format using the unified Schema.derive(Format) pattern.

A Format is an abstraction that bundles together everything needed to serialize and deserialize data in a specific format (JSON, Avro, Protobuf, etc.).

Overview

Each format defines the input type for decoding, the output type for encoding, and the type class that acts as its codec. It also declares its MIME type and a Deriver for that type class, which is used to derive codecs from schemas:

trait Format {
  type DecodeInput
  type EncodeOutput
  type TypeClass[A] <: Codec[DecodeInput, EncodeOutput, A]
  def mimeType: String
  def deriver: Deriver[TypeClass]
}

The Format trait keeps all format-level metadata, such as the MIME type and the codec deriver, in one place. This gives a consistent API for deriving codecs across formats. The MIME type also supports runtime content negotiation and format routing, for example in HTTP servers or message queues.

In practice, you can call Schema[A].derive(format) with any format that implements the Format trait and receive a codec that encodes and decodes values of type A according to the rules of that format.
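
To illustrate the content-negotiation use case mentioned above, here is a minimal, self-contained sketch. It does not use the ZIO Blocks API: `SimpleFormat`, `FormatRouting`, and `negotiate` are hypothetical names, and the stand-in format carries a plain encode function instead of a Deriver.

```scala
// Hypothetical stand-in for Format, reduced to a MIME type plus an
// encode function for String payloads. Not part of ZIO Blocks.
final case class SimpleFormat(mimeType: String, encode: String => Array[Byte])

object FormatRouting {
  import java.nio.charset.StandardCharsets.UTF_8

  // Two toy formats; a real Format would carry a Deriver instead of a function.
  val jsonLike: SimpleFormat =
    SimpleFormat("application/json", s => ("\"" + s + "\"").getBytes(UTF_8))
  val plain: SimpleFormat =
    SimpleFormat("text/plain", s => s.getBytes(UTF_8))

  // Index the formats by MIME type so a request header can select one.
  val byMimeType: Map[String, SimpleFormat] =
    List(jsonLike, plain).map(f => f.mimeType -> f).toMap

  // Return the first supported format from a client's preference list.
  def negotiate(accepted: List[String]): Option[SimpleFormat] =
    accepted.flatMap(m => byMimeType.get(m)).headOption
}
```

A server could call `FormatRouting.negotiate` with the MIME types from an Accept header and fall back to a default format when the result is `None`.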

Formats are categorized into BinaryFormat and TextFormat, which specify the types of input and output for encoding and decoding:

sealed trait Format
abstract class BinaryFormat[...](...) extends Format { ... }
abstract class TextFormat[...](...) extends Format { ... }

For example, JsonFormat is a BinaryFormat for JSON: decoding reads from a ByteBuffer, encoding writes to a ByteBuffer, the MIME type is application/json, and JsonBinaryCodecDeriver derives the codecs from schemas:

object JsonFormat extends BinaryFormat("application/json", JsonBinaryCodecDeriver)

Built-in Formats

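The table below summarizes the formats that ZIO Blocks currently ships as format objects. Each one is a BinaryFormat object that can be passed to derive. BSON is integrated separately through BsonSchemaCodec and is described in the BSON Format section: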

Format Object	Codec Type	MIME Type	Module
`JsonFormat`	`JsonBinaryCodec[A]`	`application/json`	`zio-blocks-schema`
`ToonFormat`	`ToonBinaryCodec[A]`	`text/toon`	`zio-blocks-schema-toon`
`MessagePackFormat`	`MessagePackBinaryCodec[A]`	`application/msgpack`	`zio-blocks-schema-messagepack`
`AvroFormat`	`AvroBinaryCodec[A]`	`application/avro`	`zio-blocks-schema-avro`
`ThriftFormat`	`ThriftBinaryCodec[A]`	`application/thrift`	`zio-blocks-schema-thrift`

Defining a Custom Format

To add a new serialization format, define a BinaryFormat (or TextFormat) singleton with a custom Deriver:

import zio.blocks.schema.codec.{BinaryCodec, BinaryFormat}
import zio.blocks.schema.derive.Deriver

// 1. Define your codec base class
abstract class MyCodec[A] extends BinaryCodec[A]

// 2. Implement a Deriver[MyCodec] (see Type-class Derivation docs)
// val myDeriver: Deriver[MyCodec] = ...

// 3. Create the format singleton
// object MyFormat extends BinaryFormat[MyCodec]("application/x-myformat", myDeriver)

For details on implementing a Deriver, see Type-class Derivation.

Codec Derivation System

All serialization formats in ZIO Blocks follow the same pattern: given a Schema[A], you derive a codec by calling derive with a format object:

import zio.blocks.schema._
import zio.blocks.schema.toon._

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Derive codec for any format (using TOON as an example)
val codec = Schema[Person].derive(ToonFormat)

// Encode to bytes
val bytes: Array[Byte] = codec.encode(Person("Alice", 30))

// Decode from bytes
val result: Either[SchemaError, Person] = codec.decode(bytes)

JSON Format

JSON is the most widely used text-based serialization format. See the dedicated JSON documentation for comprehensive coverage of the Json ADT and its navigation and transformation features.

Installation

JSON support is included in the core schema module:

libraryDependencies += "dev.zio" %% "zio-blocks-schema" % "<version>"

Basic Usage

import zio.blocks.schema._
import zio.blocks.schema.json._

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Using JsonEncoder/JsonDecoder
val jsonEncoder = JsonEncoder[Person]
val jsonDecoder = JsonDecoder[Person]

val person = Person("Alice", 30)
val json: Json = jsonEncoder.encode(person)
// {"name":"Alice","age":30}

val decoded: Either[SchemaError, Person] = jsonDecoder.decode(json)

Avro Format

Apache Avro is a compact binary format with schema evolution support, commonly used in big data systems like Kafka and Spark.

Installation

libraryDependencies += "dev.zio" %% "zio-blocks-schema-avro" % "<version>"

Requires the Apache Avro library (1.12.x).

Basic Usage

import zio.blocks.schema._
import zio.blocks.schema.avro._

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Derive Avro codec
val codec = Schema[Person].derive(AvroFormat)

// Encode to Avro binary format
val person = Person("Alice", 30)
val bytes: Array[Byte] = codec.encode(person)

// Decode from Avro binary format
val decoded: Either[SchemaError, Person] = codec.decode(bytes)

Avro Schema Generation

Each AvroBinaryCodec exposes an avroSchema property containing the Apache Avro schema:

import zio.blocks.schema._
import zio.blocks.schema.avro._
import org.apache.avro.{Schema => AvroSchema}

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

val codec = Schema[Person].derive(AvroFormat)
val avroSchema: AvroSchema = codec.avroSchema
println(avroSchema.toString(true))
// {
//   "type": "record",
//   "name": "Person",
//   "fields": [
//     {"name": "name", "type": "string"},
//     {"name": "age", "type": "int"}
//   ]
// }

Avro Type Mappings

Scala Type	Avro Type
`Boolean`	`boolean`
`Byte`, `Short`, `Int`	`int`
`Long`	`long`
`Float`	`float`
`Double`	`double`
`String`, `Char`	`string`
`BigInt`	`bytes`
`BigDecimal`	Record (mantissa, scale, precision, roundingMode)
`UUID`	16-byte fixed
`Currency`	3-byte fixed
`java.time.*`	Records or primitives
Case classes	`record`
Sealed traits	`union`
`List[A]`, `Set[A]`	`array`
`Map[String, V]`	`map`
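
As an illustration of the "16-byte fixed" mapping for UUID, the sketch below packs a UUID into 16 bytes using only the JDK. The byte order (most significant half first) and the `UuidFixedSketch` helper are assumptions made for this example, not taken from the Avro module.

```scala
import java.nio.ByteBuffer
import java.util.UUID

object UuidFixedSketch {
  // Pack the two 64-bit halves into 16 bytes, most significant half first.
  // The byte order is an assumption, not confirmed against the library.
  def toBytes(uuid: UUID): Array[Byte] =
    ByteBuffer
      .allocate(16)
      .putLong(uuid.getMostSignificantBits())
      .putLong(uuid.getLeastSignificantBits())
      .array()

  // Rebuild the UUID by reading the two halves back in the same order.
  def fromBytes(bytes: Array[Byte]): UUID = {
    val buf  = ByteBuffer.wrap(bytes)
    val most = buf.getLong()
    val least = buf.getLong()
    new UUID(most, least)
  }
}
```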

ADT Encoding

Sealed traits are encoded as Avro unions with an integer index prefix:

import zio.blocks.schema._
import zio.blocks.schema.avro._

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

object Shape {
  implicit val schema: Schema[Shape] = Schema.derived
}

val codec = Schema[Shape].derive(AvroFormat)

// The variant index (0 for Circle, 1 for Rectangle) is written first,
// followed by the record data
val circle: Shape = Circle(5.0)
val bytes = codec.encode(circle)
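
To make the index prefix concrete, the following sketch hand-encodes Circle(5.0) by the rules of the Avro binary encoding: the union branch index as a zigzag varint, then the double as 8 little-endian bytes. It assumes the derived codec follows the standard Avro encoding, which is not confirmed here. `AvroUnionSketch` and its helpers are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object AvroUnionSketch {
  // Avro int/long: zigzag-map the value, then emit 7 bits per byte,
  // low-order group first, setting the high bit while more bytes follow.
  def zigZagVarint(value: Long): Array[Byte] = {
    var n   = (value << 1) ^ (value >> 63)
    val out = Array.newBuilder[Byte]
    while ((n & ~0x7FL) != 0L) {
      out += ((n & 0x7F) | 0x80).toByte
      n >>>= 7
    }
    out += n.toByte
    out.result()
  }

  // Avro double: 8 bytes, IEEE 754, little-endian.
  def double(value: Double): Array[Byte] =
    ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(value).array()

  // Circle is branch 0 of the Shape union and has a single double field.
  def encodeCircle(radius: Double): Array[Byte] =
    zigZagVarint(0L) ++ double(radius)
}
```

Under these assumptions `AvroUnionSketch.encodeCircle(5.0)` yields 9 bytes: one byte for the branch index and eight for the radius.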

TOON Format (LLM-Optimized)

TOON (Token-Oriented Object Notation) is a line-oriented, indentation-based text format that encodes the JSON data model with explicit structure and minimal quoting. It typically uses 30-60% fewer tokens than equivalent JSON, which makes it well suited to LLM prompts and responses.

Why TOON?

Token efficient: 30-60% fewer tokens than equivalent JSON
Human readable: Clean, YAML-like syntax without YAML's complexity
LLM optimized: Designed for AI/ML use cases where token count matters
Explicit lengths: Arrays declare their size upfront for reliable parsing
Cross-platform: Works on JVM and Scala.js

Installation

libraryDependencies += "dev.zio" %% "zio-blocks-schema-toon" % "<version>"

Basic Usage

import zio.blocks.schema._
import zio.blocks.schema.toon._

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Derive TOON codec
val codec = Schema[Person].derive(ToonFormat)

// Encode to TOON
val person = Person("Alice", 30)
val bytes: Array[Byte] = codec.encode(person)
// name: Alice
// age: 30

// Decode from TOON
val decoded: Either[SchemaError, Person] = codec.decode(bytes)

TOON Format Examples

TOON uses indentation and explicit array lengths:

# Simple object
name: Alice
age: 30
email: alice@example.com

# Inline primitive arrays (comma-separated)
tags[3]: scala,zio,functional

# Nested object
address:
  street: 123 Main St
  city: Springfield

# Object arrays use list format
orders[2]:
  - id: 1
    total: 99.99
  - id: 2
    total: 149.5

# Or tabular format (more compact)
orders[2]{id,total}:
  1,99.99
  2,149.5
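
The inline and tabular array layouts shown above can be reproduced with a few lines of string formatting. This is a hand-written sketch for illustration only: `ToonSketch` is hypothetical, it is not the library's encoder, and it ignores quoting and escaping.

```scala
object ToonSketch {
  // Inline primitive array: name[length]: v1,v2,...
  def inlineArray(name: String, values: Seq[String]): String = {
    val joined = values.mkString(",")
    s"$name[${values.length}]: $joined"
  }

  // Tabular object array: a header with the length and column names,
  // then one indented, comma-separated row per element.
  def tabular(name: String, columns: Seq[String], rows: Seq[Seq[String]]): String = {
    val cols   = columns.mkString(",")
    val header = s"$name[${rows.length}]{$cols}:"
    (header +: rows.map(r => "  " + r.mkString(","))).mkString("\n")
  }
}
```

Declaring the length in the header is what lets a reader check that no rows were dropped or truncated.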

Configuration Options

The ToonBinaryCodecDeriver provides extensive configuration:

import zio.blocks.schema._
import zio.blocks.schema.toon._

case class Person(firstName: String, lastName: String)
object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Custom deriver: snake_case field names, tabular arrays, and a "type" discriminator field
val customDeriver = ToonBinaryCodecDeriver
  .withFieldNameMapper(NameMapper.SnakeCase)
  .withArrayFormat(ArrayFormat.Tabular)
  .withDiscriminatorKind(DiscriminatorKind.Field("type"))

val codec = Schema[Person].derive(customDeriver)
// first_name: Alice
// last_name: Smith

Option	Description	Default
`withFieldNameMapper`	Transform field names (Identity, SnakeCase, KebabCase)	`Identity`
`withCaseNameMapper`	Transform variant/case names	`Identity`
`withDiscriminatorKind`	ADT discriminator style (Key, Field, None)	`Key`
`withArrayFormat`	Array encoding (Auto, Tabular, Inline, List)	`Auto`
`withDelimiter`	Inline array delimiter (Comma, Tab, Pipe)	`Comma`
`withRejectExtraFields`	Error on unknown fields during decoding	`false`
`withEnumValuesAsStrings`	Encode enum values as strings	`true`
`withTransientNone`	Omit None values from output	`true`
`withTransientEmptyCollection`	Omit empty collections	`true`
`withTransientDefaultValue`	Omit fields with default values	`true`

ADT Encoding Styles

import zio.blocks.schema._
import zio.blocks.schema.toon._

sealed trait Shape
case class Circle(radius: Double) extends Shape

object Shape {
  implicit val schema: Schema[Shape] = Schema.derived
}

// Key discriminator (default)
val keyCodec = Schema[Shape].derive(ToonFormat)
// Circle:
//   radius: 5

// Field discriminator
val fieldDeriver = ToonBinaryCodecDeriver
  .withDiscriminatorKind(DiscriminatorKind.Field("type"))
val fieldCodec = Schema[Shape].derive(fieldDeriver)
// type: Circle
// radius: 5

MessagePack Format

MessagePack is an efficient binary serialization format that is more compact than JSON while remaining schema-less and cross-language compatible.

Installation

libraryDependencies += "dev.zio" %% "zio-blocks-schema-messagepack" % "<version>"

Basic Usage

import zio.blocks.schema._
import zio.blocks.schema.msgpack._

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Derive MessagePack codec
val codec = Schema[Person].derive(MessagePackFormat)

// Encode to MessagePack
val person = Person("Alice", 30)
val bytes: Array[Byte] = codec.encode(person)

// Decode from MessagePack
val decoded: Either[SchemaError, Person] = codec.decode(bytes)

Binary Efficiency

MessagePack provides significant space savings compared to JSON:

Typically 50-80% of JSON size
Uses variable-width integer encoding
No string escaping overhead
No key quoting or colons/commas
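
As a rough check of the size claim, the sketch below hand-encodes Person("Alice", 30) as a MessagePack map keyed by field name and compares it with the equivalent JSON text. It follows the MessagePack specification for fixmap, fixstr, and positive fixint. `MsgPackSizeSketch` is hypothetical, and the exact bytes produced by the derived codec may differ.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object MsgPackSizeSketch {
  // fixstr: one header byte (0xa0 | length) for strings of up to 31 bytes.
  def fixStr(s: String): Array[Byte] = {
    val utf8 = s.getBytes(UTF_8)
    require(utf8.length <= 31, "fixstr holds at most 31 bytes")
    (0xa0 | utf8.length).toByte +: utf8
  }

  // Two-entry fixmap (0x82) keyed by field name.
  // 30 fits in a positive fixint, so it takes a single byte.
  val msgpack: Array[Byte] =
    Array(0x82.toByte) ++ fixStr("name") ++ fixStr("Alice") ++
      fixStr("age") ++ Array(30.toByte)

  // The same value as compact JSON text.
  val json: Array[Byte] = """{"name":"Alice","age":30}""".getBytes(UTF_8)
}
```

For this value the MessagePack form is 17 bytes and the JSON form is 25 bytes, so the binary encoding is 68% of the JSON size.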

MessagePack Type Mappings

Scala Type	MessagePack Type
`Unit`	nil
`Boolean`	bool
`Byte`, `Short`, `Int`, `Long`	int (variable width)
`Float`	float32
`Double`	float64
`String`, `Char`	str
`Array[Byte]`	bin
`List[A]`, `Vector[A]`, `Set[A]`	array
`Map[K, V]`	map
`Option[A]`	array (0 or 1 element)
`Either[A, B]`	map with "left" or "right" key
Case classes	map with field names as keys
Sealed traits	int index followed by value

ADT Encoding

Sealed traits encode a variant index followed by the case value:

import zio.blocks.schema._
import zio.blocks.schema.msgpack._

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

object Shape {
  implicit val schema: Schema[Shape] = Schema.derived
}

val codec = Schema[Shape].derive(MessagePackFormat)

// Circle is encoded as: 0 followed by {radius: 5.0}
val circle: Shape = Circle(5.0)
val bytes = codec.encode(circle)

BSON Format

BSON (Binary JSON) is the binary format used by MongoDB. The ZIO Blocks BSON module provides integration with the MongoDB BSON library.

Installation

libraryDependencies += "dev.zio" %% "zio-blocks-schema-bson" % "<version>"

Requires the MongoDB BSON library (5.x).

Basic Usage

import zio.blocks.schema._
import zio.blocks.schema.bson._

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Derive BSON encoder/decoder
val encoder: BsonEncoder[Person] = BsonSchemaCodec.bsonEncoder(Schema[Person])
val decoder: BsonDecoder[Person] = BsonSchemaCodec.bsonDecoder(Schema[Person])

// Or get both as a codec
val codec: BsonCodec[Person] = BsonSchemaCodec.bsonCodec(Schema[Person])

MongoDB ObjectId Support

BSON provides native support for MongoDB ObjectIds:

import zio.blocks.schema._
import zio.blocks.schema.bson._
import org.bson.types.ObjectId

// Import ObjectId schema
import ObjectIdSupport.objectIdSchema

case class Document(_id: ObjectId, title: String)

object Document {
  implicit val schema: Schema[Document] = Schema.derived
}

// ObjectId is encoded using BSON's native OBJECT_ID type
val codec = BsonSchemaCodec.bsonCodec(Schema[Document])

Configuration Options

import zio.blocks.schema._
import zio.blocks.schema.bson._
import BsonSchemaCodec._

case class Person(name: String, age: Int)
object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Custom configuration
val config = Config
  .withSumTypeHandling(SumTypeHandling.DiscriminatorField("_type"))
  .withIgnoreExtraFields(true)
  .withNativeObjectId(true)

val codec = BsonSchemaCodec.bsonCodec(Schema[Person], config)

Option	Description	Default
`withSumTypeHandling`	ADT discrimination strategy	`WrapperWithClassNameField`
`withClassNameMapping`	Transform class names	`identity`
`withIgnoreExtraFields`	Ignore unknown fields on decode	`true`
`withNativeObjectId`	Use native BSON ObjectId type	`false`

Sum Type Handling

import zio.blocks.schema.bson.BsonSchemaCodec.SumTypeHandling

// Option 1: Wrapper with class name as field key (default)
SumTypeHandling.WrapperWithClassNameField
// {"Circle": {"radius": 5.0}}

// Option 2: Discriminator field
SumTypeHandling.DiscriminatorField("_type")
// {"_type": "Circle", "radius": 5.0}

// Option 3: No discriminator (tries each case)
SumTypeHandling.NoDiscriminator

Thrift Format

Apache Thrift is a binary protocol format with field ID-based encoding, supporting forward-compatible schema evolution.

Installation

libraryDependencies += "dev.zio" %% "zio-blocks-schema-thrift" % "<version>"

Requires the Apache Thrift library (0.22.x).

Basic Usage

import zio.blocks.schema._
import zio.blocks.schema.thrift._
import java.nio.ByteBuffer

case class Person(name: String, age: Int)

object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

// Derive Thrift codec
val codec = Schema[Person].derive(ThriftFormat)

// Encode to Thrift binary format
val person = Person("Alice", 30)
val bytes: Array[Byte] = codec.encode(person)

// Decode from Thrift binary format
val decoded: Either[SchemaError, Person] = codec.decode(bytes)

// ByteBuffer API
val buffer = ByteBuffer.allocate(1024)
codec.encode(person, buffer)
buffer.flip()
val fromBuffer: Either[SchemaError, Person] = codec.decode(buffer)

Thrift-Specific Features

Field ID-based encoding: Uses 1-based field IDs corresponding to case class field positions
Forward compatibility: Unknown fields are skipped during decoding
Out-of-order decoding: Fields can arrive in any order on the wire
TBinaryProtocol: Uses the standard Thrift binary protocol
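
The features above follow from the wire layout of TBinaryProtocol, where each field is written as a 1-byte type code, a 2-byte big-endian field ID, and the value, and a struct ends with a stop byte. The sketch below hand-rolls that layout for Person using only the JDK. `ThriftWireSketch` is hypothetical, handles only STRING and I32, and is not the library's implementation.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object ThriftWireSketch {
  // TBinaryProtocol type codes used in this sketch.
  val STOP: Byte   = 0
  val I32: Byte    = 8
  val STRING: Byte = 11

  final case class Person(name: String, age: Int)

  // Field IDs are 1-based positions: name = 1, age = 2.
  def encode(p: Person): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out   = new DataOutputStream(bytes) // writes big-endian
    val name  = p.name.getBytes(UTF_8)
    out.writeByte(STRING); out.writeShort(1); out.writeInt(name.length); out.write(name)
    out.writeByte(I32); out.writeShort(2); out.writeInt(p.age)
    out.writeByte(STOP)
    bytes.toByteArray
  }

  // Reads fields in whatever order they arrive and skips unknown field IDs.
  def decode(data: Array[Byte]): Person = {
    val in   = ByteBuffer.wrap(data) // big-endian by default
    var name = ""
    var age  = 0
    var tpe  = in.get()
    while (tpe != STOP) {
      val id = in.getShort().toInt
      if (tpe == STRING) {
        val raw = new Array[Byte](in.getInt())
        in.get(raw)
        if (id == 1) name = new String(raw, UTF_8) // other string fields are skipped
      } else if (tpe == I32) {
        val v = in.getInt()
        if (id == 2) age = v // other i32 fields are skipped
      } else sys.error(s"type $tpe not handled in this sketch")
      tpe = in.get()
    }
    Person(name, age)
  }
}
```

Because every field carries its own ID and type, the decoder can match fields by ID instead of by position, which is what allows out-of-order input and lets it skip fields it does not know.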

Thrift Type Mappings

Scala Type	Thrift Type
`Unit`	VOID
`Boolean`	BOOL
`Byte`	BYTE
`Short`, `Char`	I16
`Int`	I32
`Long`	I64
`Float`, `Double`	DOUBLE
`String`	STRING
`BigInt`	Binary (STRING)
`BigDecimal`	STRUCT
`java.time.*`	STRING (ISO format) or I32
`List[A]`	LIST
`Map[K, V]`	MAP
Case classes	STRUCT
Sealed traits	Indexed variant

Supported Types

All formats support the full set of ZIO Blocks Schema primitive types, together with the composite types listed at the end of this section:

Numeric Types:

Boolean, Byte, Short, Int, Long, Float, Double, Char
BigInt, BigDecimal

Text Types:

String

Special Types:

Unit, UUID, Currency

Java Time Types:

Instant, LocalDate, LocalTime, LocalDateTime
OffsetTime, OffsetDateTime, ZonedDateTime
Duration, Period
Year, YearMonth, MonthDay
DayOfWeek, Month
ZoneId, ZoneOffset

Composite Types:

Records (case classes)
Variants (sealed traits)
Sequences (List, Vector, Set, Array, etc.)
Maps (Map[K, V])
Options (Option[A])
Eithers (Either[A, B])
Wrappers (newtypes)

Cross-Platform Support

Format	JVM	Scala.js
JSON	✓	✓
TOON	✓	✓
MessagePack	✓	✓
Avro	✓	✗
Thrift	✓	✗
BSON	✓	✗

Error Handling

All formats return Either[SchemaError, A] for decoding operations. Errors include path information for debugging:

import zio.blocks.schema._
import zio.blocks.schema.toon._

case class Person(name: String, age: Int)
object Person {
  implicit val schema: Schema[Person] = Schema.derived
}

val codec = Schema[Person].derive(ToonFormat)

// Example: decoding invalid bytes
val invalidBytes = "invalid: data\nwrong: format".getBytes("UTF-8")
val result = codec.decode(invalidBytes)

result match {
  case Right(person) => println(s"Decoded: $person")
  case Left(error) => 
    // SchemaError includes information about the decode failure
    error.errors.foreach(e => println(s"Error: ${e.message}"))
}
