Skip to content

Commit

Permalink
Post about schema coders and benchmarks.
Browse files Browse the repository at this point in the history
  • Loading branch information
lostluck committed Jan 22, 2025
1 parent 6e9995d commit 2acb399
Show file tree
Hide file tree
Showing 2 changed files with 127 additions and 1 deletion.
127 changes: 127 additions & 0 deletions content/posts/2025-01-22-schema.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
---
title: "Schemas and Coders and Benchmarks"
date: 2025-01-22T06:24:24-08:00
tags:
- beam
- go
- hobby sdk
- dev
categories:
- Dev
---

This weekend I got nerd snipped into working on a part of my hobby SDK, that
I didn't have too much motivation to do. Beam Schema Row coders.

But they do need to be done eventually, so I implemented them.

In particular, what I needed was a way to take the Beam Schema Proto, and turn
it into coder that could produce a dynamic row value. The existing Go SDK in
principle could do it, but it's not easy due to a choice I made years and years
ago. I'm not convinced it's the wrong choice though. Besides, adding the new handling
to the API would take longer than adding it to the hobby SDK, or for what I needed
it for from scratch.

Perhaps that's not entirely true. I'm being paranoid about continuing to broaden
the API surface of the existing SDK. It's already quite large and complex, and
the interactions are already quite subtle.

Anyway, I wrote a quick naive implementation of the code, added tests, made them all
pass, and [wrote a benchmark](https://github.com/lostluck/beam-go/blob/9429632fa47a6752671c4a4d6ad0325742485599/internal/schema/schema_test.go#L143).

```go
func BenchmarkRoundtrip(b *testing.B) {
for _, test := range suite {
b.Run(test.name, func(b *testing.B) {
c := ToCoder(test.schema)
b.ReportAllocs()
b.ResetTimer()
for range b.N {
r := coders.Decode(c, test.data)
if got, want := coders.Encode(c, r), test.data; !cmp.Equal(got, want) {
b.Errorf("round trip decode-encode not equal: want %v, got %v", want, got)
}
}
})
}
}
```

It's a pretty straightforward thing. I took the implementation straight from the
above tests, set it to report allocations, and restart the benchmark timer, and
then get to iterating.

This gives bad results though. The problem is here:

```go
if got, want := coders.Encode(c, r), test.data; !cmp.Equal(got, want) {
b.Errorf("round trip decode-encode not equal: want %v, got %v", want, got)
}
```

I kept the comparison in, to validate that everything continues to work as
desired through the thousands of runs the coders would be put though. Or I was
lazy about it. `cmp` is not intended to be high performance, it's intended to be
convenient and correct.

Had I used `bytes.Equal` instead of the general `cmp.Equal`,
I'd have spent less CPU, and probably wouldn't have noticed.

For this task, I didn't much care about CPU. I cared about allocations, which do
directly affect CPU and memory usage. And `cmp` is allocation heavy by
comparison to a straight byte slice equality check.

That's not all though. I had used my convenience functions for the benchmark too,
to trivially get the through the decode and encode cycle.
`coders.Decode` and `coders.Encode` are just small wrappers to simplify quick one
off encodings, such as for testing.

But like `cmp`, they are convenient, not high performance.

The test body [now looks like this](https://github.com/lostluck/beam-go/blob/53c2c2b073dce1eebbd4090b2d642362f277c895/internal/schema/schema_test.go#L225):

```go
c := ToCoder(test.schema)
// Mild shenanigans to prevent unnecessary allocations.
enc := coders.NewEncoder()
dec := *coders.NewDecoder(test.data)
n := len(test.data)
want := test.data

b.ReportAllocs()
b.ResetTimer()
for range b.N {
enc.Reset(n)
dec = *coders.NewDecoder(test.data)
r := c.Decode(&dec)
c.Encode(enc, r)
if got := enc.Data(); !bytes.Equal(got, want) {
b.Errorf("encoding not equal: want %v, got %v", want, got)
}
}
```

Moving the `Encoder` and `Decoder` allocations out of the hot loop, removed them
from the profile graph. There's a bit of "fun" with the Decoder to avoid it
getting heap allocated anew for each loop when the test data is being reset.
Having the comparison back in adds a few nanoseconds per run, but helps keep the
code staying robust.

All of this was a problem largely because I wanted to collect a nice and clean
"before" and "after" versions of the metrics as I cleaned up and changed the
implementation. I like seeing performance improvements. But now the results are
incomparable, and not apples to apples, so I've cleared them away.

I am happy where this has ended up though. I incorporated the `unsafe` tricks
protocol buffers use to minimize allocations and space in their protoreflect
package: https://github.com/protocolbuffers/protobuf-go/blob/master/reflect/protoreflect/value_unsafe_go121.go.

The idea is the same, really. Be able to refer to and mutate values and fields
efficiently, against a known schema. I'd use their implementation directly if I
were able to, but they have it quite locked down for the same reasons why I've
put this stuff in an internal package for the time being. I want it to be able
to change.

This is basically half of a useful article, other than the lesson about
being certain you know what you're measuring in your benchmarks. We'll see if
I turn back the clock and collect clean measurements of the implementations.
1 change: 0 additions & 1 deletion themes/hugo-coder/assets/scss/_content.scss
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,6 @@
// hyphens: auto;

white-space: normal;
max-width: 60rem;
}
}

Expand Down

0 comments on commit 2acb399

Please sign in to comment.