Post about schema coders and benchmarks.

lostluck · Jan 22, 2025 · 2acb399 · 2acb399
1 parent 6e9995d
commit 2acb399
Show file tree

Hide file tree

Showing 2 changed files with 127 additions and 1 deletion.
diff --git a/content/posts/2025-01-22-schema.md b/content/posts/2025-01-22-schema.md
@@ -0,0 +1,127 @@
+---
+title: "Schemas and Coders and Benchmarks"
+date: 2025-01-22T06:24:24-08:00
+tags:
+- beam
+- go
+- hobby sdk
+- dev
+categories:
+- Dev
+---
+
+This weekend I got nerd snipped into working on a part of my hobby SDK, that
+I didn't have too much motivation to do. Beam Schema Row coders.
+
+But they do need to be done eventually, so I implemented them.
+
+In particular, what I needed was a way to take the Beam Schema Proto, and turn
+it into coder that could produce a dynamic row value. The existing Go SDK in
+principle could do it, but it's not easy due to a choice I made years and years
+ago. I'm not convinced it's the wrong choice though. Besides, adding the new handling
+to the API would take longer than adding it to the hobby SDK, or for what I needed
+it for from scratch.
+
+Perhaps that's not entirely true. I'm being paranoid about continuing to broaden
+the API surface of the existing SDK. It's already quite large and complex, and
+the interactions are already quite subtle.
+
+Anyway, I wrote a quick naive implementation of the code, added tests, made them all
+pass, and [wrote a benchmark](https://github.com/lostluck/beam-go/blob/9429632fa47a6752671c4a4d6ad0325742485599/internal/schema/schema_test.go#L143).
+
+```go
+func BenchmarkRoundtrip(b *testing.B) {
+	for _, test := range suite {
+		b.Run(test.name, func(b *testing.B) {
+			c := ToCoder(test.schema)
+			b.ReportAllocs()
+			b.ResetTimer()
+			for range b.N {
+				r := coders.Decode(c, test.data)
+				if got, want := coders.Encode(c, r), test.data; !cmp.Equal(got, want) {
+					b.Errorf("round trip decode-encode not equal: want %v, got %v", want, got)
+				}
+			}
+		})
+	}
+}
+```
+
+It's a pretty straightforward thing. I took the implementation straight from the
+above tests, set it to report allocations, and restart the benchmark timer, and
+then get to iterating.
+
+This gives bad results though. The problem is here:
+
+```go
+if got, want := coders.Encode(c, r), test.data; !cmp.Equal(got, want) {
+    b.Errorf("round trip decode-encode not equal: want %v, got %v", want, got)
+}
+```
+
+I kept the comparison in, to validate that everything continues to work as
+desired through the thousands of runs the coders would be put though. Or I was
+lazy about it. `cmp` is not intended to be high performance, it's intended to be
+convenient and correct.
+
+Had I used `bytes.Equal` instead of the general `cmp.Equal`,
+I'd have spent less CPU, and probably wouldn't have noticed.
+
+For this task, I didn't much care about CPU. I cared about allocations, which do
+directly affect CPU and memory usage. And `cmp` is allocation heavy by
+comparison to a straight byte slice equality check.
+
+That's not all though. I had used my convenience functions for the benchmark too,
+to trivially get the through the decode and encode cycle. 
+`coders.Decode` and `coders.Encode` are just small wrappers to simplify quick one
+off encodings, such as for testing. 
+
+But like `cmp`, they are convenient, not high performance.
+
+The test body [now looks like this](https://github.com/lostluck/beam-go/blob/53c2c2b073dce1eebbd4090b2d642362f277c895/internal/schema/schema_test.go#L225): 
+
+```go
+c := ToCoder(test.schema)
+// Mild shenanigans to prevent unnecessary allocations.
+enc := coders.NewEncoder()
+dec := *coders.NewDecoder(test.data)
+n := len(test.data)
+want := test.data
+
+b.ReportAllocs()
+b.ResetTimer()
+for range b.N {
+    enc.Reset(n)
+    dec = *coders.NewDecoder(test.data)
+    r := c.Decode(&dec)
+    c.Encode(enc, r)
+    if got := enc.Data(); !bytes.Equal(got, want) {
+        b.Errorf("encoding not equal: want %v, got %v", want, got)
+    }
+}
+```
+
+Moving the `Encoder` and `Decoder` allocations out of the hot loop, removed them
+from the profile graph. There's a bit of "fun" with the Decoder to avoid it
+getting heap allocated anew for each loop when the test data is being reset.
+Having the comparison back in adds a few nanoseconds per run, but helps keep the
+code staying robust.
+
+All of this was a problem largely because I wanted to collect a nice and clean
+"before" and "after" versions of the metrics as I cleaned up and changed the
+implementation. I like seeing performance improvements. But now the results are
+incomparable, and not apples to apples, so I've cleared them away.
+
+I am happy where this has ended up though. I incorporated the `unsafe` tricks
+protocol buffers use to minimize allocations and space in their protoreflect
+package: https://github.com/protocolbuffers/protobuf-go/blob/master/reflect/protoreflect/value_unsafe_go121.go.
+
+The idea is the same, really. Be able to refer to and mutate values and fields
+efficiently, against a known schema. I'd use their implementation directly if I
+were able to, but they have it quite locked down for the same reasons why I've
+put this stuff in an internal package for the time being. I want it to be able
+to change.
+
+This is basically half of a useful article, other than the lesson about
+being certain you know what you're measuring in your benchmarks. We'll see if
+I  turn back the clock and collect clean measurements of the implementations.
diff --git a/themes/hugo-coder/assets/scss/_content.scss b/themes/hugo-coder/assets/scss/_content.scss
@@ -59,7 +59,6 @@
       // hyphens: auto;
 
       white-space: normal;
-      max-width: 60rem;
     }
   }
-Original file line number
+Diff line change
@@ Expand Up / @@ -59,7 +59,6 @@ @@
           // hyphens: auto;
           white-space: normal;
-          max-width: 60rem;
         }
       }
@@ Expand Down @@