Skip to content

Commit

Permalink
Split It out.
Browse files Browse the repository at this point in the history
  • Loading branch information
lostluck committed Jan 2, 2025
1 parent c482181 commit 48fa51c
Showing 1 changed file with 83 additions and 0 deletions.
83 changes: 83 additions & 0 deletions content/posts/2025-01-01-hobby-sdk-02.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
---
title: "Split It Out"
date: 2025-01-01T21:00:00-07:00
tags:
- generics
- go
- beam
categories:
- Dev
- Talks
- Hobby SDK
---

Since [my last post](/2024/10/my-hobby-beam-go-sdk) I've been quite busy with
travel and recovery from same, and thus I haven't poked around trying to work on
my Hobby SDK.

Which also means, I haven't spoken about more widely either. Just after I
re-built the site to use Github actions too, to make it easier to write more.
Ah well. Still need to sort out a few CSS things though, since line wrapping is
weird... I want to have the markdown look good, but not have that affect the
rendered output beyond formating...

That's for a later post. This is about how to deal with Stateful DoFns!

-----

The next thing to do for the Hobby SDK is add State and Timers, and ultimately,
stateful transforms. I was stuck on this for a little bit. Specifically how
to add these to the graph construction mechanisms, while also avoiding writing
too much code to replicate the type system.

The trick though is that Stateful DoFns, that is, those that make use State and
Timers in any capacity, have a hard restriction: they must take in KVs as an
input. This is how the Beam model enables stateful stream processing, without
sacrificing parallelism.

The question then, is how do I have the hobby SDK enforce this?

You see, in most Beam SDKs, there is really only a single `ParDo` call as a part
of graph construction, and it will accept `DoFns` of any kind. The simple light
weight ones, to having Side Inputs, to Splittable DoFns, and of course,
stateful DoFns. But then there's a later step that does the enforcement of these
compatibility requirements.

And that's just a lot of code. Unlike in Java, Go Generics don't erase types,
nor can I have overloaded implementations to selectively validate different
constraints. It could be possible to make some way of "detecting" the KV, using
reflection to enforce validating that the outer type is a KV, but that goes
against the ethos of this Hobby SDK.

I want to have the Go Compiler do the enforcement, as much as possible.

This means that I'll end up with a separate `StatefulParDo` top level graph
construction function. There's a benefit to this: It can be clearly documented
and provides a natural place to put *all* the documentation for stateful DoFns,
instead of incredibly burdening the standard `ParDo` function.

This `StatefulParDo` will use the same mechanism that `GroupByKey` requires:
A DoFn that has a KV generic. This way, when folks are writing a pipeline, the
Go Compiler's type checker will let them know of the problem.

I'll still need to detect stateful DoFns, since those features will still be
modeled onto Exported fields and field types. That should be fine though.
The two methods also enable a cleaner split for documenting the incompatibility
of SplittableDoFns and Stateful DoFns. Sadly, no good way to represten this
constraint as a type. Fields don't meaningfully participate in the Go type
system.

We need this because b, a SplittableDoFn can't have State.
It becomes very difficult to reason about that way due to other transforms that
happen to SplittableDoFns. Afterall, the key can change while splitting elements.

Sometimes, you just have to have two things that do jobs well separately, instead
of trying to have a single approach for everything.

--------

Writing the post isn't the same as actually finishing the work, but sometimes
the best thing to do, is to split them out into separate posts and work. But
the work does take longer.

Just gotta do it one bit at a time.

0 comments on commit 48fa51c

Please sign in to comment.