-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
83 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
--- | ||
title: "Split It Out" | ||
date: 2025-01-01T21:00:00-07:00 | ||
tags: | ||
- generics | ||
- go | ||
- beam | ||
categories: | ||
- Dev | ||
- Talks | ||
- Hobby SDK | ||
--- | ||
|
||
Since [my last post](/2024/10/my-hobby-beam-go-sdk) I've been quite busy with | ||
travel and recovery from same, and thus I haven't poked around trying to work on | ||
my Hobby SDK. | ||
|
||
Which also means, I haven't spoken about more widely either. Just after I | ||
re-built the site to use Github actions too, to make it easier to write more. | ||
Ah well. Still need to sort out a few CSS things though, since line wrapping is | ||
weird... I want to have the markdown look good, but not have that affect the | ||
rendered output beyond formating... | ||
|
||
That's for a later post. This is about how to deal with Stateful DoFns! | ||
|
||
----- | ||
|
||
The next thing to do for the Hobby SDK is add State and Timers, and ultimately, | ||
stateful transforms. I was stuck on this for a little bit. Specifically how | ||
to add these to the graph construction mechanisms, while also avoiding writing | ||
too much code to replicate the type system. | ||
|
||
The trick though is that Stateful DoFns, that is, those that make use State and | ||
Timers in any capacity, have a hard restriction: they must take in KVs as an | ||
input. This is how the Beam model enables stateful stream processing, without | ||
sacrificing parallelism. | ||
|
||
The question then, is how do I have the hobby SDK enforce this? | ||
|
||
You see, in most Beam SDKs, there is really only a single `ParDo` call as a part | ||
of graph construction, and it will accept `DoFns` of any kind. The simple light | ||
weight ones, to having Side Inputs, to Splittable DoFns, and of course, | ||
stateful DoFns. But then there's a later step that does the enforcement of these | ||
compatibility requirements. | ||
|
||
And that's just a lot of code. Unlike in Java, Go Generics don't erase types, | ||
nor can I have overloaded implementations to selectively validate different | ||
constraints. It could be possible to make some way of "detecting" the KV, using | ||
reflection to enforce validating that the outer type is a KV, but that goes | ||
against the ethos of this Hobby SDK. | ||
|
||
I want to have the Go Compiler do the enforcement, as much as possible. | ||
|
||
This means that I'll end up with a separate `StatefulParDo` top level graph | ||
construction function. There's a benefit to this: It can be clearly documented | ||
and provides a natural place to put *all* the documentation for stateful DoFns, | ||
instead of incredibly burdening the standard `ParDo` function. | ||
|
||
This `StatefulParDo` will use the same mechanism that `GroupByKey` requires: | ||
A DoFn that has a KV generic. This way, when folks are writing a pipeline, the | ||
Go Compiler's type checker will let them know of the problem. | ||
|
||
I'll still need to detect stateful DoFns, since those features will still be | ||
modeled onto Exported fields and field types. That should be fine though. | ||
The two methods also enable a cleaner split for documenting the incompatibility | ||
of SplittableDoFns and Stateful DoFns. Sadly, no good way to represten this | ||
constraint as a type. Fields don't meaningfully participate in the Go type | ||
system. | ||
|
||
We need this because b, a SplittableDoFn can't have State. | ||
It becomes very difficult to reason about that way due to other transforms that | ||
happen to SplittableDoFns. Afterall, the key can change while splitting elements. | ||
|
||
Sometimes, you just have to have two things that do jobs well separately, instead | ||
of trying to have a single approach for everything. | ||
|
||
-------- | ||
|
||
Writing the post isn't the same as actually finishing the work, but sometimes | ||
the best thing to do, is to split them out into separate posts and work. But | ||
the work does take longer. | ||
|
||
Just gotta do it one bit at a time. |