Batcher implementation that has no opinions about chains, and columnar. #626

frankmcsherry · 2025-07-15T13:52:24Z

This PR provides a new Batcher implementation that is quite simple, though not as lean as the existing merge batcher. It relies on containers that implement two methods: merge and split, for merging two containers and for splitting a container based on a time frontier. It then handles the LSM details using these implementations.
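
To make the merge/split contract concrete, here is a minimal sketch of how a batcher could sit on top of those two methods. It is not the code in this PR: the `MergeSplit` trait, the `Batcher` struct, the `Update` alias, and the use of a single `u64` as the frontier (instead of an antichain) are all assumptions made for illustration.

```rust
// Hypothetical update triple: (data, time, diff). Not the PR's types.
type Update = (u64, u64, i64);

/// Hypothetical trait capturing the two operations the batcher relies on.
trait MergeSplit: Sized {
    /// Merge two sorted, consolidated containers into one.
    fn merge(self, other: Self) -> Self;
    /// Split into (updates with time < frontier, everything else).
    fn split(self, frontier: u64) -> (Self, Self);
    fn len(&self) -> usize;
}

impl MergeSplit for Vec<Update> {
    fn merge(self, other: Self) -> Self {
        let mut out: Vec<Update> = Vec::with_capacity(self.len() + other.len());
        let (mut a, mut b) = (self.into_iter().peekable(), other.into_iter().peekable());
        loop {
            // Decide which input holds the smaller (data, time) pair.
            let take_a = match (a.peek(), b.peek()) {
                (Some(x), Some(y)) => (x.0, x.1) <= (y.0, y.1),
                (Some(_), None) => true,
                (None, Some(_)) => false,
                (None, None) => break,
            };
            let next = if take_a { a.next() } else { b.next() };
            let (d, t, r) = next.expect("peeked element must exist");
            // Accumulate diffs for equal (data, time) pairs.
            if let Some(last) = out.last_mut() {
                if (last.0, last.1) == (d, t) {
                    last.2 += r;
                    continue;
                }
            }
            out.push((d, t, r));
        }
        out.retain(|u| u.2 != 0); // drop updates that cancelled out
        out
    }
    fn split(self, frontier: u64) -> (Self, Self) {
        self.into_iter().partition(|u| u.1 < frontier)
    }
    fn len(&self) -> usize {
        Vec::len(self)
    }
}

/// LSM-style stack of containers; sizes roughly halve from front to back.
struct Batcher<C> {
    layers: Vec<C>,
}

impl<C: MergeSplit> Batcher<C> {
    fn new() -> Self {
        Batcher { layers: Vec::new() }
    }
    /// Add a sorted container, merging while the last two layers are of similar size.
    fn push(&mut self, container: C) {
        self.layers.push(container);
        while self.layers.len() >= 2 {
            let n = self.layers.len();
            if self.layers[n - 2].len() > 2 * self.layers[n - 1].len() {
                break;
            }
            let small = self.layers.pop().unwrap();
            let large = self.layers.pop().unwrap();
            self.layers.push(large.merge(small));
        }
    }
    /// Extract everything before `frontier`; retain the rest for later.
    fn seal(&mut self, frontier: u64) -> Option<C> {
        let merged = self.layers.drain(..).reduce(|a, b| a.merge(b))?;
        let (ready, held) = merged.split(frontier);
        if held.len() > 0 {
            self.layers.push(held);
        }
        Some(ready)
    }
}
```

In this sketch the batcher never looks inside a container: all ordering and consolidation logic lives in `merge`, and all frontier logic lives in `split`, which is the separation the description is pointing at.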

It also dramatically revises the columnar.rs example to be containers end-to-end, with the same type on the wire, in batchers, and ultimately backing arrangements. The revised example performs better than the spines.rs example, but this could be attributed either to better on-the-wire performance or to a new batcher that gains performance at the expense of higher peak memory utilization.

Both of these changes are net new, and don't break any existing tools. They should be safe to merge, as long as we are ok relocating or removing them if we find them lacking. There is also a light and weird change to the ord_neu::key storage, where we need to have an actual instance of some specified value type, which has been () in examples so far. Something about how the types work ends up making an assumed &() incompatible with a container, if I recall correctly. It also seems mostly harmless, and backward compatible.

antiguru · 2025-07-23T10:55:19Z

differential-dataflow/examples/columnar.rs

+            fn partition<I>(container: &mut Self::Container, builders: &mut [Self], mut index: I)
+            where
+                Self: for<'a> PushInto<<Self::Container as timely::Container>::Item<'a>>,
+                I: for<'a> FnMut(&<Self::Container as timely::Container>::Item<'a>) -> usize,
+            {
+                println!("Exchanging!");
+                for datum in container.drain() {
+                    let index = index(&datum);
+                    builders[index].push_into(datum);
+                }
+                container.clear();
            }
+


We might have to revisit how to exchange containers that cannot be drained/iterated here. The implementation doesn't actually work because container.drain() is unimplemented.

frankmcsherry · 2025-08-15

Totally. I put it in only to be able to see if it panicked, because I wasn't certain what would happen with the default implementation (pretty sure it would panic too, but copy/pasted to be sure). It's definitely within the PR's power to have an implementation that works, and this is not it. :D
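
For reference, one way to exchange a container that cannot be drained is to route rows by index through a borrowed view. The sketch below is only an illustration under assumptions: the `Columns` type, its `get`/`push`/`clear` methods, and this free-standing `partition` function are hypothetical and are not the timely or columnar APIs, nor the distributor that this PR ends up implementing.

```rust
/// Hypothetical columnar container: one column per field, so there are
/// no owned rows to drain, only borrowed views assembled on demand.
#[derive(Default, Debug, PartialEq)]
struct Columns {
    keys: Vec<u64>,
    vals: Vec<String>,
}

impl Columns {
    fn len(&self) -> usize {
        self.keys.len()
    }
    /// Borrowed view of row `i`.
    fn get(&self, i: usize) -> (u64, &str) {
        (self.keys[i], &self.vals[i])
    }
    /// Copy a borrowed row into this container's columns.
    fn push(&mut self, (key, val): (u64, &str)) {
        self.keys.push(key);
        self.vals.push(val.to_owned());
    }
    fn clear(&mut self) {
        self.keys.clear();
        self.vals.clear();
    }
}

/// Route each row to `builders[index(row)]` without draining `container`.
fn partition<F>(container: &mut Columns, builders: &mut [Columns], mut index: F)
where
    F: FnMut(&(u64, &str)) -> usize,
{
    for i in 0..container.len() {
        let row = container.get(i);
        let target = index(&row);
        builders[target].push(row);
    }
    // Only now, with every row copied out, is the input emptied.
    container.clear();
}
```

The cost of this approach is a copy per row rather than a move, which may or may not matter for columnar data where pushing a borrowed row is already a copy into the target columns.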

antiguru · 2025-07-23T10:59:45Z

differential-dataflow/examples/columnar.rs

+                        std::cmp::Ordering::Less => {
+                            let lower = this_key_range.start;
+                            gallop(this.keys.borrow(), &mut this_key_range, |x| x < that_key);
+                            merged.extend_from_keys(&this, lower .. this_key_range.start);


(here and below) extend_from_keys/_vals call extend_from_self, but we don't (or can't!) presize the merged container. This is a potential performance regression if we have to reallocate often.
As we can't know the final size ahead of time, we could allocate a container big enough to hold the sum of the two inputs. In the worst case, I think this would only waste virtual memory.

But, let's first measure and see whether it shows up.

frankmcsherry · 2025-08-15

Elsewhere (datatoad) the presizing makes sense and seems to help a bit. One way to view merging and consolidation is as literally merging the tuples (no consolidation), for which the capacities could just be the sums of the capacities for each layer, followed by a consolidation pass that ends up filtering tuples out (ones that end up as zero), for which .. we may or may not feel bad about the over allocation. In a demand-paging world, I wouldn't feel too bad. In a world where capacities are limited, even if you don't use the data, it is more complicated.
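
The "merge first, consolidate second" view can be sketched on flat vectors. This is a simplified illustration, not the layered columnar merge in the PR: the function names `merge_presized` and `consolidate` and the `(key, diff)` tuple layout are assumptions chosen to keep the example short.

```rust
/// Merge two key-sorted lists without consolidating. The output is
/// presized to the sum of the input lengths, so it never reallocates.
fn merge_presized(a: &[(u64, i64)], b: &[(u64, i64)]) -> Vec<(u64, i64)> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].0 <= b[j].0 {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// Consolidate in place: sum diffs of adjacent equal keys, then drop zeros.
/// The allocation is left as is, so any over-allocation from the merge remains.
fn consolidate(updates: &mut Vec<(u64, i64)>) {
    let mut write = 0;
    for read in 0..updates.len() {
        let current = updates[read];
        if write > 0 && updates[write - 1].0 == current.0 {
            updates[write - 1].1 += current.1;
        } else {
            updates[write] = current;
            write += 1;
        }
    }
    updates.truncate(write);
    updates.retain(|u| u.1 != 0);
}
```

After `consolidate`, the vector's length can be well below its capacity. Whether that slack is a problem is exactly the demand-paging versus limited-capacity question raised above; a caller who cares could call `shrink_to_fit` at the price of another copy.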

frankmcsherry · 2025-09-23T22:55:30Z

I'm going to merge this on the principle that while the parts may not be final, they are not wrong and we can iterate on them in the repo.

frankmcsherry force-pushed the unchained branch 4 times, most recently from f33488a to e2e49ec Compare July 18, 2025 00:38

frankmcsherry force-pushed the unchained branch from e2e49ec to aaa2103 Compare August 1, 2025 18:46

antiguru force-pushed the unchained branch from aaa2103 to d6582e3 Compare August 15, 2025 13:27

antiguru approved these changes Aug 15, 2025

WIP

76c6cf3

frankmcsherry force-pushed the unchained branch from d6582e3 to 76c6cf3 Compare September 15, 2025 22:27

Implement Distributor

341704b

frankmcsherry marked this pull request as ready for review September 23, 2025 22:50

frankmcsherry changed the title ~~WIP: Batcher implementation that has no opinions about chains, and columnar.~~ Batcher implementation that has no opinions about chains, and columnar. Sep 23, 2025

frankmcsherry merged commit f2a3df6 into TimelyDataflow:master Sep 23, 2025
5 checks passed

github-actions bot mentioned this pull request Sep 23, 2025

chore: release v0.18.0 #648

Open
