
Explain batching, document best practices #255

Open
hariso opened this issue Mar 18, 2025 · 1 comment

hariso (Contributor) commented Mar 18, 2025

@nickchomey asked us to write more docs explaining batching, and document best practices, especially with the new architecture.

lovromazgon moved this from Triage to Todo in Conduit Main on Mar 18, 2025
lovromazgon self-assigned this on Mar 18, 2025
nickchomey (Contributor) commented Mar 19, 2025

Lovro wrote this in Discord:

It's pretty simple, actually: the source connector is in charge of collecting the batch and sending it to Conduit. The batch is then treated as one "unit"; all messages move through the pipeline together (as an array of messages). In the end, the whole batch is sent to the destination connector, so the destination doesn't need to do any batching on its end, since it already receives the batch.

In contrast, the old engine (well, still the current one) pushes messages through the pipeline one by one, which is why batching on the source side doesn't make sense in that engine: the batch would just be broken up into individual messages anyway. Batching on the destination was added so that the destination could collect a batch and write all of its records in one go.

So, to sum up: in the old architecture we collect batches on the destination, while in the new architecture we collect them on the source.
Keep in mind that this only improves the Conduit internals; the connectors are likely still going to be the bottleneck, depending on how efficiently they read and write the data.
What I see from the graph is that we spend quite some time encoding and decoding the data using the schema. If you are using built-in connectors for both the source and the destination, you can take a shortcut and simply disable schemas altogether (set sdk.schema.extract.key.enabled: false and sdk.schema.extract.payload.enabled: false on both connectors). This shortcut works because built-in connectors don't use gRPC to communicate with Conduit, so they can return the data as is.
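For illustration, here is a minimal sketch of where batch collection could be configured in a pipeline configuration file. The sdk.batch.size and sdk.batch.delay keys are the connector SDK's batch middleware parameters as I understand them (they are not named in this thread), and the version, plugin names, and values are placeholders; with the new architecture the settings belong on the source, while with the old/current engine they belong on the destination.

```yaml
# Illustrative pipeline configuration showing where batches are collected.
# The sdk.batch.* keys are assumed from the connector SDK's batch middleware;
# plugin names and values are placeholders, not a recommendation.
version: "2.2"
pipelines:
  - id: batching-example
    status: running
    connectors:
      - id: source
        type: source
        plugin: builtin:generator
        settings:
          # New architecture: the source collects the batch and the whole
          # batch travels through the pipeline as one unit.
          sdk.batch.size: "1000"
          sdk.batch.delay: "100ms"
      - id: destination
        type: destination
        plugin: builtin:file
        settings:
          # Old/current engine: records arrive one by one, so the batch is
          # collected here, right before writing.
          sdk.batch.size: "1000"
          sdk.batch.delay: "100ms"
```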

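And a minimal sketch of the schema shortcut described above, assuming built-in connectors on both ends. Only the two sdk.schema.extract.* keys come from the quote; the rest of the pipeline file (version, plugin names, other settings) is illustrative.

```yaml
version: "2.2"
pipelines:
  - id: no-schema-example
    status: running
    connectors:
      - id: source
        type: source
        plugin: builtin:generator   # placeholder built-in source
        settings:
          # Skip schema extraction for the record key and payload.
          sdk.schema.extract.key.enabled: "false"
          sdk.schema.extract.payload.enabled: "false"
      - id: destination
        type: destination
        plugin: builtin:file        # placeholder built-in destination
        settings:
          # Same shortcut on the destination side.
          sdk.schema.extract.key.enabled: "false"
          sdk.schema.extract.payload.enabled: "false"
```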
Labels: None yet
Projects: Conduit Main (Status: Todo)
3 participants