iobuf: replace boost::intrusive_list with hand rolled implementation #25053

Merged
merged 14 commits into from
Feb 14, 2025

Conversation

rockwotj
Contributor

@rockwotj rockwotj commented Feb 6, 2025

A number of issues have cropped up in microbenchmark reports due to how expensive it is to move an iobuf. This has occurred at least three times to my knowledge:

Let's fix the underlying problem and remove the usage of boost::intrusive_list from iobuf. The new hand-rolled implementation improves performance of the iceberg translation path by ~60% in one microbenchmark (with 35% fewer instructions). This benchmark uses iobuf heavily, since all the messages are smallish strings that get moved around a lot.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 6, 2025

CI test results

test results on build#61657
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61657#0194da3c-f45f-4f0a-bcff-b3a72ab0a262 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61657#0194da3c-f45d-432a-95b0-1ab1ad472c1b FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/61657#0194da3c-f45f-4f0a-bcff-b3a72ab0a262 FLAKY 1/2
test results on build#61705
test_id test_kind job_url test_status passed
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/61705#0194de61-5d8d-4f64-9c4e-f51f02130821 FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/61705#0194de61-5d8a-4de0-a3a9-a66dfd74e92a FLAKY 1/2
rptest.tests.partition_balancer_test.PartitionBalancerTest.test_unavailable_nodes ducktape https://buildkite.com/redpanda/redpanda/builds/61705#0194de61-5d8a-4de0-a3a9-a66dfd74e92a FLAKY 1/2
test results on build#61718
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61718#0194df7d-96c5-4c20-8158-e0cadba71d66 FLAKY 1/3
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61718#0194df7d-96c6-4d1f-9364-03679104110c FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/61718#0194df80-74c7-4329-9ba0-7d248ee0e542 FLAKY 1/2
rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=0.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61718#0194df80-74c7-4329-9ba0-7d248ee0e542 FLAKY 1/2
test results on build#61755
test_id test_kind job_url test_status passed
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/61755#0194e3b7-5627-4148-b377-9e9610153256 FLAKY 1/2

@rockwotj
Contributor Author

rockwotj commented Feb 6, 2025

Before

    test                      iterations      median         mad         min         max      allocs       tasks        inst
    iobuf.move_bench_small     183288000     4.961ns     0.004ns     4.954ns     4.965ns       0.000       0.000        43.7
    iobuf.move_bench_medium      1664000     5.111ns     0.016ns     5.093ns     5.167ns       0.000       0.000        43.7
    iobuf.move_bench_large        520000     5.110ns     0.022ns     5.088ns     5.181ns       0.000       0.000        43.7

After

    test                      iterations      median         mad         min         max      allocs       tasks        inst
    iobuf.move_bench_small     376274000     2.201ns     0.001ns     2.200ns     2.219ns       0.000       0.000         7.7
    iobuf.move_bench_medium      1660000     2.331ns     0.004ns     2.327ns     2.378ns       0.000       0.000         7.7
    iobuf.move_bench_large        520000     2.353ns     0.003ns     2.332ns     2.365ns       0.000       0.000         7.7

After (updated)

    test                      iterations      median         mad         min         max      allocs       tasks        inst
    iobuf.move_bench_small     520848000     1.469ns     0.000ns     1.468ns     1.471ns       0.000       0.000         7.2
    iobuf.move_bench_medium      1694000     1.582ns     0.002ns     1.579ns     1.586ns       0.000       0.000         7.2
    iobuf.move_bench_large        528000     1.599ns     0.009ns     1.590ns     1.614ns       0.000       0.000         7.2

@rockwotj rockwotj force-pushed the iobuf_move branch 3 times, most recently from 7489680 to 379c910 Compare February 6, 2025 18:06
@rockwotj rockwotj marked this pull request as draft February 6, 2025 19:01
@rockwotj rockwotj force-pushed the iobuf_move branch 4 times, most recently from f990a32 to 27b804b Compare February 6, 2025 20:58
@rockwotj
Contributor Author

rockwotj commented Feb 6, 2025

I reverted c743388 (from the discussion the other day). The new performance numbers look 😎

== no unique ptr boost intrusive list

    test                                                                         iterations      median         mad         min         max      allocs       tasks        inst      cycles
    record_multiplexer_bench_fixture.protobuf_381_byte_message_linear_80_fields     3261000   222.723ns     0.047ns   222.283ns   223.455ns       1.110       0.003      1779.8         0.0

== unique ptr boost intrusive list

    test                                                                         iterations      median         mad         min         max      allocs       tasks        inst      cycles
    record_multiplexer_bench_fixture.protobuf_381_byte_message_linear_80_fields     3261000   204.218ns     0.044ns   204.174ns   205.159ns       1.355       0.002      1596.1         0.0

== no unique ptr and hand rolled intrusive list

    test                                                                         iterations      median         mad         min         max      allocs       tasks        inst      cycles
    record_multiplexer_bench_fixture.protobuf_381_byte_message_linear_80_fields     3261000   136.883ns     0.280ns   136.603ns   137.548ns       1.110       0.002      1303.3         0.0
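
As a rough check on the three tables above, the relative change can be computed from the median and inst columns. The helper names below are hypothetical and only illustrate the arithmetic. Comparing the first table with the last, the median goes from 222.723ns to 136.883ns, a ratio of about 1.63x (roughly 38.5% less time per iteration), and instructions go from 1779.8 to 1303.3, a ratio of about 1.37x (roughly 26.8% fewer).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not part of the PR: they only spell out the
// arithmetic behind "X% faster" style claims.

// Ratio of the old value to the new one; greater than 1 means an improvement.
constexpr double speedup(double before, double after) {
    return before / after;
}

// Percentage by which the new value is smaller than the old one.
constexpr double reduction_pct(double before, double after) {
    return 100.0 * (before - after) / before;
}
```

The two ways of expressing the change differ noticeably, so it helps to say which one a percentage refers to when quoting benchmark results.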

@rockwotj rockwotj marked this pull request as ready for review February 6, 2025 21:17
@rockwotj rockwotj requested review from StephanDollberg, dotnwat and a team February 6, 2025 21:20
@rockwotj rockwotj changed the title from "iobuf: benchmark for the move ctor" to "iobuf: replace boost::intrusive_list with hand rolled implementation" Feb 6, 2025

public:
io_fragment_list() noexcept = default;
io_fragment_list(io_fragment_list&&) noexcept = default;
Member

  • don't we need to reset _head/_tail on the moved-from to satisfy the assertion in the destructor?

  • also i wonder if there is any code in the tree that assumes a moved-from iobuf is empty, which the defaulted move ctor wouldn't guarantee, right?

maybe i'm missing something...

Contributor Author

It's only safe because iobuf "fixes" it:

x._frags = container{};

Member

oh of course, this is replacing container_t, right

Member

would it be preferable in any way to push that down into the list's move constructor? genuine question, I don't think I have an opinion. probably roughly (or exactly) the same code?

Contributor Author

I would prefer to have that here, local. The problem is that I don't know how to do that for move assignment: because this is a non-owning container, I can't do anything but assert that the container is empty beforehand.

Good news: we don't use move assignment, so I can just delete it and handle the move ctor here.

Contributor Author

Oh this was a great suggestion, it's driven the benchmarks down even further:

    test                      iterations      median         mad         min         max      allocs       tasks        inst
    iobuf.move_bench_small     520848000     1.469ns     0.000ns     1.468ns     1.471ns       0.000       0.000         7.2
    iobuf.move_bench_medium      1694000     1.582ns     0.002ns     1.579ns     1.586ns       0.000       0.000         7.2
    iobuf.move_bench_large        528000     1.599ns     0.009ns     1.590ns     1.614ns       0.000       0.000         7.2

Member

Hahaha sweet 🎉
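
The outcome of this thread (reset the moved-from list in the move constructor, delete move assignment, assert emptiness on destruction) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the PR's code: `fragment`, `fragment_list`, and all member names are hypothetical, and the real `io_fragment_list` may differ in every detail.

```cpp
#include <cassert>
#include <utility>

// Hypothetical node type: the list links live inside the element itself,
// which is what makes the list "intrusive".
struct fragment {
    int value = 0;
    fragment* prev = nullptr;
    fragment* next = nullptr;
};

// Hypothetical non-owning list: it links nodes but never allocates or frees them.
class fragment_list {
public:
    fragment_list() noexcept = default;

    // Moving copies two pointers and leaves the source empty, so the
    // moved-from list still satisfies the destructor's assertion.
    fragment_list(fragment_list&& o) noexcept
      : _head(std::exchange(o._head, nullptr))
      , _tail(std::exchange(o._tail, nullptr)) {}

    // A non-owning list cannot dispose of nodes it already holds,
    // so move assignment is simply not offered.
    fragment_list& operator=(fragment_list&&) = delete;
    fragment_list(const fragment_list&) = delete;
    fragment_list& operator=(const fragment_list&) = delete;

    // The owner must unlink (and free) every node before destruction.
    ~fragment_list() { assert(empty()); }

    bool empty() const noexcept { return _head == nullptr; }

    void push_back(fragment& f) noexcept {
        f.prev = _tail;
        f.next = nullptr;
        if (_tail != nullptr) {
            _tail->next = &f;
        } else {
            _head = &f;
        }
        _tail = &f;
    }

    // Unlinks and returns the first node, or nullptr if the list is empty.
    fragment* pop_front() noexcept {
        fragment* f = _head;
        if (f == nullptr) {
            return nullptr;
        }
        _head = f->next;
        if (_head != nullptr) {
            _head->prev = nullptr;
        } else {
            _tail = nullptr;
        }
        f->prev = nullptr;
        f->next = nullptr;
        return f;
    }

private:
    fragment* _head = nullptr;
    fragment* _tail = nullptr;
};
```

In this sketch a move touches only the two pointers in each list object, never the nodes themselves.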

Member

@oleiman oleiman left a comment

😎

just a few nitpicks on first pass. have not reviewed all the code yet.

@@ -73,31 +76,33 @@ class generic_iobuf_writer
 // 3. Drop the final character
 // For each encoded fragment that is written (except the first one):
 // 4. Restore the stashed character over the prefix-quote
-for (auto i = beg; i != end; ++i) {
+for (auto i = beg;;) {
Contributor

This PR pretty much LGTM, except that I had/am having a hard time convincing myself that this change here is equivalent, and I'm not sure I follow why the restructuring is needed vs just swapping out the last_frag() implementation (also ignoring the early return, which seems fine).

Contributor Author

We can't compare i to last anymore because they are different iterator types now

Contributor

Ahh makes sense. Is it generally a bad practice to allow comparing different iterator types? Like could we templatize operator== to support different iterators and just compare their _current (or maybe the template math doesn't math)?

In any case, I stared at it more and yea it does seem equivalent, so lgtm!

Contributor Author

we have implicit conversion from non-const to const iterator (standard), but I don't think it's generally allowed to go from forward to backward iterator, mostly because I don't know how to map the end states in a coherent way

Contributor

Ya I can buy not being able to convert. I'm wondering if implementing a comparator would work

Contributor Author

I mean semantically how to compare end vs rend?

Member

In general C++ doesn't allow comparison of iterators and corresponding reverse iterators, probably for good reasons (reverse iterators are weirder than you might think since you cannot use one-before-the-front in the same way you can use one-past-the-end), so I don't think we should either?

Contributor

I mean semantically how to compare end vs rend?

Ah, I would've thought something like, end iters aren't equal if their Forward and containers aren't equal.

In any case, not implementing a comparator sgtm -- it does seem like a can of worms, especially if it's not a common practice
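
The point about reverse iterators being "weirder than you might think" can be seen with the standard library alone. The sketch below uses `std::vector` purely for illustration (it is not iobuf code, and `reverse_facts` and `inspect` are hypothetical names). A `std::reverse_iterator` stores a position one past the element it dereferences to, so `rbegin()` wraps `end()` and `rend()` wraps `begin()`. A forward iterator and a reverse iterator that refer to the same element therefore hold different underlying positions.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical result type for this illustration.
struct reverse_facts {
    bool rbegin_base_is_end;    // rbegin() stores the end() position
    bool rend_base_is_begin;    // rend() stores the begin() position
    bool deref_is_prev_of_base; // *rit refers to the element before rit.base()
};

// Precondition: v is non-empty (the last check dereferences rbegin()).
inline reverse_facts inspect(const std::vector<int>& v) {
    auto rit = v.rbegin();
    return reverse_facts{
      rit.base() == v.end(),
      v.rend().base() == v.begin(),
      &*rit == &*std::prev(rit.base()),
    };
}
```

For a non-empty vector all three fields come out true, which is why comparing only the stored positions of a forward and a reverse iterator would give surprising answers.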

Just inline the implementations

We need to correctly handle the empty string: as it stands today, an empty
iobuf will end up as invalid JSON.

While we are here, also make the algorithm only need
forward_iterator from iobuf instead of bidirectional_iterator.

This is not required for the avro interface here: we only need to be
able to return data from the last successful call to `next`, and since
we still have a pointer to the last fragment, we can restore the data
in that fragment.

We can use the reverse iterator stuff here.

To simplify the implementation, remove the need for the iterator to
hold both the container and the current node, so that it holds just the
current node.

This grabs a `const iobuf&`, which is then destroyed after the iterator
is created. I think because the underlying buffers were shared, it was
kind of fine, but certainly is sketch.
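
The forward-only loop shape from the JSON writer change (`for (auto i = beg;;)` with the end check inside the body) and the empty-input case can be sketched as follows. This is a simplified illustration under stated assumptions, not the writer's actual algorithm: `encode_fragment` and `write_quoted` are hypothetical, escaping is omitted, and the quotes between fragments are trimmed by offset rather than by stashing and restoring a character.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <string>

// Hypothetical stand-in for a per-fragment encoder: wraps the text in quotes.
// Real JSON escaping is omitted to keep the sketch short.
inline std::string encode_fragment(const std::string& frag) {
    return "\"" + frag + "\"";
}

// Joins the encoded fragments into one quoted string using only forward
// iteration: the last fragment is detected by advancing and comparing to end,
// not by comparing against a reverse iterator or stepping backwards.
template<typename ForwardIt>
std::string write_quoted(ForwardIt beg, ForwardIt end) {
    if (beg == end) {
        return "\"\""; // an empty input must still produce a valid empty string
    }
    std::string out;
    for (auto i = beg;;) {
        const std::string enc = encode_fragment(*i);
        const bool first = (i == beg);
        const bool last = (++i == end); // look ahead once per iteration
        const std::size_t start = first ? 0 : 1;              // drop prefix quote
        const std::size_t stop = enc.size() - (last ? 0 : 1); // drop suffix quote
        out.append(enc, start, stop - start);
        if (last) {
            break;
        }
    }
    return out;
}

// Small usage helper with a forward-only container.
inline std::string demo_three_fragments() {
    std::forward_list<std::string> frags{"ab", "cd", "ef"};
    return write_quoted(frags.begin(), frags.end());
}
```

Using `std::forward_list` in the helper shows that nothing in the loop needs `operator--` or a reverse iterator.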

@rockwotj rockwotj merged commit 59ee571 into redpanda-data:dev Feb 14, 2025
17 checks passed
@rockwotj rockwotj deleted the iobuf_move branch February 14, 2025 14:12
8 participants