Optimize `std::transform` for `vector<bool>` #5769
template <class _Ty>
struct _Map_vb_functor<equal_to<_Ty>> {
    using _Type = conditional_t<_Is_vbool_functor_arg<_Ty>, _Bit_xnor, void>;
Alternatively, we can map to `_Map_vb_functor` itself and have `operator()` right here to save one `struct`.
}

template <class _VbIt, class _OutIt, class _Mapped_fn>
_CONSTEXPR20 _OutIt _Transform_vbool_aligned(
I moved this out to `<vector>` from `<algorithm>` because other algorithms are moved out. However, I'm not sure if it is useful.
For accessing the `vector<bool>` representation it is not strictly necessary. Most things are template-dependent member functions and data. The only exception is `_Vbase`, which can still be deduced from iterators.
For throughput it does not look useful either. `<vector>` is included more frequently than `<algorithm>`, so it appears more useful to off-load `<vector>` instead.
For reference, 0f24d45 is the commit where this movement was made.
Towards #625, specifically #625 (comment) items 1 and 2.
🦖 Optimization
When a standard functor, either transparent or integer-specialized, is passed to `transform` along with all-`vector<bool>` iterators, map that functor to a bitwise one that operates on the underlying type.
The mapping is done via template specialization, and not via `if constexpr`, so that the dispatch works without `<functional>` included and the functors defined.
Only do this for zero offset. Supporting all possible offset combinations is a lot of complexity for little gain. Remember `copy`.
Extract pointers from iterators to help the compiler auto-vectorize. Yes, it does not auto-vectorize when using the whole iterators. Auto-vectorization needs the simplest possible loop implementations.
Don't call `transform` again, to avoid unnecessary recursion; the operation is simple.
Don't process tails explicitly, yield to the existing loop for now. Actually, let's go for it, it is not that hard: process tails by applying a bit mask.
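To illustrate the idea above, here is a minimal standalone sketch, not the PR's actual code. All names (`map_to_bitwise`, `bit_and_word`, `transform_packed`) are hypothetical, the packed bits are modeled as plain `std::uint32_t` words rather than the real `vector<bool>` representation, and `<functional>` is included for brevity even though the PR deliberately avoids depending on it. It shows mapping a logical functor to a bitwise one through template specialization, a simple pointer-based loop over whole words, and a tail handled with a bit mask.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>

// Hypothetical word-level functors (illustration only).
struct bit_and_word {
    std::uint32_t operator()(std::uint32_t a, std::uint32_t b) const { return a & b; }
};
struct bit_or_word {
    std::uint32_t operator()(std::uint32_t a, std::uint32_t b) const { return a | b; }
};
struct bit_xnor_word {
    std::uint32_t operator()(std::uint32_t a, std::uint32_t b) const { return ~(a ^ b); }
};

// Primary template: void means "no bitwise equivalent, use the generic path".
template <class Fn>
struct map_to_bitwise {
    using type = void;
};

// Specializations map logical functors on bool to bitwise functors on words.
template <>
struct map_to_bitwise<std::logical_and<>> {
    using type = bit_and_word;
};
template <>
struct map_to_bitwise<std::logical_or<>> {
    using type = bit_or_word;
};
template <>
struct map_to_bitwise<std::equal_to<>> {
    using type = bit_xnor_word;
};

static_assert(std::is_void_v<map_to_bitwise<std::plus<>>::type>, "unmapped functors are not optimized");

// Applies fn to bit_count packed bits starting at bit offset zero of each buffer.
template <class WordFn>
void transform_packed(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out,
                      std::size_t bit_count, WordFn fn) {
    constexpr std::size_t bits_per_word = 32;
    const std::size_t full_words = bit_count / bits_per_word;

    // Plain pointer loop over whole words; simple enough for auto-vectorization.
    for (std::size_t i = 0; i < full_words; ++i) {
        out[i] = fn(a[i], b[i]);
    }

    // Tail: write only the low tail_bits bits, keep the rest of the output word.
    const std::size_t tail_bits = bit_count % bits_per_word;
    if (tail_bits != 0) {
        const std::uint32_t mask = (std::uint32_t{1} << tail_bits) - 1;
        out[full_words] = (out[full_words] & ~mask) | (fn(a[full_words], b[full_words]) & mask);
    }
}
```

A call such as `transform_packed(a, b, out, 36, map_to_bitwise<std::logical_and<>>::type{})` would process one full word and then four tail bits, leaving the remaining 28 bits of the second output word untouched.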
Don't do ranges yet. Other `vector<bool>` optimizations don't do them either. It is getting complicated, so instead of doing ranges separately, we need to look into #1754 at last.
🏁 Benchmark
Feed the randomizer with some seed to make the inputs different 🐦
Since (auto-)vectorization is (expected to be) engaged, use an alignment-controlling allocator.
⏱️ Benchmark results
transform_two_inputs_aligned<logical_and<>>/64
transform_two_inputs_aligned<logical_and<>>/4096
transform_two_inputs_aligned<logical_and<>>/65536
transform_two_inputs_aligned<logical_or<>>/64
transform_two_inputs_aligned<logical_or<>>/4096
transform_two_inputs_aligned<logical_or<>>/65536
transform_one_input_aligned<logical_not<>>/64
transform_one_input_aligned<logical_not<>>/4096
transform_one_input_aligned<logical_not<>>/65536