Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy Avoidance #355

Open
adsharma opened this issue Feb 17, 2017 · 11 comments
Open

Copy Avoidance #355

adsharma opened this issue Feb 17, 2017 · 11 comments

Comments

@adsharma
Copy link

After digging into the micro benchmark linked here:

facebookarchive/iterlib#25

and looking at some profiles, I suspect that there is a copy going on between the observable -> skip -> take. Is there a way to avoid the copy in this code?

auto values = rxcpp::observable<>::iterate(vec).skip(size/2).take(size/4);
@adsharma
Copy link
Author

I wonder if it's feasible to use iterator_traits::reference instead of value_type here:

https://github.com/Reactive-Extensions/RxCpp/blob/master/Rx/v2/src/rxcpp/sources/rx-iterate.hpp#L159-L172

@kirkshoop
Copy link
Member

Hi!

rxcpp attempts to ensure minimal copies. Bugs fixes for excessive copies are welcome!

In this case, iterate is intentionally taking a copy of the vector when constructed and then copying each item during iteration.

rxcpp is async by default. this means that iterate takes the vector by value so that a copy (or move) of the vector is invoked, so that when subscribe is called later, the data is still valid.

the vector copy would be avoided by passing a non-owning range to iterate.

iterate is cold. this means that subscribe can be called multiple times and should return the full set of values each time and that those values should not have been mutated.

I think that the per value copy could be removed using iterator_traits::const_reference to prevent mutation. perhaps you could provide a PR for that and we can test to see if there is any negative affect.

Thanks!

@adsharma
Copy link
Author

the vector copy would be avoided by passing a non-owning range to iterate.

Thanks for the suggestion.

-    auto values = rxcpp::observable<>::iterate(vec).skip(size/2).take(size/4);
+    std::vector<ItemOptimized> sliced = vec | ranges::view::slice(size/2, size/2 + size/4);
+    auto values = rxcpp::observable<>::iterate(sliced);

speeds up things quite a bit by using range-v3. I would like to be able to write:

-    auto values = rxcpp::observable<>::iterate(vec).skip(size/2).take(size/4);
+    auto sliced = vec | ranges::view::slice(size/2, size/2 + size/4);
+    auto values = rxcpp::observable<>::iterate(sliced);

But the code doesn't compile. Is there a way to make a range-v3 view a source for rxcpp::observable without forcing a copy to a vector followed by iterate()? I suspect a new source would've to be written.

Re: PR for const_reference

I wrote some code along the following lines, but it didn't quite work. I suspect more extensive changes are needed.

const_ref.txt

@kirkshoop
Copy link
Member

Hi,

rxcpp is not a replacement for range-v3. if the data is already in a vector, then range-v3 is the right tool. if the values are arriving later, then rxcpp is the right tool.

The changes for const ref:

looks like iterate_traits::reference was defined and then iterate_traits::const_reference was referenced.
I expect problems if the value_type of the observable is changed to be a reference. The use of the reference should be isolated to the parameter passed to on_next

I would expect this to work:

auto sliced = vec | ranges::view::slice(size/2, size/2 + size/4);
auto values = rxcpp::observable<>::iterate(sliced);

What was the error?

@adsharma
Copy link
Author

The reason why rxcpp::observable<>::iterate(sliced) errors out for me is that include/range/v3/utility/iterator_traits.hpp in range-v3 doesn't define the value_type that rx-util.hpp expects for container types.

I'll take another look at isolating the const_ref changes to on_next. Thanks!

@adsharma
Copy link
Author

Forgot to add the actual error message:

/usr/local/rxcpp/rx-util.hpp:35:80: error: no type named ‘value_type’ in ‘std::decay<std::iterator_traits<__gnu_cxx::__normal_iterator<iterlib::ItemOptimized*, std::vector<iterlib::ItemOptimized> >&> >::type {aka struct std::iterator_traits<__gnu_cxx::__normal_iterator<iterlib::ItemOptimized*, std::vector<iterlib::ItemOptimized> >&>}’
 template<class T> using value_type_t = typename std::decay<T>::type::value_type;

@kirkshoop
Copy link
Member

it looks like there is an errant & in the type of the iterator.

perhaps you could try this edit in rx-iterate.hpp and see if that fixes it?

template<class Collection>
struct iterate_traits
{
    typedef rxu::decay_t<Collection> collection_type;
-    typedef decltype(std::begin(*(collection_type*)nullptr)) iterator_type;
+    typedef rxu::decay_t<decltype(std::begin(*(collection_type*)nullptr))> iterator_type;
    typedef rxu::value_type_t<std::iterator_traits<iterator_type>> value_type;
};

@adsharma
Copy link
Author

That change does fix it. Thanks!

Re: Using range-v3 for vectors and rxcpp for streaming

Things get a bit complicated when a common optimization called "microbatching" is applied. The idea is to delay processing for efficiency. It'd be great if the end user writing the logic doesn't have to worry about the optimizations the stream processing system is performing under the covers.

One way this could be handled is to pass a vector to on_next() by const reference.

@kirkshoop
Copy link
Member

One way this could be handled is to pass a vector to on_next() by const reference.

are you referring to the buffer operators?

kirkshoop pushed a commit to kirkshoop/RxCpp that referenced this issue Feb 23, 2017
ReactiveX#355 reported that range-v3 ranges failed due to an errant `&` in the
`iterator_type`
gchudnov pushed a commit that referenced this issue Feb 23, 2017
#355 reported that range-v3 ranges failed due to an errant `&` in the
`iterator_type`
@adsharma
Copy link
Author

adsharma commented Feb 28, 2017

are you referring to the buffer operators?

Yes. Glad to know rxcpp already takes care of it. I was looking into the implementation here:

https://github.com/Reactive-Extensions/RxCpp/blob/master/Rx/v2/src/rxcpp/operators/rx-buffer_count.hpp#L92

and was wondering if we can avoid the copy in deque/chunks.

One technique iterlib uses for these types of things is to use pointers. Example usage:

https://github.com/facebookincubator/iterlib/blob/master/include/iterlib/OrderByIterator.h#L89

But in order for this to work, the underlying container has to ensure that the pointer remains valid until the item is consumed. std::vector doesn't have this property, which is why we use lists here:

https://github.com/facebookincubator/iterlib/blob/master/include/iterlib/RocksDBIterator.h#L52-L60

Thanks for looking into this and merging the fix.

@kirkshoop
Copy link
Member

Hi! I am finally back after a trip.

wondering if we can avoid the copy in deque/chunks.

When the time or count windows are overlapping the copies are required (the same value is part of multiple vectors).

I think your best solution would be a new set of operators - view_. . .
These would store a single deque (even for overlapping windows) and use the soon-to-be-std view(?) type to pass out (potentially overlapping) views into the deque rather than copies. Each value would only be removed when all the views that reference it are emitted.

views of a list of shared_ptr would prevent copies, but I would expect that solution to die in allocator and CPU cache thrashing. often copies of vectors are faster than processing of lists with the same data.

views on a deque would have to be deep copied to a vector if they needed to retained after the stack unwound to the view_. . . that owns the deque, but C++ is pay-for-what-you-need, so it would only add overhead when that overhead is explicitly requested.

Another approach would be for the IO case. making a buffer pool with an owning reference type or a copy-on-write buffer type would allow chunks of data to be efficiently passed by value (copy or move) through all the rxcpp operators.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants