Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions examples/readme_examples.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,8 @@ std::basic_string<ToChar> transcode_or_throw(std::basic_string_view<FromChar> in
return result;
}

#if 0
// Deliberately broken by double-transcode optimization in the CPO
template <typename I, typename O>
using transcode_result = std::ranges::in_out_result<I, O>;

Expand All @@ -124,6 +126,7 @@ bool transcode_to_utf32_test() {
}
return true;
}
#endif

#ifndef _MSC_VER
enum class suit : std::uint8_t {
Expand Down Expand Up @@ -257,9 +260,6 @@ bool readme_examples() {
return false;
}
}
if (!transcode_to_utf32_test()) {
return false;
}
if (!change_playing_card_suit_test()) {
return false;
}
Expand Down
25 changes: 24 additions & 1 deletion include/beman/utf_view/to_utf_view.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -234,6 +234,9 @@ struct to_utf_view<V, E, ToType>::exposition_only_iterator : detail::iter_catego
/* PAPER: using iterator_concept = @*see below*@; */
/* PAPER: using iterator_category = @*see below*@; // @*not always present*@ */
/* !PAPER */

using is_to_utf_view_iterator = void;

using iterator_concept = decltype(iter_concept_impl());
/* PAPER */
using value_type =
Expand Down Expand Up @@ -855,6 +858,19 @@ struct to_utf_view<V, E, ToType>::exposition_only_sentinel {

namespace detail {

template <class T>
inline constexpr bool is_to_utf_view_v = false;

template <class R, auto E, class Tag>
inline constexpr bool is_to_utf_view_v<to_utf_view<R, E, Tag>> = true;

template <class T>
inline constexpr bool is_to_utf_subrange_v = false;

template <class I>
inline constexpr bool is_to_utf_subrange_v<std::ranges::subrange<I, I, std::ranges::subrange_kind::unsized>> =
requires { typename I::is_to_utf_view_iterator; };

template <to_utf_view_error_kind E, exposition_only_code_unit ToType>
struct to_utf_impl : std::ranges::range_adaptor_closure<to_utf_impl<E, ToType>> {
template <std::ranges::range R>
Expand All @@ -867,7 +883,14 @@ namespace detail {
} else {
return std::ranges::empty_view<std::expected<ToType, utf_transcoding_error>>{};
}
} else {
} else if constexpr (detail::is_to_utf_view_v<T>) {
return to_utf_view(std::forward<R>(r).base(), detail::nontype<E>, to_utf_tag<ToType>);
} else if constexpr (detail::is_to_utf_subrange_v<T>) {
return to_utf_view(
std::ranges::subrange(r.begin().base(), r.end().base()),
detail::nontype<E>,
to_utf_tag<ToType>);
} else {
return to_utf_view(std::forward<R>(r), detail::nontype<E>, to_utf_tag<ToType>);
}
}
Expand Down
171 changes: 16 additions & 155 deletions paper/P2728.md
Original file line number Diff line number Diff line change
Expand Up @@ -470,6 +470,14 @@ expression-equivalent to:

- Otherwise, `empty_view<expected<Char, utf_trancoding_error>>{}`.

- Otherwise, if the type of `E` is a (possibly cv-qualified) specialization of
`to_utf_view`, then `to_utf_view(E.base(), nontype<Error>, to_utf_tag<Char>)`.

- Otherwise, if the type of `E` is *cv* `subrange<to_utf_view::@*iterator*@,
to_utf_view::@*iterator*@, subrange_kind::unsized>` for some specialization of
`to_utf_view`, then `to_utf_view(subrange(E.begin().base(), E.end().base()),
nontype<Error>, to_utf_tag<Char>)`.

- Otherwise, `to_utf_view(E, nontype<Error>, to_utf_tag<Char>)`.

#### 24.7.?.2 Enumeration `utf_transcoding_error` [range.transcoding.error.transcoding] {-}
Expand Down Expand Up @@ -1032,162 +1040,14 @@ int main(int, char const* argv[]) {
}
```

In the above example, if the user is building on Windows, `foo` will create a
`to_utf16_view` wrapping a `to_utf32_view`.

### Eliding the Inner Transcoding View in the CPO

You might want to add logic in the CPO such that it notices that `foo` is creating a
`to_utf16_view` wrapping a `to_utf32_view`, elides the `to_utf16_view`, and creates the
`to_utf32_view` directly wrapping the view produced by `as_char8_t`.

However, this runs into issues where the result of `base()` isn't what the user
expects. Consider this transcode function that works similarly to `std::ranges::copy`, in
that it returns both the output iterator and the final position of the input iterator:

```c++
template <typename I, typename O>
using transcode_result = std::ranges::in_out_result<I, O>;

template <std::input_iterator I, std::sentinel_for<I> S, std::output_iterator<char8_t> O>
transcode_result<I, O> transcode_to_utf32(I first, S last, O out) {
auto r = std::ranges::subrange(first, last) | to_utf32;

auto copy_result = std::ranges::copy(r, out);

return transcode_result<I, O>{copy_result.in.base(), copy_result.out};
}
```

if `copy_result.in.base()` is a different type than `first`, this will break.

### "Innermost" Optimization

Instead, the iterator of the transcoding view can "look through" the iterator of the inner
transcoding view that it's wrapping. Since the iterator is just a backpointer to the
parent and an iterator to the current position, optimizing like this instead points the
backpointer to its parent's parent, and uses the inner iterator of the iterator it's
wrapping for the current position. We use exposition-only concepts named
`@*innermost-parent*@` and `@*innermost-base*@` to explicate how this works in the
wording.

The wording change that would enable this optimization is as follows:

#### Additional helper templates

```diff
+ template<class T>
+ concept @*to-utf-view-iterator-optimizable*@ = @*unspecified*@ // @*exposition only*@
+ template<class T>
+ concept @*to-utf-view-sentinel-optimizable*@ = @*unspecified*@ // @*exposition only*@
```
In the above example, naively, `foo` would create a `to_utf16_view` wrapping a
`to_utf32_view`. However, the `to_utfN` CPOs detect this situation and elide the
`to_utf32_view`, creating the `to_utf16_view` so that it directly wraps the view produced
by `as_char8_t`.

These concepts are true when the type in question is the iterator/sentinel of a transcoding view.

#### Class template `to_utf_view`

```diff
- using @*Parent*@ = @*maybe-const*@<Const, to_utf_view>; // @*exposition only*@
- using @*Base*@ = @*maybe-const*@<Const, V>; // @*exposition only*@
+ using @*innermost-parent*@ = @*unspecified*@ // @*exposition only*@
+ using @*innermost-base*@ = @*unspecified*@ // @*exposition only*@
+ static constexpr bool @*optimizing*@{@*to-utf-view-iterator-optimizable*@<iterator_t<@*Base*@>>
```

#### Class `to_utf_view::@*iterator*@`

```diff
- iterator_t<@*Base*@> current_ = iterator_t<@*Base*@>(); // @*exposition only*@
- @*Parent*@* parent_ = nullptr; // @*exposition only*@
+
+ iterator_t<@*innermost-base*@> current_ = iterator_t<@*innermost-base*@>(); // @*exposition only*@
+ @*innermost-parent*@* parent_ = nullptr; // @*exposition only*@
```

```diff
- constexpr @*iterator*@(@*Parent*@& parent, iterator_t<@*Base*@> begin)
- : current_(std::move(begin)),
- parent_(addressof(parent)) {
- if (base() != @*end*@())
- @*read*@();
- else if constexpr (!forward_range<@*Base*@>) {
- buf_index_ = -1;
- }
- }
+
+ constexpr @*iterator*@(@*innermost-parent*@& parent, iterator_t<@*innermost-base*@> begin)
+ : current_(std::move(begin)),
+ parent_(std::addressof(parent))
+ {
+ if (current_ != @*end*@())
+ @*read*@();
+ else if constexpr (!forward_range<@*Base*@>) {
+ buf_index_ = -1;
+ }
+ }
+
+ constexpr @*iterator*@(@*Parent*@& parent, iterator_t<@*Base*@> begin) requires @*optimizing*@
+ : current_(std::move(begin.current_)), parent_(begin.parent_) {
+ if (current_ != @*end*@())
+ @*read*@();
+ else if constexpr (!forward_range<@*Base*@>) {
+ buf_index_ = -1;
+ }
+ }
```

```diff
- constexpr const iterator_t<@*Base*@>& base() const& noexcept { return current_; }
+ constexpr iterator_t<@*Base*@> base() const& noexcept requires forward_range<@*Base*@>
+ {
+ if constexpr (@*optimizing*@) {
+ return iterator_t<@*Base*@>{*parent_, current_};
+ } else {
+ return current_;
+ }
+ }

- constexpr iterator_t<@*Base*@> base() && { return std::move(current_); }
+ constexpr iterator_t<@*Base*@> base() && {
+ if constexpr (@*optimizing*@) {
+ return iterator_t<@*Base*@>{*parent_, std::move(current_)};
+ } else {
+ return std::move(current_);
+ }
+ }
```

```diff
- constexpr sentinel_t<@*Base*@> @*end*@() const { // @*exposition only*/
+ constexpr sentinel_t<@*innermost-base*@> @*end*@() const { // @*exposition only*/
return end(parent_->base_);
}
```

#### Class `to_utf_view::@*sentinel*@`

```diff
- using @*Parent*@ = @*maybe-const*@<Const, to_utf_view>; // @*exposition only*@
- using @*Base*@ = @*maybe-const*@<Const, V>; // @*exposition only*@
- sentinel_t<@*Base*@> end_ = sentinel_t<@*Base*@>();
+
+ using @*innermost-parent*@ = @*unspecified*@ // @*exposition only*@
+ using @*innermost-base*@ = @*unspecified*@ // @*exposition only*@
+ sentinel_t<@*innermost-base*@> end_ = sentinel_t<@*innermost-base*@>();
+ static constexpr bool @*optimizing*@{@*to-utf-view-sentinel-optimizable*@<sentinel_t<@*Base*@>>};
```

```diff
+ constexpr explicit @*sentinel*@(sentinel_t<@*Base*@> end) requires @*optimizing*@
+ : end_{end.end_} {}
```

```diff
+ constexpr sentinel_t<@*Base*@> base() const requires @*optimizing*@
+ {
+ return sentinel_t<@*Base*@>{end_};
+ }
```
There's precedent for this kind of approach in the `views::reverse` CPO, which simply
gives back the original underlying view if it detects that it's reversing another
`reverse_view`.

# Changelog

Expand All @@ -1205,6 +1065,7 @@ These concepts are true when the type in question is the iterator/sentinel of a
`views::adjacent_transform<N>` and elsewhere
- Don't cache `begin()`
- Reject arrays of `charN_t` in the CPOs
- Implement double-transcode optimization in the CPOs

## Changes since R8

Expand Down