Merged
122 changes: 99 additions & 23 deletions paper/P2728.md
@@ -1230,38 +1230,114 @@ case, we expect users to simply spell `views::to_input | views::to_utf8`.

## Why There Are Three `to_utfN_view`s and No `to_utf_view`

This section starts with a simplified, idealized, but unimplementable design, and works
backwards from there through various hypothetical alternatives, including the design that's
proposed in the current revision.

### `to_utf_view` with Unary Constructor

Imagine we had a single `to_utf_view`, and users specified which encoding to transcode
to via a template parameter:

```c++
std::u32string transcode_to_utf32(const std::u8string& str) {
  return std::ranges::to_utf_view<char32_t>(str) | std::ranges::to<std::u32string>();
}
```

Why doesn't this work?

Well, in this scenario, `to_utf_view` would have two template parameters, one for the
`ToType` and one for the underlying view:

```c++
template<@*code-unit*@ ToType, input_range V>
requires view<V> && @*code-unit*@<range_value_t<V>>
class to_utf_view {
// ...
```

Spelling the constructor invocation as `std::ranges::to_utf_view<char32_t>(str)` doesn't
work, because CTAD is all-or-nothing; you can't specify the `ToType` explicitly and still
deduce the `input_range V`.

### `to_utf_view` with Tag Type Constructor

One alternative would be to have CTAD deduce the `charN_t` template parameter from the
parameters of the constructor using some kind of tag:

```c++
std::u32string transcode_to_utf32(const std::u8string& str) {
  return std::ranges::to_utf_view(str, std::ranges::utf_tag<char32_t>{})
    | std::ranges::to<std::u32string>();
}
```

This is a viable alternative to the status quo.

But let's revisit the unary constructor approach.

### `to_utf_view` with Unary Constructor and `to_utfN_view` Views as Type Aliases

Let's try keeping `std::ranges::to_utf_view`'s unary constructor from before, and then
we'll add `to_utf8_view`, `to_utf16_view`, and `to_utf32_view` as type aliases of
`to_utf_view`:

```c++
template <class V>
using to_utf8_view = to_utf_view<char8_t, V>;

template <class V>
using to_utf16_view = to_utf_view<char16_t, V>;

template <class V>
using to_utf32_view = to_utf_view<char32_t, V>;
```

Now let me fill in some additional background on how CTAD works for views. Views in the
standard that adapt another range have a user-defined deduction guide that ensures that
when a range is passed to the constructor of the view, it gets wrapped in `views::all_t`, e.g.:

```c++
template<class R>
explicit join_view(R&&) -> join_view<views::all_t<R>>;
```

Miraculously, thanks to [@P1814R0], we could write a deduction guide like the following,
and all of the `to_utfN_view` aliases specified above would just work:

```c++
template<@*code-unit*@ ToType, class R>
to_utf_view(R&&) -> to_utf_view<ToType, views::all_t<R>>;
```

But the problem is that, without going through an alias, it's still not possible to invoke
the constructor of `to_utf_view` in a way that activates that deduction guide. Users would
need to explicitly write down the type of the underlying view and the destination
encoding:

```c++
std::u32string transcode_to_utf32(const std::u8string& str) {
  return std::ranges::to_utf_view<char32_t, std::ranges::ref_view<const std::u8string>>(str)
    | std::ranges::to<std::u32string>();
}
```

That isn't really viable.

### `to_utfN_view` Views As Thin Wrappers Around an Implementation-Defined `@*to-utf-view-impl*@`

This is the status quo in the current revision: rename `to_utf_view` to
`@*to-utf-view-impl*@` and add separate `to_utf8_view`, `to_utf16_view`, and
`to_utf32_view` classes that each contain a `@*to-utf-view-impl*@` data member.

Although this wording strategy is somewhat novel, it allows us to write down conventional
user-defined deduction guides for each of these views:

```c++
template<class R>
to_utf8_view(R&&) -> to_utf8_view<views::all_t<R>>;
```

# Changelog
