Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 1 addition & 28 deletions include/beman/utf_view/to_utf_view.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -127,29 +127,6 @@ class to_utf_view : public std::ranges::view_interface<to_utf_view<V, E, ToType>

V base_ = V(); // @*exposition only*@

/* !PAPER */
template <typename V2>
struct maybe_cache_begin_impl {
constexpr auto begin_impl(to_utf_view& self, auto base) noexcept {
return exposition_only_iterator<false>{self, std::move(base)};
}
};

template <typename V2>
requires (!std::same_as<std::ranges::range_value_t<V2>, char32_t> && std::copy_constructible<V>)
struct maybe_cache_begin_impl<V2> {
constexpr auto begin_impl(to_utf_view& self, auto base) noexcept {
if (!cache_) {
cache_.emplace(self, std::move(base));
}
return *cache_;
}
std::optional<exposition_only_iterator<false>> cache_;
};

[[no_unique_address]] maybe_cache_begin_impl<V> cache_;

/* PAPER */
public:
constexpr to_utf_view()
requires std::default_initializable<V>
Expand All @@ -173,11 +150,7 @@ class to_utf_view : public std::ranges::view_interface<to_utf_view<V, E, ToType>
/* PAPER: constexpr @*iterator*@<false> begin(); */
constexpr exposition_only_iterator<false> begin()
{
if (!std::copy_constructible<V>) {
return exposition_only_iterator<false>(*this, std::ranges::begin(base_));
} else {
return cache_.begin_impl(*this, std::ranges::begin(base_));
}
return exposition_only_iterator<false>(*this, std::ranges::begin(base_));
}
/* PAPER */
constexpr exposition_only_iterator<true> begin() const
Expand Down
28 changes: 13 additions & 15 deletions paper/P2728.md
Original file line number Diff line number Diff line change
Expand Up @@ -589,8 +589,6 @@ constexpr @*iterator*@ begin();

_Returns:_ `{*this, std::ranges::begin(base_)}`

_Remarks:_ In order to provide the amortized constant time complexity required by the `range` concept when `to_utf_view` transcodes from UTF-8 or UTF-16, this function caches the result within the `to_utf_view` for use on subsequent calls.

```cpp
constexpr auto reserve_hint() requires approximately_sized_range<V>;
```
Expand Down Expand Up @@ -999,22 +997,21 @@ You can also substitute a different replacement character by changing the
result of the `else` clause, or add exception-based error handling by throwing
at that point.

## Caching `begin()`

When we transcode from UTF-8, invoking `begin()` on the transcoding view may read up to
four elements from the underlying view. Similarly, when transcoding from UTF-16, it may
read up to two underlying elements. As a result, in order to preserve "amortized O(1)
complexity," we need to cache `begin()` in these situations.
## Why We Don't Cache `begin()`

On the other hand, when transcoding from UTF-32, `begin()` reads at most one element from
the underlying view, so `begin()` is not cached when doing so.
When we invoke `begin()`, constructing the transcoding iterator may read up to four
elements from the underlying view if it's transcoding from UTF-8. A previous revision of
this paper implemented `begin()` caching, based on the idea that iterating the underlying
range could have unbounded compexity.

We also incorporate [@P3725R1]'s approach, in that when a transcoding view wraps a
const-iterable `input_range` that is not a `forward_range`, the transcoding view provides
a `const` overload of `begin()` that is non-caching.
However, Tim Song pointed to the wording in `[iterator.requirements.general]` stating that
"All the categories of iterators require only those functions that are realizable for a
given category in constant time (amortized)." This means that we should be making the
assumption that the underlying iterator operations used by `begin()` are "amortized
constant time" (in a hand-wavey sense). Tim also pointed out that transcoding from UTF is
equivalent to `views::adjacent<4>`, which doesn't cache.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very interesting. Does this mean that to_utfN will again be borrowable?


However, we do not provide [@P3725R1]-style `views::input_to_utf8` CPOs; for that use
case, we expect users to simply spell `views::to_input | views::to_utf8`.
Based on this reasoning, the transcoding views don't cache `begin()`.

## Optimizing for Double-Transcoding

Expand Down Expand Up @@ -1205,6 +1202,7 @@ These concepts are true when the type in question is the iterator/sentinel of a
- Remove section stating that the concept of a CPO template is a "novelty"
after it was pointed out in Kona 2025 that it has precedent in
`views::adjacent_transform<N>` and elsewhere
- Don't cache `begin()`

## Changes since R8

Expand Down