Skip to content

Conversation

@zachs18
Copy link
Contributor

@zachs18 zachs18 commented Nov 9, 2025

These are probably not super useful optimizations, but they make it so that vec![expr; LARGE_LENGTH] has better performance for some exprs, e.g.

  • array of length zero in debug mode
  • tuple containing () and zero-valued integers in debug and release mode
  • array of () or other zero-sized IsZero type in debug mode
very rough benchmarks
use std::time::Instant;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

struct NonCopyZst;
static COUNTER: AtomicUsize = AtomicUsize::new(0);

impl Clone for NonCopyZst {
    fn clone(&self) -> Self {
        COUNTER.fetch_add(1, Relaxed);
        Self
    }
}


macro_rules! timeit {
    ($e:expr) => {
        let start = Instant::now();
        _ = $e;
        println!("{:56}: {:?}", stringify!($e), start.elapsed());
    };
}

fn main() {
    timeit!(vec![[String::from("hello"); 0]; 1_000_000_000]); // gets a lot better in debug mode
    timeit!(vec![(0u8, (), 0u16); 1_000_000_000]); // gets a lot better in debug *and* release mode
    timeit!(vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]); // gets a lot better in debug mode
    timeit!(vec![[NonCopyZst; 0]; 1_000_000_000]); // gets a lot better in debug mode
    timeit!(vec![[[1u8; 0]; 1_000_000]; 1_000_000]); // gets a little bit better in debug mode
    timeit!(vec![[[(); 37]; 1_000_000]; 1_000_000]); // gets a little bit better in debug mode
    timeit!(vec![[[1u128; 0]; 1_000_000]; 1_000_000]); // gets a little bit better in debug mode

    // check that we don't regress existing optimizations
    timeit!(vec![(0u8, 0u16); 1_000_000_000]); // about the same time
    timeit!(vec![0u32; 1_000_000_000]); // about the same time

    // check that we still call clone for non-IsZero ZSTs
    timeit!(vec![[const { NonCopyZst }; 2]; 1_000]); // about the same time
    assert_eq!(COUNTER.load(Relaxed), 1998);
    timeit!(vec![NonCopyZst; 10_000]); // about the same time
    assert_eq!(COUNTER.load(Relaxed), 1998 + 9_999);
}
$ cargo +nightly run
// ...
vec![[String::from("hello"); 0]; 1_000_000_000]         : 11.13999724s
vec![(0u8, (), 0u16); 1_000_000_000]                    : 5.254646651s
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 2.738062531s
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 9.483690922s
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 2.919236ms
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 2.927755ms
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 2.931486ms
vec![(0u8, 0u16); 1_000_000_000]                        : 19.46µs
vec![0u32; 1_000_000_000]                               : 9.34µs
vec![[const { NonCopyZst }; 2]; 1_000]                  : 31.88µs
vec![NonCopyZst; 10_000]                                : 36.519µs
$ cargo +dev run
// ...
vec![[String::from("hello"); 0]; 1_000_000_000]         : 4.12µs
vec![(0u8, (), 0u16); 1_000_000_000]                    : 16.299µs
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 210ns
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 210ns
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 170ns
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 110ns
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 140ns
vec![(0u8, 0u16); 1_000_000_000]                        : 11.56µs
vec![0u32; 1_000_000_000]                               : 10.71µs
vec![[const { NonCopyZst }; 2]; 1_000]                  : 36.08µs
vec![NonCopyZst; 10_000]                                : 73.21µs

(checking release mode to make sure this doesn't regress perf there)

$ cargo +nightly run --release
// ...
vec![[String::from("hello"); 0]; 1_000_000_000]         : 70ns
vec![(0u8, (), 0u16); 1_000_000_000]                    : 1.269457501s
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 10ns
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 20ns
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 10ns
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 20ns
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 20ns
vec![(0u8, 0u16); 1_000_000_000]                        : 20ns
vec![0u32; 1_000_000_000]                               : 20ns
vec![[const { NonCopyZst }; 2]; 1_000]                  : 2.66µs
vec![NonCopyZst; 10_000]                                : 13.39µs
$ cargo +dev run --release
vec![[String::from("hello"); 0]; 1_000_000_000]         : 90ns
vec![(0u8, (), 0u16); 1_000_000_000]                    : 30ns
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 20ns
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 30ns
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 20ns
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 20ns
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 20ns
vec![(0u8, 0u16); 1_000_000_000]                        : 30ns
vec![0u32; 1_000_000_000]                               : 20ns
vec![[const { NonCopyZst }; 2]; 1_000]                  : 3.52µs
vec![NonCopyZst; 10_000]                                : 17.13µs

The specific expression I ran into a perf issue that this PR addresses is vec![[(); LARGE]; LARGE], as I was trying to demonstrate Vec::into_flattened panicking on length overflow in the playground, but got a timeout error instead since vec![[(); LARGE]; LARGE] took so long to run in debug mode (it runs fine on the playground in release mode)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 9, 2025
@rustbot
Copy link
Collaborator

rustbot commented Nov 9, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Comment on lines 66 to 68
// We could probably just return `true` here, since implementing
// `IsZero` for a zero-sized type such that `self.is_zero()` returns
// `false` would be useless, but to be safe we check anyway.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so too at first, but that would conflict with the above implementation – any [T; N] now implements IsZero, so e.g. [NonTrivialCloneButZST; 5] would hit this code path but mustn't be zero-initialised.

Copy link
Contributor Author

@zachs18 zachs18 Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the specialization where T: IsZero, so this would not be run for T = NonTrivialCloneButZST, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you mean, if T = [NonTrivialCloneButZST; 5], it would be wrong to return true unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment to explain why we can't just return true unconditionally.

@rustbot ready

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 9, 2025
Implement IsZero for ().

Implement default `IsZero` for all arrays, only returning true if the array is empty
(making the existing array impl for `IsZero` elements a specialization).

Optimize `IsZero::is_zero` for arrays of zero-sized `IsZero` elements.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 9, 2025
@joboet
Copy link
Member

joboet commented Nov 10, 2025

Thanks!
@bors r+ rollup=never

@bors
Copy link
Collaborator

bors commented Nov 10, 2025

📌 Commit 0aaa3ae has been approved by joboet

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Nov 10, 2025

🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 10, 2025
};
}

impl_is_zero!((), |_: ()| true); // It is needed to impl for arrays and tuples of ().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw this can just be |()| rather than |_: ()|

@bors
Copy link
Collaborator

bors commented Nov 11, 2025

⌛ Testing commit 0aaa3ae with merge c8f22ca...

@bors
Copy link
Collaborator

bors commented Nov 11, 2025

☀️ Test successful - checks-actions
Approved by: joboet
Pushing c8f22ca to main...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 11, 2025
@bors bors merged commit c8f22ca into rust-lang:main Nov 11, 2025
12 checks passed
@rustbot rustbot added this to the 1.93.0 milestone Nov 11, 2025
@github-actions
Copy link
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 29a6971 (parent) -> c8f22ca (this PR)

Test differences

Show 4 test diffs

4 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard c8f22ca269a1f2653ac962fe2bc21105065fd6cd --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-apple: 5794.6s -> 7723.5s (+33.3%)
  2. dist-x86_64-apple: 8583.0s -> 10939.3s (+27.5%)
  3. aarch64-apple: 9396.5s -> 7079.7s (-24.7%)
  4. dist-powerpc-linux: 4875.5s -> 5715.9s (+17.2%)
  5. dist-various-1: 3748.5s -> 4189.4s (+11.8%)
  6. x86_64-rust-for-linux: 3046.0s -> 2760.5s (-9.4%)
  7. x86_64-gnu-nopt: 6864.0s -> 7432.5s (+8.3%)
  8. aarch64-gnu-llvm-20-1: 3862.6s -> 3552.7s (-8.0%)
  9. dist-aarch64-msvc: 5636.3s -> 6075.9s (+7.8%)
  10. x86_64-gnu-miri: 4975.2s -> 4598.3s (-7.6%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c8f22ca): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.9% [-0.9%, -0.9%] 1
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (primary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
4.4% [4.4%, 4.4%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.3% [-4.0%, -0.5%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.1% [-4.0%, 4.4%] 3

Cycles

Results (secondary -2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.1% [-2.1%, -2.1%] 1
All ❌✅ (primary) - - 0

Binary size

Results (primary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.1%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.1% [-0.1%, -0.1%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.0% [-0.1%, 0.1%] 5

Bootstrap: 477.532s -> 476.181s (-0.28%)
Artifact size: 391.38 MiB -> 391.36 MiB (-0.00%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants