New format_args!() and fmt::Arguments implementation #148789

m-ou-se · 2025-11-10T14:30:15Z

This is a new implementation of fmt::Arguments. In this implementation, fmt::Arguments is only two pointers in size, down from six before. This makes it the same size as a &str and lets it fit in a register pair.

This fmt::Arguments can store a &'static str without any indirection or additional storage. This means that simple cases like print_fmt(format_args!("hello")) are now just as efficient for the caller as print_str("hello"), as shown by this example:

code:

fn main() {
    println!("Hello, world!");
}

before:

main:
 sub     rsp, 56
 lea     rax, [rip + .Lanon_hello_world]
 mov     qword ptr [rsp + 8], rax
 mov     qword ptr [rsp + 16], 1
 mov     qword ptr [rsp + 24], 8
 xorps   xmm0, xmm0
 movups  xmmword ptr [rsp + 32], xmm0
 lea     rdi, [rsp + 8]
 call    qword ptr [rip + std::io::stdio::_print]
 add     rsp, 56
 ret

after:

main:
 lea     rsi, [rip + .Lanon_hello_world]
 mov     edi, 29
 jmp     qword ptr [rip + std::io::stdio::_print]

(panic!("Hello, world!"); shows a similar change.)
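The literal-only case can also be poked at from ordinary code. The sketch below uses the stable `fmt::Arguments::as_str` to show that a placeholder-free format string is available as a plain `&'static str`, and prints the size of `fmt::Arguments` next to `&str` so the two can be compared on whatever toolchain is in use (the printed sizes depend on the compiler version, so none are asserted). `literal_of` and `tagged_len` are hypothetical helpers. The `(len << 1) | 1` formula is only an observation that happens to match the `mov edi, 29` above for the 14-byte string "Hello, world!\n"; the thread itself does not state the encoding.

```rust
use std::fmt;
use std::mem::size_of;

/// Returns the literal behind `args`, if it has one (hypothetical helper).
fn literal_of(args: fmt::Arguments<'_>) -> Option<&'static str> {
    args.as_str()
}

/// Assumption for illustration: length shifted left by one with the low bit set.
/// This merely fits the constant 29 seen in the "after" assembly.
fn tagged_len(s: &str) -> usize {
    (s.len() << 1) | 1
}

fn main() {
    // No placeholders: the whole message is one static string.
    assert_eq!(
        literal_of(format_args!("Hello, world!\n")),
        Some("Hello, world!\n")
    );

    // A runtime argument means there is no single literal to return.
    let name = String::from("world");
    assert_eq!(literal_of(format_args!("Hello, {name}!\n")), None);

    // Sizes vary by toolchain; print them rather than assume a value.
    println!("fmt::Arguments: {} bytes", size_of::<fmt::Arguments<'static>>());
    println!("&str:           {} bytes", size_of::<&str>());

    // 14 bytes -> (14 << 1) | 1 == 29.
    assert_eq!(tagged_len("Hello, world!\n"), 29);
}
```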

This implementation stores all static information as just a single (byte) string, without any indirection:

code:

format_args!("Hello, {name:-^20}!\n")

lowering before:

fmt::Arguments::new_v1_formatted(
    &["Hello, ", "!\n"],
    &args,
    &[
        Placeholder {
            position: 0usize,
            flags: 3355443245u32,
            precision: format_count::Implied,
            width: format_count::Is(20u16),
        },
    ],
)

lowering after:

fmt::Arguments::new(
    b"\x07Hello, \x83-\x00\x00\xc8\x14\x00\x02!\n\x00",
    &args,
)

This saves a ton of pointers and simplifies the expansion significantly, but does mean that individual pieces (e.g. "Hello, " and "!\n") cannot be reused. (Those pieces are often smaller than a pointer to them, though, in which case reusing them is useless.)

The details of the new representation are documented in library/core/src/fmt/mod.rs.
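To make the byte string above easier to read, here is a rough decoder for just the shapes that appear in that one example. It is a sketch built on assumptions read off the example bytes, not the documented format: a byte in 1..=127 is taken as the length of a literal piece, 0 as the end marker, and for a placeholder byte such as `\x83` the low bit is assumed to mean "a 32-bit flags value follows" and the next bit "a 16-bit width follows", both little-endian (which is what makes `-\x00\x00\xc8` equal 3355443245 and `\x14\x00` equal 20). `Part` and `decode` are hypothetical names. The authoritative description is the one in library/core/src/fmt/mod.rs.

```rust
/// One decoded template part (hypothetical type, not the one in core).
#[derive(Debug, PartialEq)]
enum Part<'a> {
    Literal(&'a str),
    Placeholder { flags: Option<u32>, width: Option<u16> },
}

/// Decodes a template; returns None for anything this sketch does not cover.
fn decode(template: &[u8]) -> Option<Vec<Part<'_>>> {
    let mut parts = Vec::new();
    let mut t = template;
    loop {
        let (&tag, rest) = t.split_first()?;
        t = rest;
        match tag {
            // End marker: the template stops here.
            0 => return Some(parts),
            // Literal piece: `tag` bytes of UTF-8 follow.
            1..=127 => {
                let n = tag as usize;
                if t.len() < n {
                    return None;
                }
                let (piece, rest) = t.split_at(n);
                parts.push(Part::Literal(std::str::from_utf8(piece).ok()?));
                t = rest;
            }
            // Placeholder with (assumed) optional flags and width fields.
            0x80..=0x83 => {
                let mut flags = None;
                let mut width = None;
                if tag & 0b01 != 0 {
                    if t.len() < 4 {
                        return None;
                    }
                    flags = Some(u32::from_le_bytes([t[0], t[1], t[2], t[3]]));
                    t = &t[4..];
                }
                if tag & 0b10 != 0 {
                    if t.len() < 2 {
                        return None;
                    }
                    width = Some(u16::from_le_bytes([t[0], t[1]]));
                    t = &t[2..];
                }
                parts.push(Part::Placeholder { flags, width });
            }
            // Other placeholder layouts are outside this sketch.
            _ => return None,
        }
    }
}

fn main() {
    let template = b"\x07Hello, \x83-\x00\x00\xc8\x14\x00\x02!\n\x00";
    println!("{:?}", decode(template));
}
```

Decoding the example yields the literal "Hello, ", one placeholder with flags 3355443245 and width 20, and the literal "!\n", which lines up with the fields of the old `Placeholder` struct shown in "lowering before".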

m-ou-se · 2025-11-10T14:33:10Z

@bors try @rust-timer queue

Experiment: New fmt::Arguments implementation (another one)

rust-bors · 2025-11-10T16:53:36Z

☀️ Try build successful (CI)
Build commit: 6e6ba94 (6e6ba949d24fbfbd9cd48ca4c98adf59fbd04482, parent: a7b3715826827677ca8769eb88dc8052f43e734b)

rust-timer · 2025-11-10T18:13:06Z

Finished benchmarking commit (6e6ba94): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.1%, 5.8%]	26
Regressions ❌ (secondary)	0.6%	[0.1%, 1.3%]	44
Improvements ✅ (primary)	-0.7%	[-4.3%, -0.1%]	109
Improvements ✅ (secondary)	-1.7%	[-38.2%, -0.0%]	93
All ❌✅ (primary)	-0.5%	[-4.3%, 5.8%]	135

Max RSS (memory usage)

Results (primary -1.5%, secondary -0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.2%	[2.2%, 2.2%]	1
Regressions ❌ (secondary)	3.7%	[1.0%, 6.7%]	12
Improvements ✅ (primary)	-1.6%	[-6.0%, -0.5%]	31
Improvements ✅ (secondary)	-2.6%	[-7.9%, -0.7%]	25
All ❌✅ (primary)	-1.5%	[-6.0%, 2.2%]	32

Cycles

Results (primary -0.5%, secondary -4.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	4.8%	[3.4%, 6.2%]	2
Regressions ❌ (secondary)	8.8%	[2.6%, 18.8%]	6
Improvements ✅ (primary)	-3.1%	[-5.0%, -2.1%]	4
Improvements ✅ (secondary)	-10.3%	[-39.4%, -2.1%]	13
All ❌✅ (primary)	-0.5%	[-5.0%, 6.2%]	6

Binary size

Results (primary -0.7%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.0%, 1.4%]	4
Regressions ❌ (secondary)	3.2%	[0.0%, 7.5%]	12
Improvements ✅ (primary)	-0.8%	[-3.3%, -0.0%]	129
Improvements ✅ (secondary)	-1.7%	[-23.6%, -0.0%]	123
All ❌✅ (primary)	-0.7%	[-3.3%, 1.4%]	133

Bootstrap: 476.631s -> 471.922s (-0.99%)
Artifact size: 391.32 MiB -> 388.56 MiB (-0.70%)

m-ou-se · 2025-11-10T18:19:30Z

Ooh that's pretty good :D

m-ou-se · 2025-11-10T19:55:06Z

Pretty much everything looks like a great improvement. Not only number of instructions executed, but also memory usage and binary size. 🎉

Only two significant negative results:

1. "image-0.25.6 opt incr-patched:println" with almost +6% instructions:u.

Looking at the detailed results, it looks like the increase is all in LLVM, probably because LLVM got more optimization opportunities. That's not necessarily a bad thing.

2. The `fmt-write-str` runtime benchmark with over +12% instructions:u.

This could be concerning, but I can't seem to fully replicate it locally.

If I recompile and run this benchmark 100 times in both nightly and with this PR, I do get this interesting result though:

With the nightly compiler, the results vary, with many measurements clustered close to 25ms but also many around 40ms. With this PR, the results are very consistent, all clustered around 27ms. (Update: It's around 26ms now, after a minor optimization.)

So, the median result is worse, but the average is better.

My guess is that the indirection (a slice of string slices) can make things unpredictable, as the strings aren't always in the optimal place for caching. The lack of indirection in the new version then makes it much more predictable. This is just a guess though.
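A small worked example of how the median can get worse while the average improves. The numbers are made up for illustration: the comment above gives the cluster positions (about 25ms and 40ms on nightly, about 27ms with the PR) but not how many runs fell in each cluster, so the 60/40 split below is purely hypothetical.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Median of a non-empty slice (average of the two middle values if even).
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mid = v.len() / 2;
    if v.len() % 2 == 0 {
        (v[mid - 1] + v[mid]) / 2.0
    } else {
        v[mid]
    }
}

/// Hypothetical nightly runs: 60 near 25ms, 40 near 40ms (split is invented).
fn nightly_runs() -> Vec<f64> {
    let mut v = vec![25.0; 60];
    v.extend(vec![40.0; 40]);
    v
}

/// Hypothetical PR runs: all 100 near 27ms.
fn pr_runs() -> Vec<f64> {
    vec![27.0; 100]
}

fn main() {
    let (nightly, pr) = (nightly_runs(), pr_runs());
    println!("nightly: median {} ms, mean {} ms", median(&nightly), mean(&nightly));
    println!("PR:      median {} ms, mean {} ms", median(&pr), mean(&pr));
}
```

With this invented split, nightly has a median of 25ms and a mean of 31ms, while the PR has 27ms for both: a higher median but a lower mean.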

m-ou-se · 2025-11-11T16:36:19Z

@bors try @rust-timer queue

Experiment: New fmt::Arguments implementation (another one)

nyurik · 2025-11-11T17:49:15Z

just curious - why is there a trailing \x00?

fmt::Arguments::new(
    b"\x07Hello, \x83-\x00\x00\xc8\x14\x00\x02!\n\x00",
    &args,
)

m-ou-se · 2025-11-11T19:12:55Z

> just curious - why is there a trailing \x00?

See the documentation in library/core/src/fmt/mod.rs. Without an end marker, we wouldn't know where to stop. We don't store the length of the template, to not waste extra space in fmt::Arguments.
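A minimal sketch of why the end marker matters, restricted to literal pieces. It assumes only what the thread states: each literal piece is a length byte (1 to 127, per the review excerpt from library/core/src/fmt/mod.rs) followed by that many bytes, and the template ends with `\x00`. `encode_literal` and `template_len` are hypothetical helpers, and splitting long text on `char` boundaries is my own choice rather than something the thread specifies. Since no length is stored, a reader has to walk the pieces until it reaches the zero byte.

```rust
/// Encodes placeholder-free text as length-prefixed pieces plus an end marker.
/// (Hypothetical helper; splitting on char boundaries is an assumption.)
fn encode_literal(text: &str) -> Vec<u8> {
    let mut out = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Take at most 127 bytes, backing up so a character is never split.
        let mut n = rest.len().min(127);
        while !rest.is_char_boundary(n) {
            n -= 1;
        }
        let (piece, tail) = rest.split_at(n);
        out.push(n as u8);
        out.extend_from_slice(piece.as_bytes());
        rest = tail;
    }
    out.push(0); // end marker
    out
}

/// Walks the pieces to find the template length, including the end marker.
/// Returns None on truncated input or on a placeholder byte (not handled here).
fn template_len(bytes: &[u8]) -> Option<usize> {
    let mut i = 0;
    loop {
        let tag = *bytes.get(i)? as usize;
        if tag == 0 {
            return Some(i + 1);
        }
        if tag > 127 {
            return None;
        }
        i += 1 + tag;
    }
}

fn main() {
    let mut buf = encode_literal("hello");
    println!("{:?}", buf);
    // Bytes after the marker do not matter: the walk stops at the zero byte.
    buf.extend_from_slice(b"unrelated data");
    println!("{:?}", template_len(&buf));
}
```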

jhpratt · 2025-11-11T23:05:21Z

library/core/src/fmt/mod.rs

-    fmt: Option<&'a [rt::Placeholder]>,
+/// Used by the format_args!() macro to create a fmt::Arguments object.
+#[doc(hidden)]
+#[rustc_diagnostic_item = "FmtArgumentsNew"]


Is this a bootstrap thing?

Why "New" then?

It is not new. I just moved it from rt.rs to mod.rs, where the rest of fmt::Arguments lives.

m-ou-se · 2025-11-12T02:13:29Z

@rust-timer build 155c5d4

SUPERCILEX · 2025-11-12T02:35:43Z

library/core/src/fmt/mod.rs

+//
+//   The template byte sequence is the concatenation of parts of the following types:
+//
+//   - Literal string piece (1-127 bytes):


Can we use a varint encoding? We lose a bit (hehe) of efficiency on strings 64-127 bytes long as you'd now only be able to represent strings of len 1-63 with a single byte. But I'm imagining cases where somebody might have a very long string like say the GNU license that just slots in a name or something and it's probably sad to be required to splice up that string hundreds of times.

The varint encoding would be:

0b00xx_xxxx: single byte length, 1-63 range

0b010x_xxxx xxxx_xxxx: 2 bytes, 2^13-1 range

...

0b0111_1110 ...: 7 bytes, 2^(6*8)-1 range

0b0111_1111 ...: 9 bytes, special in that it adds two bytes from the previous encoding to represent the full u64 range, dunno if that's actually needed.

Another option could be to upgrade each long string piece to a placeholder with a string argument.
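For concreteness, here is a sketch of how the proposed varint length prefix could be decoded. It is an illustration of the proposal only, not something the PR implements. It reads the listed patterns as binary: the top bit stays 0 (values with the top bit set are left to placeholders), the number of 1 bits that follow selects how many extra bytes there are, and the all-ones prefix 0b0111_1111 jumps to 8 extra bytes. The byte order of the extra bytes is not given in the proposal; big-endian is an assumption here, and `decode_len` is a hypothetical name.

```rust
/// Decodes a length prefix, returning (length, bytes consumed).
/// Returns None if the top bit is set or the input is too short.
fn decode_len(bytes: &[u8]) -> Option<(u64, usize)> {
    let first = *bytes.first()?;
    if first & 0x80 != 0 {
        return None; // not a length prefix in this scheme
    }
    // Count the 1 bits directly after the top bit (0..=7).
    let ones = (first << 1).leading_ones() as usize;
    // 0b0111_1111 is the special case: 8 extra bytes, no payload bits here.
    let (extra, payload_bits) = if ones == 7 { (8, 0) } else { (ones, 6 - ones) };
    if bytes.len() < 1 + extra {
        return None;
    }
    let mask = (1u8 << payload_bits) - 1;
    let mut value = u64::from(first & mask);
    // Assumption: remaining bytes are big-endian, most significant first.
    for &b in &bytes[1..1 + extra] {
        value = (value << 8) | u64::from(b);
    }
    Some((value, 1 + extra))
}

fn main() {
    println!("{:?}", decode_len(&[0b0011_1111]));
    println!("{:?}", decode_len(&[0b0101_1111, 0xff]));
}
```

Under these assumptions a single byte covers lengths up to 63 and two bytes cover up to 2^13-1, matching the ranges listed in the proposal.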

I think adding a variable length encoding like you propose adds too much complexity for too little benefit. If 127 byte chunks are a problem, we could consider adding a single encoding for a 16 bit length. Then things get split up into chunks of 64KiB at most, which seems perfectly fine.

Both options seem reasonable, maybe the first one is more robust? It even allows codegen to choose when to put things in the placeholder (e.g. it could decide 4 mini strings is ok but more than that and you go in the placeholder). Or honestly these aren't mutually exclusive, you could do both but I doubt that's necessary.

It'd be interesting to see if large formatted strings are even a thing. Like do we have data to know if this is even a problem?

> Both options seem reasonable

I'll check which one is more efficient to implement, and will benchmark if it makes any significant difference.

> It'd be interesting to see if large formatted strings are even a thing. Like do we have data to know if this is even a problem?

Good question. We could do a crater run with a rustc that throws an error when it sees a large format string. But I suspect that if it is common, it's more likely in binaries than in libraries, so a crater run will probably not be very representative.

rust-log-analyzer · 2025-11-12T03:42:43Z

The job x86_64-gnu-tools failed! Possible cause of the failure (guessed by this bot):

...........................................        (143/143)

======== tests/rustdoc-gui/globals.goml ========

[ERROR] `tests/rustdoc-gui/globals.goml` line 14: The following errors happened: [Property named `"searchIndex"` doesn't exist]: for command `assert-window-property-false: {"searchIndex": null}`

======== tests/rustdoc-gui/search-result-display.goml ========

[WARNING] `tests/rustdoc-gui/search-result-display.goml` line 39: Delta is 0 for "x", maybe try to use `compare-elements-position` instead?

rust-timer · 2025-11-12T03:48:52Z

Finished benchmarking commit (155c5d4): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.1%, 5.7%]	13
Regressions ❌ (secondary)	0.6%	[0.1%, 1.2%]	37
Improvements ✅ (primary)	-0.7%	[-4.4%, -0.1%]	128
Improvements ✅ (secondary)	-1.5%	[-38.0%, -0.0%]	113
All ❌✅ (primary)	-0.5%	[-4.4%, 5.7%]	141

Max RSS (memory usage)

Results (primary -1.9%, secondary -0.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	6.8%	[6.8%, 6.8%]	1
Regressions ❌ (secondary)	3.7%	[0.9%, 7.3%]	15
Improvements ✅ (primary)	-2.2%	[-5.2%, -0.6%]	26
Improvements ✅ (secondary)	-2.9%	[-8.1%, -0.6%]	27
All ❌✅ (primary)	-1.9%	[-5.2%, 6.8%]	27

Cycles

Results (primary -1.3%, secondary -4.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	4.9%	[2.9%, 7.0%]	2
Regressions ❌ (secondary)	5.3%	[2.3%, 8.1%]	6
Improvements ✅ (primary)	-3.1%	[-5.8%, -2.1%]	7
Improvements ✅ (secondary)	-8.0%	[-39.6%, -1.7%]	20
All ❌✅ (primary)	-1.3%	[-5.8%, 7.0%]	9

Binary size

Results (primary -0.8%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.0%, 1.4%]	5
Regressions ❌ (secondary)	3.0%	[0.0%, 6.9%]	12
Improvements ✅ (primary)	-0.8%	[-3.3%, -0.0%]	128
Improvements ✅ (secondary)	-1.7%	[-23.6%, -0.0%]	123
All ❌✅ (primary)	-0.8%	[-3.3%, 1.4%]	133

Bootstrap: 477.625s -> 473.059s (-0.96%)
Artifact size: 391.41 MiB -> 388.66 MiB (-0.70%)

bors · 2025-11-12T04:41:52Z

☔ The latest upstream changes (presumably #148851) made this pull request unmergeable. Please resolve the merge conflicts.

m-ou-se self-assigned this Nov 10, 2025

m-ou-se added the A-fmt Area: `core::fmt` label Nov 10, 2025

rust-bors bot added a commit that referenced this pull request Nov 10, 2025

Auto merge of #148789 - m-ou-se:new-fmt-args-alt, r=<try>

6e6ba94

Experiment: New fmt::Arguments implementation (another one)

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2025

m-ou-se mentioned this pull request Nov 10, 2025

Tracking issue for improving std::fmt::Arguments and format_args!() #99012

Open

57 tasks

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 10, 2025

m-ou-se mentioned this pull request Nov 11, 2025

Experiment: New fmt::Arguments implementation #148529

Closed

m-ou-se force-pushed the new-fmt-args-alt branch from 9f41692 to 349d2b5 Compare November 11, 2025 15:02

rustbot added the A-run-make Area: port run-make Makefiles to rmake.rs label Nov 11, 2025

Expose expr_unsafe in LoweringContext.

fbb91ff

m-ou-se force-pushed the new-fmt-args-alt branch from 349d2b5 to 5b58c66 Compare November 11, 2025 16:35

rust-bors bot added a commit that referenced this pull request Nov 11, 2025

Auto merge of #148789 - m-ou-se:new-fmt-args-alt, r=<try>

155c5d4

Experiment: New fmt::Arguments implementation (another one)

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 11, 2025

m-ou-se added 5 commits November 11, 2025 18:14

Bless tests.

accdb8c

Make clippy happy.

4a8a70f

Remove the 'always set' bit from FormattingOptions.

11c5137

We don't need it anymore.

Don't encode zero width or precision in fmt string.

6958580

Document fmt::Arguments internal representation.

d7ccb45

m-ou-se force-pushed the new-fmt-args-alt branch from 6022b3b to d7ccb45 Compare November 11, 2025 17:14

m-ou-se removed A-run-make Area: port run-make Makefiles to rmake.rs T-clippy Relevant to the Clippy team. labels Nov 11, 2025

Bless clippy tests.

8080521

rustbot added A-run-make Area: port run-make Makefiles to rmake.rs T-clippy Relevant to the Clippy team. labels Nov 11, 2025

m-ou-se removed A-run-make Area: port run-make Makefiles to rmake.rs T-clippy Relevant to the Clippy team. labels Nov 11, 2025

jhpratt mentioned this pull request Nov 11, 2025

Optimize panic!(<str lit>) into method call #148375

Closed

jhpratt reviewed Nov 11, 2025

Clarify internal fmt::Arguments documentation.

40e16c2

rustbot added A-run-make Area: port run-make Makefiles to rmake.rs T-clippy Relevant to the Clippy team. labels Nov 12, 2025

Change assert to span_err.

dc7f0eb

SUPERCILEX reviewed Nov 12, 2025

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 12, 2025
