7z: Change "data length" cost to reflect "data length penalty" #5919
Conversation
Force-pushed from 5871699 to a470579
solardiz left a comment:
I'm unsure this is the right approach, but I don't mind.
src/7z_common_plug.c (outdated):

```c
	if (sevenzip_trust_padding)
		cost >>= (pad_size * 8);

	cost += sevenzip_iteration_count(salt);
```
This assumes that packed size (sans early reject) and iterations contribute to the total cost equally, but that's probably a great oversimplification. Maybe there should be an empirically tuned coefficient here at least?
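For illustration only, the kind of weighting being suggested might look like the sketch below. The function and parameter names are hypothetical, and the DATA_WEIGHT value is a placeholder; a real coefficient would have to come from benchmarking both formats.

```c
#include <stdint.h>

/* Hypothetical weighting coefficient; it would need empirical tuning so
 * that the padding-shifted data size and the KDF iteration count end up
 * on a comparable scale. */
#define DATA_WEIGHT 16

/* Sketch only: the real plugin derives these values from its salt struct. */
static uint64_t estimated_cost(uint64_t packed_size, unsigned int pad_size,
                               uint64_t iterations, int trust_padding)
{
	uint64_t data_term = packed_size;

	if (trust_padding)
		data_term >>= (pad_size * 8); /* each padding byte rejects ~255/256 of candidates early */

	return DATA_WEIGHT * data_term + iterations;
}
```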
I first had the size multiplied by the iteration count, before shifting, which was way worse. Adding the iterations after shifting the size makes some sense: for many archives, the result will correctly end up only a little above the iteration count. This is all a simplification anyway, because the inflate method matters too.
The thing is, a huge archive with little or no padding is a huge hit to performance.
Instead of this whole PR, we could change the existing "data length" cost to be a padding-shifted one and rename it to "effective size after early reject" or something better (not sure what).
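A worked example shows why the ordering matters. The numbers here are assumptions of mine, not figures from the PR: a 100 MiB stream, two padding bytes usable for early reject, and a 2^19-iteration KDF.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t packed_size = 100ULL << 20; /* 100 MiB of packed data */
	unsigned int pad_size = 2;           /* padding bytes usable for early reject */
	uint64_t iterations = 1ULL << 19;    /* KDF iteration count */

	/* Shift first, then add: the data term shrinks to 1600, so the total
	 * ends up only a little above the iteration count. */
	uint64_t shift_then_add = (packed_size >> (pad_size * 8)) + iterations;

	/* Multiply by iterations first, then shift: the data size still
	 * dominates by orders of magnitude, i.e. the "way worse" variant. */
	uint64_t mul_then_shift = (packed_size * iterations) >> (pad_size * 8);

	printf("shift-then-add: %llu\nmultiply-then-shift: %llu\n",
	       (unsigned long long)shift_then_add,
	       (unsigned long long)mul_then_shift);
	return 0;
}
```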
I did consider something like 8 + (pad_size * 8) though. Thinking about it again, that's probably better, and it also raises the 4 GB ceiling. I can run some speed tests.
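Assuming the reported cost is a 32-bit value (my assumption, not stated in the PR), the effect of the extra fixed 8-bit shift on that ceiling is easy to see:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Old scheme with no padding: cost == packed size, so sizes above
	 * 4 GB cannot be represented in 32 bits. */
	uint64_t old_ceiling = (uint64_t)UINT32_MAX;

	/* With a fixed extra 8-bit shift, even an unpadded stream is divided
	 * by 256, pushing the representable ceiling to roughly 1 TB. */
	uint64_t new_ceiling = ((uint64_t)UINT32_MAX + 1) << 8;

	printf("old ceiling: %llu bytes, new ceiling: %llu bytes\n",
	       (unsigned long long)old_ceiling,
	       (unsigned long long)new_ceiling);
	return 0;
}
```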
src/7z_fmt_plug.c (outdated):

```c
		MAX_KEYS_PER_CRYPT,
		FMT_CASE | FMT_8_BIT | FMT_OMP | FMT_UNICODE | FMT_ENC | FMT_DYNA_SALT | FMT_HUGE_INPUT,
		{
			"cost",
```
Maybe call this "estimated cost"? Also in the other format.
Force-pushed from a470579 to 9cef6fb
Force-pushed from b2ebb40 to 01f2b69
Now called "data size penalty" instead, to better reflect this. It is much more usable for actually filtering out hashes: tens or hundreds of megabytes without a single byte of padding are a huge hit to performance (especially for the GPU format), but as soon as we have a byte or two of padding, the impact very quickly approaches zero.
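To make the "quickly approaches zero" point concrete, here is a small illustration under the assumption that the penalty is simply the packed size shifted right by 8 bits per usable padding byte; the 100 MiB figure is an example of mine, not from the PR.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t packed_size = 100ULL << 20; /* a 100 MiB archive */

	/* 0 padding bytes: penalty 104857600 (the full size)
	 * 1 padding byte:  penalty     409600
	 * 2 padding bytes: penalty       1600 */
	for (unsigned int pad = 0; pad <= 2; pad++)
		printf("%u padding byte(s): penalty %llu\n",
		       pad, (unsigned long long)(packed_size >> (pad * 8)));
	return 0;
}
```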
Force-pushed from 01f2b69 to de4e8e4
After some testing I gave up on this. I did find a formula of […] that helped on CPU, but I took a much simpler approach instead: this commit now merely changes the existing "data length" cost to a "data size penalty", by simply decreasing the data size according to how much padding we have. The old cost was informative but useless for filtering; the new one reflects that a 4 GB data size is no problem at all as long as we have at least two bytes of padding for early rejection, while a 50 MB size with no padding will hit performance significantly.
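My reading of the final approach, as a standalone sketch. The names are hypothetical, the real change lives in the format's tunable-cost reporting, and the 32-bit clamping is my assumption.

```c
#include <stdint.h>

/* "Data size penalty": the packed size reduced according to how much
 * padding is available for early rejection, clamped to a 32-bit range
 * (clamping behavior assumed, not taken from the PR). */
static unsigned int data_size_penalty(uint64_t packed_size,
                                      unsigned int pad_size,
                                      int trust_padding)
{
	uint64_t penalty = packed_size;

	if (trust_padding)
		penalty >>= (pad_size * 8);

	return penalty > 0xffffffffULL ? 0xffffffffU : (unsigned int)penalty;
}
```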
Make this the default as none of the others are very useful other than for information.
I'm still fond of the other costs, so this had to bump FMT_TUNABLE_COSTS to 5. I have wanted to do that before, and I think it can't hurt, even though some "costs" are more important for information than for filtering.