Optimize pack() #18524

nielsdos · 2025-05-08T21:14:25Z

Instead of using lookup tables, we can use a combination of shifts and
byte swapping to achieve the same thing in fewer cycles and with less
code.

Benchmark files

pack1.php:

for ($i = 0; $i < 10_000_000; ++$i) {
    pack("J", 0x7FFFFFFFFFFFFFFF);
}

pack2.php:

for ($i = 0; $i < 4000000; ++$i) {
    pack("nvc*", 0x1234, 0x5678, 65, 66);
}

Results

On an i7-4790:

Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     408.8 ms ±   3.4 ms    [User: 406.1 ms, System: 1.6 ms]
  Range (min … max):   403.6 ms … 413.6 ms    10 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     451.7 ms ±   7.7 ms    [User: 448.5 ms, System: 2.0 ms]
  Range (min … max):   442.8 ms … 461.2 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.11 ± 0.02 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     239.3 ms ±   6.0 ms    [User: 236.2 ms, System: 2.3 ms]
  Range (min … max):   233.2 ms … 256.8 ms    12 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     271.9 ms ±   3.3 ms    [User: 269.7 ms, System: 1.3 ms]
  Range (min … max):   267.4 ms … 279.0 ms    11 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.14 ± 0.03 times faster than ./sapi/cli/php_old pack2.php

On an i7-1185G7:

Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     263.7 ms ±   1.8 ms    [User: 262.6 ms, System: 0.9 ms]
  Range (min … max):   261.5 ms … 268.2 ms    11 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     303.3 ms ±   6.5 ms    [User: 300.7 ms, System: 2.3 ms]
  Range (min … max):   297.4 ms … 318.1 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.15 ± 0.03 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     156.7 ms ±   2.9 ms    [User: 154.7 ms, System: 1.7 ms]
  Range (min … max):   151.6 ms … 164.7 ms    19 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     174.6 ms ±   3.3 ms    [User: 171.9 ms, System: 2.3 ms]
  Range (min … max):   170.7 ms … 180.4 ms    17 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.11 ± 0.03 times faster than ./sapi/cli/php_old pack2.php

ext/standard/pack.c

staabm · 2025-05-21T09:20:03Z

Maybe an interesting benchmark/test case for unpack: stomp-php/stomp-php#184

<?php
$file = file_get_contents('FILE');
echo count(unpack('C*', $file)) . "\n";

vs.

<?php
$file = file_get_contents('FILE');
echo strlen($file) . "\n";

FILE was created with truncate -s 80M FILE.

The strlen() variant is a lot faster.

nielsdos · 2025-06-08T11:38:47Z

Unpack will always be slower than just strlen. However, your code revealed that repetitions were handled in a slow way where lots of temporary strings were created and then parsed. I opened a PR to fix that particular issue: #18803

divinity76 · 2025-06-10T19:28:36Z

could benchmark

static inline void php_pack(const zval *val, size_t size,
                            php_pack_endianness enc, char *out)
{
    zend_long z = zval_get_long(val);

    if ((enc == PHP_LITTLE_ENDIAN) != MACHINE_LITTLE_ENDIAN) {
        z = PHP_LONG_BSWAP(z);
    }
    memcpy(out, (char*)&z + sizeof(z) - size, size);
}

might be faster

nielsdos · 2025-06-10T19:44:28Z

Very strangely, my original code with zend_never_inline is slightly faster than master, but your code without zend_never_inline seems to beat that in my test with the 'J' specifier. Testing some more stuff...

divinity76 · 2025-06-10T20:10:58Z

If the performance difference is insignificant or marginal, as in hardly even benchmarkable, I would recommend just ignoring it.

I like how this makes the pack code much simpler (assuming it actually works on BE).

nielsdos · 2025-06-10T20:16:42Z

I think I managed to make the compiler happy and let it make good inlining decisions while keeping the code simple.

For example for this:

for ($i = 0; $i < 10_000_000; ++$i) {
  pack("J", 0x7FFFFFFFFFFFFFFF);
}

On an i7-4790:

Benchmark 1: ./sapi/cli/php pack.php
  Time (mean ± σ):     408.8 ms ±   3.4 ms    [User: 406.1 ms, System: 1.6 ms]
  Range (min … max):   403.6 ms … 413.6 ms    10 runs
 
Benchmark 2: ./sapi/cli/php_old pack.php
  Time (mean ± σ):     451.7 ms ±   7.7 ms    [User: 448.5 ms, System: 2.0 ms]
  Range (min … max):   442.8 ms … 461.2 ms    10 runs
 
Summary
  ./sapi/cli/php pack.php ran
    1.11 ± 0.02 times faster than ./sapi/cli/php_old pack.php

And for this:

for ($i=0;$i<4000000;$i++)
pack("nvc*", 0x1234, 0x5678, 65, 66);

On the same machine:

Benchmark 1: ./sapi/cli/php pack.php
  Time (mean ± σ):     239.3 ms ±   6.0 ms    [User: 236.2 ms, System: 2.3 ms]
  Range (min … max):   233.2 ms … 256.8 ms    12 runs
 
Benchmark 2: ./sapi/cli/php_old pack.php
  Time (mean ± σ):     271.9 ms ±   3.3 ms    [User: 269.7 ms, System: 1.3 ms]
  Range (min … max):   267.4 ms … 279.0 ms    11 runs
 
Summary
  ./sapi/cli/php pack.php ran
    1.14 ± 0.03 times faster than ./sapi/cli/php_old pack.php

Let's hope it's reproducible

nielsdos · 2025-06-11T19:02:25Z

It's consistent on my laptop. Squashed the commits and updated the descriptions.

Instead of using lookup tables, we can use a combination of shifts and
byte swapping to achieve the same thing in fewer cycles and with less
code.

Benchmark files
---------------

pack1.php:
```php
for ($i = 0; $i < 10_000_000; ++$i) {
    pack("J", 0x7FFFFFFFFFFFFFFF);
}
```

pack2.php:
```php
for ($i = 0; $i < 4000000; ++$i) {
    pack("nvc*", 0x1234, 0x5678, 65, 66);
}
```

On an i7-4790:
```
Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     408.8 ms ±   3.4 ms    [User: 406.1 ms, System: 1.6 ms]
  Range (min … max):   403.6 ms … 413.6 ms    10 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     451.7 ms ±   7.7 ms    [User: 448.5 ms, System: 2.0 ms]
  Range (min … max):   442.8 ms … 461.2 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.11 ± 0.02 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     239.3 ms ±   6.0 ms    [User: 236.2 ms, System: 2.3 ms]
  Range (min … max):   233.2 ms … 256.8 ms    12 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     271.9 ms ±   3.3 ms    [User: 269.7 ms, System: 1.3 ms]
  Range (min … max):   267.4 ms … 279.0 ms    11 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.14 ± 0.03 times faster than ./sapi/cli/php_old pack2.php
```

On an i7-1185G7:
```
Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     263.7 ms ±   1.8 ms    [User: 262.6 ms, System: 0.9 ms]
  Range (min … max):   261.5 ms … 268.2 ms    11 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     303.3 ms ±   6.5 ms    [User: 300.7 ms, System: 2.3 ms]
  Range (min … max):   297.4 ms … 318.1 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.15 ± 0.03 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     156.7 ms ±   2.9 ms    [User: 154.7 ms, System: 1.7 ms]
  Range (min … max):   151.6 ms … 164.7 ms    19 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     174.6 ms ±   3.3 ms    [User: 171.9 ms, System: 2.3 ms]
  Range (min … max):   170.7 ms … 180.4 ms    17 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.11 ± 0.03 times faster than ./sapi/cli/php_old pack2.php
```

Co-authored-by: [email protected]

nielsdos · 2025-06-12T17:25:10Z

Rebased to solve conflict

github-actions bot added Extension: standard ABI break labels May 8, 2025

TimWolla reviewed May 9, 2025

nielsdos force-pushed the opt-pack branch from e855c99 to fafcaa3 Compare May 18, 2025 18:34

nielsdos mentioned this pull request Jun 10, 2025

optimize pack #18513

Open

nielsdos force-pushed the opt-pack branch 2 times, most recently from 47a5320 to 3b1918b Compare June 11, 2025 19:01

nielsdos changed the title ~~[WIP] Optimize pack()~~ Optimize pack() Jun 11, 2025

nielsdos marked this pull request as ready for review June 11, 2025 19:29

nielsdos requested a review from bukka as a code owner June 11, 2025 19:29

nielsdos added 2 commits June 12, 2025 19:24

Use ZEND_BYTES_SWAP32() for php_pack_reverse_int32()

18a385f

nielsdos force-pushed the opt-pack branch from 3b1918b to 18a385f Compare June 12, 2025 17:25
