Optimize pack() #18524

Open, wants to merge 2 commits into base: master

nielsdos
Member

@nielsdos nielsdos commented May 8, 2025

Instead of using lookup tables, we can use a combination of shifts and
byte swapping to achieve the same thing in fewer cycles and with less
code.
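
A minimal standalone sketch of the shift-plus-byte-swap idea, not the code from this PR. The names `pack_int`, `byte_order`, `bswap64` and `host_is_little_endian` are hypothetical, and the host byte order is probed at run time here only to keep the sketch self-contained:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { ORDER_LITTLE, ORDER_BIG } byte_order; /* hypothetical names */

/* Portable 64-bit byte swap: swap bytes, then 16-bit pairs, then halves. */
static uint64_t bswap64(uint64_t v)
{
    v = ((v & 0x00FF00FF00FF00FFULL) << 8)  | ((v >> 8)  & 0x00FF00FF00FF00FFULL);
    v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v >> 16) & 0x0000FFFF0000FFFFULL);
    return (v << 32) | (v >> 32);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Write the low `size` bytes of `v` to `out` in the requested byte order. */
static void pack_int(uint64_t v, size_t size, byte_order order, unsigned char *out)
{
    assert(size >= 1 && size <= sizeof v);
    const int host_le = host_is_little_endian();
    const unsigned shift = (unsigned) (sizeof v - size) * 8;

    if ((order == ORDER_LITTLE) != host_le) {
        /* Requested order differs from the host: reverse all eight bytes. */
        v = bswap64(v);
        /* On a little-endian host the wanted bytes are now the last ones in
         * memory; shift them down to the lowest address. */
        if (host_le) {
            v >>= shift;
        }
    } else if (!host_le) {
        /* Big-endian host, big-endian output: the wanted bytes sit at the
         * end of the word; shift them up to the lowest address. */
        v <<= shift;
    }
    /* In every case the first `size` bytes in memory are now the output. */
    memcpy(out, &v, size);
}
```

With this sketch, `pack_int(0x1234, 2, ORDER_BIG, buf)` writes the bytes `12 34` and `pack_int(0x5678, 2, ORDER_LITTLE, buf)` writes `78 56`, matching what the `n` and `v` format codes are expected to produce.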

Benchmark files

pack1.php:

for ($i = 0; $i < 10_000_000; ++$i) {
    pack("J", 0x7FFFFFFFFFFFFFFF);
}

pack2.php:

for ($i = 0; $i < 4000000; ++$i) {
    pack("nvc*", 0x1234, 0x5678, 65, 66);
}

Results

On an i7-4790:

Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     408.8 ms ±   3.4 ms    [User: 406.1 ms, System: 1.6 ms]
  Range (min … max):   403.6 ms … 413.6 ms    10 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     451.7 ms ±   7.7 ms    [User: 448.5 ms, System: 2.0 ms]
  Range (min … max):   442.8 ms … 461.2 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.11 ± 0.02 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     239.3 ms ±   6.0 ms    [User: 236.2 ms, System: 2.3 ms]
  Range (min … max):   233.2 ms … 256.8 ms    12 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     271.9 ms ±   3.3 ms    [User: 269.7 ms, System: 1.3 ms]
  Range (min … max):   267.4 ms … 279.0 ms    11 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.14 ± 0.03 times faster than ./sapi/cli/php_old pack2.php

On an i7-1185G7:

Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     263.7 ms ±   1.8 ms    [User: 262.6 ms, System: 0.9 ms]
  Range (min … max):   261.5 ms … 268.2 ms    11 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     303.3 ms ±   6.5 ms    [User: 300.7 ms, System: 2.3 ms]
  Range (min … max):   297.4 ms … 318.1 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.15 ± 0.03 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     156.7 ms ±   2.9 ms    [User: 154.7 ms, System: 1.7 ms]
  Range (min … max):   151.6 ms … 164.7 ms    19 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     174.6 ms ±   3.3 ms    [User: 171.9 ms, System: 2.3 ms]
  Range (min … max):   170.7 ms … 180.4 ms    17 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.11 ± 0.03 times faster than ./sapi/cli/php_old pack2.php
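
The "times faster" figures in the summaries can be cross-checked from the reported means and standard deviations. The sketch below assumes the usual first-order error propagation for a quotient; whether the benchmarking tool computes its summary exactly this way is an assumption. The type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { double mean_ms, sigma_ms; } timing;   /* hypothetical */
typedef struct { double ratio, sigma; } speedup;       /* hypothetical */

/* Newton's method square root, so the sketch needs no libm at link time. */
static double sqrt_newton(double x)
{
    if (x <= 0.0) {
        return 0.0;
    }
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) {
        r = 0.5 * (r + x / r);
    }
    return r;
}

/* ratio = slow / fast; the relative errors add in quadrature. */
static speedup relative_speed(timing fast, timing slow)
{
    assert(fast.mean_ms > 0.0 && slow.mean_ms > 0.0);
    speedup s;
    const double rel_fast = fast.sigma_ms / fast.mean_ms;
    const double rel_slow = slow.sigma_ms / slow.mean_ms;
    s.ratio = slow.mean_ms / fast.mean_ms;
    s.sigma = s.ratio * sqrt_newton(rel_fast * rel_fast + rel_slow * rel_slow);
    return s;
}
```

For pack1.php on the i7-4790 (408.8 ± 3.4 ms against 451.7 ± 7.7 ms) this gives a ratio of roughly 1.105 with an uncertainty of roughly 0.021, consistent with the reported 1.11 ± 0.02.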

@staabm
Contributor

staabm commented May 21, 2025

Maybe an interesting benchmark/test case for unpack: stomp-php/stomp-php#184

<?php
$file = file_get_contents('FILE');
echo count(unpack('C*', $file)) . "\n";

vs.

<?php
$file = file_get_contents('FILE');
echo strlen($file) . "\n";

using truncate -s 80M FILE.

the strlen() variant is a lot faster

@nielsdos
Member Author

nielsdos commented Jun 8, 2025

Unpack will always be slower than just strlen. However, your code revealed that repetitions were handled in a slow way where lots of temporary strings were created and then parsed. I opened a PR to fix that particular issue: #18803
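
To illustrate why the two variants cannot cost the same, here is a rough C sketch. It assumes a length-prefixed string (PHP strings store their length) and models the result of `unpack('C*', ...)` as a plain array with one integer per input byte. The struct and function names are hypothetical, and a real PHP array is considerably heavier than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a string that carries its own length. */
typedef struct {
    size_t len;
    const unsigned char *data;
} byte_string;

/* strlen()-like: reads a stored field, independent of the input size. */
static size_t length_only(const byte_string *s)
{
    assert(s != NULL);
    return s->len;
}

/* unpack('C*')-like: allocates and fills one integer per input byte.
 * Returns NULL on allocation failure; the caller frees the result. */
static long *expand_bytes(const byte_string *s)
{
    assert(s != NULL);
    long *values = malloc((s->len ? s->len : 1) * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < s->len; i++) {
        values[i] = s->data[i];
    }
    return values;
}
```

For an 80 MB input the first function does constant work, while the second touches every byte and allocates memory proportional to the input.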

@nielsdos nielsdos mentioned this pull request Jun 10, 2025
@divinity76
Contributor

divinity76 commented Jun 10, 2025

could benchmark

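static inline void php_pack(const zval *val, size_t size,
                            php_pack_endianness enc, char *out)
{
    zend_long z = zval_get_long(val);
    const char *src = (const char *) &z;

    /* Swap only when the requested byte order differs from the host's. */
    if ((enc == PHP_LITTLE_ENDIAN) != MACHINE_LITTLE_ENDIAN) {
        z = PHP_LONG_BSWAP(z);
    }
    /* z is now laid out in the requested order: the low `size` bytes sit at
     * the start for little-endian output and at the end for big-endian. */
    if (enc != PHP_LITTLE_ENDIAN) {
        src += sizeof(z) - size;
    }
    memcpy(out, src, size);
}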

might be faster

@nielsdos
Member Author

Very strangely, my original code with zend_never_inline is slightly faster than master, but your code without zend_never_inline seems to beat that in my test with the 'J' specifier. Testing some more stuff...
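
For readers unfamiliar with inlining hints, the sketch below shows the two extremes using GCC/Clang attribute syntax. The `SKETCH_*` macros and function names are hypothetical stand-ins, not the php-src definitions, and the comments describe typical compiler behaviour rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_NEVER_INLINE  __attribute__((noinline))
# define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
# define SKETCH_NEVER_INLINE
# define SKETCH_ALWAYS_INLINE inline
#endif

/* Forced inline: at a call site with a constant `size`, the compiler can
 * typically turn the memcpy into a fixed-width store. */
static SKETCH_ALWAYS_INLINE void store_inline(uint64_t v, size_t size,
                                              unsigned char *out)
{
    assert(size <= sizeof v);
    memcpy(out, &v, size);
}

/* Never inlined: every use is a real call and `size` stays a run-time
 * value, which keeps the callers smaller. */
static SKETCH_NEVER_INLINE void store_outlined(uint64_t v, size_t size,
                                               unsigned char *out)
{
    assert(size <= sizeof v);
    memcpy(out, &v, size);
}
```

Both functions produce identical bytes; only the generated code differs, which is why the choice can shift benchmark results in either direction.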

@divinity76
Contributor

If the performance difference is insignificant/marginal, as in hardly even benchmarkable, I would recommend just ignoring it.

I like how this makes the pack code much simpler (assuming it actually works on BE).
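
One way to gain confidence on big-endian hosts without access to one is to compare against a reference that never looks at the host's memory layout. The sketch below extracts bytes with shifts only, so it behaves the same on any host; `pack_reference` is a hypothetical name and this is not code from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-independent reference packer: byte i of the value (counting from the
 * least significant) goes to position i for little-endian output and to
 * position size - 1 - i for big-endian output. */
static void pack_reference(uint64_t v, size_t size, int little_endian,
                           unsigned char *out)
{
    assert(size >= 1 && size <= sizeof v);
    for (size_t i = 0; i < size; i++) {
        const size_t pos = little_endian ? i : size - 1 - i;
        out[pos] = (unsigned char) (v >> (8 * i));
    }
}
```

Applied to the arguments of `pack("nvc*", 0x1234, 0x5678, 65, 66)`, the reference yields the bytes `12 34 78 56 41 42`, which an optimized implementation should match on every platform.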

@nielsdos
Member Author

I think I managed to make the compiler happy and let it make good inlining decisions while keeping the code simple.

For example, for this:

for ($i = 0; $i < 10_000_000; ++$i) {
  pack("J", 0x7FFFFFFFFFFFFFFF);
}

On an i7-4790:

Benchmark 1: ./sapi/cli/php pack.php
  Time (mean ± σ):     408.8 ms ±   3.4 ms    [User: 406.1 ms, System: 1.6 ms]
  Range (min … max):   403.6 ms … 413.6 ms    10 runs
 
Benchmark 2: ./sapi/cli/php_old pack.php
  Time (mean ± σ):     451.7 ms ±   7.7 ms    [User: 448.5 ms, System: 2.0 ms]
  Range (min … max):   442.8 ms … 461.2 ms    10 runs
 
Summary
  ./sapi/cli/php pack.php ran
    1.11 ± 0.02 times faster than ./sapi/cli/php_old pack.php

And for this:

for ($i = 0; $i < 4000000; $i++) {
    pack("nvc*", 0x1234, 0x5678, 65, 66);
}

On the same machine:

Benchmark 1: ./sapi/cli/php pack.php
  Time (mean ± σ):     239.3 ms ±   6.0 ms    [User: 236.2 ms, System: 2.3 ms]
  Range (min … max):   233.2 ms … 256.8 ms    12 runs
 
Benchmark 2: ./sapi/cli/php_old pack.php
  Time (mean ± σ):     271.9 ms ±   3.3 ms    [User: 269.7 ms, System: 1.3 ms]
  Range (min … max):   267.4 ms … 279.0 ms    11 runs
 
Summary
  ./sapi/cli/php pack.php ran
    1.14 ± 0.03 times faster than ./sapi/cli/php_old pack.php

Let's hope it's reproducible

nielsdos added 2 commits June 11, 2025 20:47
Instead of using lookup tables, we can use a combination of shifts and
byte swapping to achieve the same thing in fewer cycles and with less
code.

Co-authored-by: [email protected]
@nielsdos nielsdos changed the title from "[WIP] Optimize pack()" to "Optimize pack()" Jun 11, 2025
@nielsdos
Member Author

It's consistent on my laptop. Squashed the commits and updated the descriptions.

@nielsdos nielsdos marked this pull request as ready for review June 11, 2025 19:29
@nielsdos nielsdos requested a review from bukka as a code owner June 11, 2025 19:29