Optimize pack() #18524
base: master
Maybe an interesting benchmark/test case for
vs.
using the
`unpack()` will always be slower than a plain `strlen()`. However, your code revealed that repetitions were handled slowly: lots of temporary strings were created and then parsed. I opened a PR to fix that particular issue: #18803
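Could you benchmark this? It might be faster:

```c
static inline void php_pack(const zval *val, size_t size,
		php_pack_endianness enc, char *out)
{
	zend_long z = zval_get_long(val);
	/* Bring z into the requested byte order in memory. */
	if ((enc == PHP_LITTLE_ENDIAN) != MACHINE_LITTLE_ENDIAN) {
		z = PHP_LONG_BSWAP(z);
	}
	/* The low-order `size` bytes now come first for little-endian
	 * output and last for big-endian output. */
	if (enc == PHP_LITTLE_ENDIAN) {
		memcpy(out, &z, size);
	} else {
		memcpy(out, (char *)&z + sizeof(z) - size, size);
	}
}
```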
Strangely, my original code with `zend_never_inline` is slightly faster than master, but your code without `zend_never_inline` seems to beat it in my test with the 'J' specifier. Testing some more...
If the performance difference is insignificant or marginal (hardly even measurable in a benchmark), I would recommend just ignoring it. I like how this makes the `pack()` code much simpler (assuming it actually works on big-endian machines).
I think I managed to make the compiler happy and let it make good inlining decisions while keeping the code simple. For example, for this:

```php
for ($i = 0; $i < 10_000_000; ++$i) {
    pack("J", 0x7FFFFFFFFFFFFFFF);
}
```

On an i7-4790:

And for this:

```php
for ($i = 0; $i < 4000000; $i++)
    pack("nvc*", 0x1234, 0x5678, 65, 66);
```

On the same machine:

Let's hope it's reproducible.
Instead of using lookup tables, we can use a combination of shifts and byte swapping to achieve the same thing in fewer cycles and with less code.

Benchmark files
---------------

pack1.php:

```php
for ($i = 0; $i < 10_000_000; ++$i) {
    pack("J", 0x7FFFFFFFFFFFFFFF);
}
```

pack2.php:

```php
for ($i = 0; $i < 4000000; ++$i) {
    pack("nvc*", 0x1234, 0x5678, 65, 66);
}
```

On an i7-4790:

```
Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     408.8 ms ±   3.4 ms    [User: 406.1 ms, System: 1.6 ms]
  Range (min … max):   403.6 ms … 413.6 ms    10 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     451.7 ms ±   7.7 ms    [User: 448.5 ms, System: 2.0 ms]
  Range (min … max):   442.8 ms … 461.2 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.11 ± 0.02 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     239.3 ms ±   6.0 ms    [User: 236.2 ms, System: 2.3 ms]
  Range (min … max):   233.2 ms … 256.8 ms    12 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     271.9 ms ±   3.3 ms    [User: 269.7 ms, System: 1.3 ms]
  Range (min … max):   267.4 ms … 279.0 ms    11 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.14 ± 0.03 times faster than ./sapi/cli/php_old pack2.php
```

On an i7-1185G7:

```
Benchmark 1: ./sapi/cli/php pack1.php
  Time (mean ± σ):     263.7 ms ±   1.8 ms    [User: 262.6 ms, System: 0.9 ms]
  Range (min … max):   261.5 ms … 268.2 ms    11 runs

Benchmark 2: ./sapi/cli/php_old pack1.php
  Time (mean ± σ):     303.3 ms ±   6.5 ms    [User: 300.7 ms, System: 2.3 ms]
  Range (min … max):   297.4 ms … 318.1 ms    10 runs

Summary
  ./sapi/cli/php pack1.php ran
    1.15 ± 0.03 times faster than ./sapi/cli/php_old pack1.php

Benchmark 1: ./sapi/cli/php pack2.php
  Time (mean ± σ):     156.7 ms ±   2.9 ms    [User: 154.7 ms, System: 1.7 ms]
  Range (min … max):   151.6 ms … 164.7 ms    19 runs

Benchmark 2: ./sapi/cli/php_old pack2.php
  Time (mean ± σ):     174.6 ms ±   3.3 ms    [User: 171.9 ms, System: 2.3 ms]
  Range (min … max):   170.7 ms … 180.4 ms    17 runs

Summary
  ./sapi/cli/php pack2.php ran
    1.11 ± 0.03 times faster than ./sapi/cli/php_old pack2.php
```

Co-authored-by: [email protected]
It's consistent on my laptop. Squashed the commits and updated the descriptions.