-
Notifications
You must be signed in to change notification settings - Fork 165
perf(prof): speedup hot path in allocator #3505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
5049f86 to
be579db
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3505 +/- ##
==========================================
- Coverage 61.74% 61.60% -0.14%
==========================================
Files 142 142
Lines 12975 12975
Branches 1700 1700
==========================================
- Hits 8011 7993 -18
- Misses 4204 4222 +18
Partials 760 760 see 4 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
be579db to
1f5dd79
Compare
Benchmarks [ profiler ]Benchmark execution time: 2025-12-05 09:52:55 Comparing candidate commit df68409 in PR branch Found 2 performance improvements and 6 performance regressions! Performance is the same for 22 metrics, 6 unstable metrics. scenario:php-profiler-exceptions-with-profiler
scenario:php-profiler-exceptions-with-profiler-and-timeline
scenario:php-profiler-timeline-memory-control
scenario:php-profiler-timeline-memory-with-profiler
scenario:php-profiler-timeline-memory-with-profiler-and-timeline
|
Benchmarks [ tracer ]Benchmark execution time: 2025-12-02 11:36:13 Comparing candidate commit 1f5dd79 in PR branch Found 3 performance improvements and 0 performance regressions! Performance is the same for 190 metrics, 1 unstable metrics. scenario:BM_TeaSapiSpindown
scenario:ComposerTelemetryBench/benchTelemetryParsing
scenario:TraceSerializationBench/benchSerializeTrace
|
morrisonlevi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is just no way that alloc_prof_realloc would call into alloc_prof_rshutdown and from there, there is no way to call into zend_mm_gc...
I don't think this quite right. The function zend_mm_safe_error is called by various allocation functions such as zend_mm_realloc_huge, and zend_mm_safe_error has zend_bailout which jumps to EG(bailout). It could jump to shutdown, yeah?
I was thinking: we might not need to do all of this get_or_init() dance and use a RefCell at runtime, when we can instead initialize the ALLOCATION_PROFILING_STATS in GINIT and then just "know" that this is initialized and use it.
RefCell is not about being initialized or not, that is the thread-local lazy initialization. The RefCell is about enforcing Rust's borrowing rules at runtime.
I think there's possibly some optimization though. We could use MaybeUninit<RefCell<AllocationProfilingStats>> and make it const { MaybeUninit::uninit() } so it doesn't need lazy-initialization checks at runtime. We initialize it in ginit. The RefCell still is there to enforce borrow rules at runtime (cover our butts, yes, it causes crashes, but that's better than undefined behavior).
`UnsafeCell` and document why it is safe anyway.
c65a5b8 to
d84a013
Compare
|
With b0ce8e9 I've guarded the allocation (and exception) stats we are sending with the metadata JSON behind a feature flag because it looks like we are spending nearly ⅓ of our time just in incrementing these counters:
They are also only needed by us as a measurement to validate the upscaling algorithm. I am gonna measure again once the binary is ready to benchmark. |
You are right, in case ZendMM can't allocate it bails out. Okay, that that part of the stack trace is fine. Just don't know how
We could benchmark the different options against each other and see. There are probably easier and more safe ways to bring down overhead without doing too much unsafe stuff 😉 |
We are okay to use |
|
I had a thought: when an Apache graceful restart is done, the process goes through mshutdown and back into minit. What is the story there with ginit and gshutdown? |
morrisonlevi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try to encapsulate the access to ALLOCATION_PROFILING_STATS so that other modules are not repeating the unsafe code (and comments that ought to accompany it). I'll work on it.

Description
We got some crash reports with stacks like this:
None of these are actually helpful, because this would just look like a crash in the engine's
execute_exfunction somewhere. But now we got a new crash trace adding some stack frames that look like stack corruption, but reveal something interesting nonetheless:There is just no way thatalloc_prof_reallocwould call intoalloc_prof_rshutdownand from there, there is no way to call intozend_mm_gc...The stack trace is still weird, but
alloc_prof_reallocin case ZendMM can't allocate, will bailout (longjmp()toEG(bailout)which usually leads toRSHUTDOWNand as such toalloc_prof_rshutdown. I can't see a path though fromalloc_prof_rshutdowntozend_mm_gc...But there are a few ideas and things to this stack anyway: according to the runtime stack trace, it is in
Composer\Autoload\ClassLoader::findFileWithExtension()line 505 which could trigger a reallocation, which on the way down, could call intozend_mm_gc. So at least this makes sense, not 100% with the stack trace we see, but ...So anyway, I was thinking:
get_or_init()dance with theRefCellinALLOCATION_PROFILING_STATSat runtime, when we know there is no reentrancyALLOCATION_PROFILING_STATSitself when we make it aconstSeqCstmemory ordering for stats and intervals is not needed, we only need atomicity, there is no happen-before relation to other memory, so we can useRelaxedand also move all the stats behind a feature flag because we are the only ones using itReviewer checklist