perf(prof): speedup hot path in allocator #3505

realFlowControl · 2025-11-28T09:18:03Z

Description

We got some crash reports with stacks like this:

#14  0x00006101e274a1e3 execute_ex 
#15  0x00006101e26ca4f6 zend_call_function 
#16  0x00006101e26ca8ad zend_call_known_function 
#17  0x00006101e26c9647 zend_lookup_class_ex 
#18  0x00006101e26cac0e zend_fetch_class

None of these are actually helpful, because this would just look like a crash in the engine's execute_ex function somewhere. But now we got a new crash trace adding some stack frames that look like stack corruption, but reveal something interesting nonetheless:

#0   0x00006101e26a8712 zend_mm_gc 
#1   0x00007b434a0a82e1 get_or_init<core::cell::RefCell<datadog_php_profiling::RequestLocals>, (), fn() -> core::cell::RefCell<datadog_php_profiling::RequestLocals>> (native/lazy.rs:53:15)
#2   0x00007b434a0a82e1 {closure#0} (native/mod.rs:94:25)
#3   0x00007b434a0a82e1 call_once<datadog_php_profiling::REQUEST_LOCALS::{constant#0}::{closure_env#0}, (core::option::Option<&mut core::option::Option<core::cell::RefCell<datadog_php_profiling::RequestLocals>>>)> (src/ops/function.rs:250:5)
#4   0x00007b434a0a82e1 try_with<core::cell::RefCell<datadog_php_profiling::RequestLocals>, datadog_php_profiling::{impl#2}::try_with_borrow::{closure_env#0}<datadog_php_profiling::RequestLocals, datadog_php_profiling::allocation::alloc_prof_rshutdown::{closure_env#0}, bool>, core::result::Result<bool, core::cell::BorrowError>> (src/thread/local.rs:282:37)
#5   0x00007b434a0a82e1 try_with_borrow<datadog_php_profiling::RequestLocals, datadog_php_profiling::allocation::alloc_prof_rshutdown::{closure_env#0}, bool> (profiling/src/lib.rs:498:12)
#6   0x00007b434a0a82e1 alloc_prof_rshutdown (allocation/mod.rs:152:30)
#7   0x00007b434a0a81be get_or_init<core::cell::RefCell<datadog_php_profiling::allocation::AllocationProfilingStats>, (), fn() -> core::cell::RefCell<datadog_php_profiling::allocation::AllocationProfilingStats>> (native/lazy.rs:56:40)
#8   0x00007b434a0a81be {closure#0} (native/mod.rs:94:25)
#9   0x00007b434a0a81be call_once<datadog_php_profiling::allocation::ALLOCATION_PROFILING_STATS::{constant#0}::{closure_env#0}, (core::option::Option<&mut core::option::Option<core::cell::RefCell<datadog_php_profiling::allocation::AllocationProfilingStats>>>)> (src/ops/function.rs:250:5)
#10  0x00007b434a0a81be try_with<core::cell::RefCell<datadog_php_profiling::allocation::AllocationProfilingStats>, datadog_php_profiling::{impl#2}::try_with_borrow_mut::{closure_env#0}<datadog_php_profiling::allocation::AllocationProfilingStats, datadog_php_profiling::allocation::allocation_le83::alloc_prof_realloc::{closure_env#0}, bool>, core::result::Result<bool, core::cell::BorrowMutError>> (src/thread/local.rs:282:37)
#11  0x00007b434a0a81be try_with_borrow_mut<datadog_php_profiling::allocation::AllocationProfilingStats, datadog_php_profiling::allocation::allocation_le83::alloc_prof_realloc::{closure_env#0}, bool> (profiling/src/lib.rs:507:12)
#12  0x00007b434a0a81be borrow_mut_or_false<std::thread::local::LocalKey<core::cell::RefCell<datadog_php_profiling::allocation::AllocationProfilingStats>>, datadog_php_profiling::allocation::AllocationProfilingStats, datadog_php_profiling::allocation::allocation_le83::alloc_prof_realloc::{closure_env#0}> (profiling/src/lib.rs:489:9)
#13  0x00007b434a0a81be alloc_prof_realloc (allocation/allocation_le83.rs:418:10)
#14  0x00006101e274a1e3 execute_ex 
#15  0x00006101e26ca4f6 zend_call_function 
#16  0x00006101e26ca8ad zend_call_known_function 
#17  0x00006101e26c9647 zend_lookup_class_ex 
#18  0x00006101e26cac0e zend_fetch_class

~~There is just no way that alloc_prof_realloc would call into alloc_prof_rshutdown and from there, there is no way to call into zend_mm_gc...~~
The stack trace is still weird, but alloc_prof_realloc in case ZendMM can't allocate, will bailout (longjmp() to EG(bailout) which usually leads to RSHUTDOWN and as such to alloc_prof_rshutdown. I can't see a path though from alloc_prof_rshutdown to zend_mm_gc...

But there are a few ideas and things to this stack anyway: according to the runtime stack trace, it is in Composer\Autoload\ClassLoader::findFileWithExtension() line 505 which could trigger a reallocation, which on the way down, could call into zend_mm_gc. So at least this makes sense, not 100% with the stack trace we see, but ...

So anyway, I was thinking:

we might not need to do all of this get_or_init() dance with the RefCell in ALLOCATION_PROFILING_STATS at runtime, when we know there is no reentrancy
we also do not need to do the runtime lazy init for the thread local ALLOCATION_PROFILING_STATS itself when we make it a const
SeqCst memory ordering for stats and intervals is not needed, we only need atomicity, there is no happen-before relation to other memory, so we can use Relaxed and also move all the stats behind a feature flag because we are the only ones using it

Reviewer checklist

Test coverage seems ok.
Appropriate labels assigned.

codecov-commenter · 2025-12-02T10:27:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.60%. Comparing base (867b36c) to head (df68409).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3505      +/-   ##
==========================================
- Coverage   61.74%   61.60%   -0.14%     
==========================================
  Files         142      142              
  Lines       12975    12975              
  Branches     1700     1700              
==========================================
- Hits         8011     7993      -18     
- Misses       4204     4222      +18     
  Partials      760      760

see 4 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 867b36c...df68409. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

pr-commenter · 2025-12-02T10:42:20Z

Benchmarks [ profiler ]

Benchmark execution time: 2025-12-05 09:52:55

Comparing candidate commit df68409 in PR branch florian/stabilize-and-speedup-allocations with baseline commit 867b36c in branch master.

Found 2 performance improvements and 6 performance regressions! Performance is the same for 22 metrics, 6 unstable metrics.

scenario:php-profiler-exceptions-with-profiler

🟩 execution_time [-8.305ms; -3.247ms] or [-8.175%; -3.196%]

scenario:php-profiler-exceptions-with-profiler-and-timeline

🟩 execution_time [-7.393ms; -4.521ms] or [-7.062%; -4.318%]

scenario:php-profiler-timeline-memory-control

🟥 cpu_user_time [+34.651ms; +40.145ms] or [+5.756%; +6.669%]
🟥 execution_time [+34.303ms; +38.972ms] or [+5.441%; +6.181%]

scenario:php-profiler-timeline-memory-with-profiler

🟥 cpu_user_time [+24.147ms; +64.942ms] or [+2.059%; +5.537%]
🟥 execution_time [+31.811ms; +46.897ms] or [+3.375%; +4.976%]

scenario:php-profiler-timeline-memory-with-profiler-and-timeline

🟥 cpu_user_time [+38.069ms; +73.850ms] or [+2.415%; +4.685%]
🟥 execution_time [+36.537ms; +53.509ms] or [+2.930%; +4.291%]

pr-commenter · 2025-12-02T11:38:10Z

Benchmarks [ tracer ]

Benchmark execution time: 2025-12-02 11:36:13

Comparing candidate commit 1f5dd79 in PR branch florian/stabilize-and-speedup-allocations with baseline commit 17189c6 in branch master.

Found 3 performance improvements and 0 performance regressions! Performance is the same for 190 metrics, 1 unstable metrics.

scenario:BM_TeaSapiSpindown

🟩 execution_time [-33.401µs; -12.352µs] or [-6.168%; -2.281%]

scenario:ComposerTelemetryBench/benchTelemetryParsing

🟩 execution_time [-1.427µs; -0.773µs] or [-12.738%; -6.905%]

scenario:TraceSerializationBench/benchSerializeTrace

🟩 execution_time [-54.266µs; -42.834µs] or [-12.026%; -9.492%]

morrisonlevi

There is just no way that alloc_prof_realloc would call into alloc_prof_rshutdown and from there, there is no way to call into zend_mm_gc...

I don't think this quite right. The function zend_mm_safe_error is called by various allocation functions such as zend_mm_realloc_huge, and zend_mm_safe_error has zend_bailout which jumps to EG(bailout). It could jump to shutdown, yeah?

I was thinking: we might not need to do all of this get_or_init() dance and use a RefCell at runtime, when we can instead initialize the ALLOCATION_PROFILING_STATS in GINIT and then just "know" that this is initialized and use it.

RefCell is not about being initialized or not, that is the thread-local lazy initialization. The RefCell is about enforcing Rust's borrowing rules at runtime.

I think there's possibly some optimization though. We could use MaybeUninit<RefCell<AllocationProfilingStats>> and make it const { MaybeUninit::uninit() } so it doesn't need lazy-initialization checks at runtime. We initialize it in ginit. The RefCell still is there to enforce borrow rules at runtime (cover our butts, yes, it causes crashes, but that's better than undefined behavior).

`UnsafeCell` and document why it is safe anyway.

realFlowControl · 2025-12-04T14:33:37Z

With b0ce8e9 I've guarded the allocation (and exception) stats we are sending with the metadata JSON behind a feature flag because it looks like we are spending nearly ⅓ of our time just in incrementing these counters:

They are also only needed by us as a measurement to validate the upscaling algorithm. I am gonna measure again once the binary is ready to benchmark.

realFlowControl · 2025-12-04T14:44:49Z

There is just no way that alloc_prof_realloc would call into alloc_prof_rshutdown and from there, there is no way to call into zend_mm_gc...

I don't think this quite right. The function zend_mm_safe_error is called by various allocation functions such as zend_mm_realloc_huge, and zend_mm_safe_error has zend_bailout which jumps to EG(bailout). It could jump to shutdown, yeah?

You are right, in case ZendMM can't allocate it bails out. Okay, that that part of the stack trace is fine. Just don't know how zend_mm_gc ends up there 😞

I was thinking: we might not need to do all of this get_or_init() dance and use a RefCell at runtime, when we can instead initialize the ALLOCATION_PROFILING_STATS in GINIT and then just "know" that this is initialized and use it.

RefCell is not about being initialized or not, that is the thread-local lazy initialization. The RefCell is about enforcing Rust's borrowing rules at runtime.

I think there's possibly some optimization though. We could use MaybeUninit<RefCell<AllocationProfilingStats>> and make it const { MaybeUninit::uninit() } so it doesn't need lazy-initialization checks at runtime. We initialize it in ginit. The RefCell still is there to enforce borrow rules at runtime (cover our butts, yes, it causes crashes, but that's better than undefined behavior).

We could benchmark the different options against each other and see. There are probably easier and more safe ways to bring down overhead without doing too much unsafe stuff 😉

profiling/Cargo.toml

profiling/src/allocation/allocation_ge84.rs

profiling/src/allocation/mod.rs

morrisonlevi · 2025-12-04T17:06:02Z

The RefCell still is there to enforce borrow rules at runtime (cover our butts, yes, it causes crashes, but that's better than undefined behavior).

We are okay to use UnsafeCell instead of RefCell. We don't hold a reference while doing anything which could cause reentrancy. We need to better document this though.

morrisonlevi · 2025-12-05T22:36:43Z

I had a thought: when an Apache graceful restart is done, the process goes through mshutdown and back into minit. What is the story there with ginit and gshutdown?

morrisonlevi

I think we should try to encapsulate the access to ALLOCATION_PROFILING_STATS so that other modules are not repeating the unsafe code (and comments that ought to accompany it). I'll work on it.

github-actions bot added profiling Relates to the Continuous Profiler tracing labels Nov 28, 2025

realFlowControl marked this pull request as ready for review November 28, 2025 10:54

realFlowControl requested a review from a team as a code owner November 28, 2025 10:54

realFlowControl force-pushed the florian/stabilize-and-speedup-allocations branch from 5049f86 to be579db Compare December 2, 2025 10:16

realFlowControl force-pushed the florian/stabilize-and-speedup-allocations branch from be579db to 1f5dd79 Compare December 2, 2025 10:29

morrisonlevi reviewed Dec 4, 2025

View reviewed changes

realFlowControl added 2 commits December 4, 2025 07:40

Speedup hotpath in allocator by makeing ALLOCATION_PROFILING_STATS an

d8b01a6

`UnsafeCell` and document why it is safe anyway.

make stats const so that we avoid runtime lazy init checks

d84a013

realFlowControl force-pushed the florian/stabilize-and-speedup-allocations branch from c65a5b8 to d84a013 Compare December 4, 2025 06:40

remove stats from hot path

b0ce8e9

realFlowControl marked this pull request as draft December 4, 2025 14:33

morrisonlevi reviewed Dec 4, 2025

View reviewed changes

profiling/Cargo.toml Show resolved Hide resolved

morrisonlevi reviewed Dec 4, 2025

View reviewed changes

profiling/src/allocation/allocation_ge84.rs Outdated Show resolved Hide resolved

morrisonlevi reviewed Dec 4, 2025

View reviewed changes

profiling/src/allocation/mod.rs Outdated Show resolved Hide resolved

profiling/src/allocation/mod.rs Outdated Show resolved Hide resolved

morrisonlevi changed the title ~~fix(prof): Speedup hot path in allocator and fix possible crash~~ perf(prof): speedup hot path in allocator Dec 4, 2025

realFlowControl added 4 commits December 4, 2025 19:57

fixed UnsafeCell vs MaybeUninit

002a3ca

more unsafe, less checks 🎉

8b6efef

relaxed is fine for stats

7038635

Atomicity is enough for intervals, no happens-before relation

df68409

realFlowControl marked this pull request as ready for review December 5, 2025 15:40

morrisonlevi reviewed Dec 5, 2025

View reviewed changes

morrisonlevi mentioned this pull request Dec 6, 2025

refactor: encapsulate ALLOCATION_PROFILING_STATS for safety #3519

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(prof): speedup hot path in allocator #3505

perf(prof): speedup hot path in allocator #3505

Uh oh!

realFlowControl commented Nov 28, 2025 •

edited

Loading

Uh oh!

codecov-commenter commented Dec 2, 2025 •

edited

Loading

Uh oh!

pr-commenter bot commented Dec 2, 2025 •

edited

Loading

Uh oh!

pr-commenter bot commented Dec 2, 2025

Uh oh!

morrisonlevi left a comment

Uh oh!

realFlowControl commented Dec 4, 2025

Uh oh!

realFlowControl commented Dec 4, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

morrisonlevi commented Dec 4, 2025

Uh oh!

morrisonlevi commented Dec 5, 2025

Uh oh!

morrisonlevi left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

perf(prof): speedup hot path in allocator #3505

Are you sure you want to change the base?

perf(prof): speedup hot path in allocator #3505

Uh oh!

Conversation

realFlowControl commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Reviewer checklist

Uh oh!

codecov-commenter commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pr-commenter bot commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [ profiler ]

scenario:php-profiler-exceptions-with-profiler

scenario:php-profiler-exceptions-with-profiler-and-timeline

scenario:php-profiler-timeline-memory-control

scenario:php-profiler-timeline-memory-with-profiler

scenario:php-profiler-timeline-memory-with-profiler-and-timeline

Uh oh!

pr-commenter bot commented Dec 2, 2025

Benchmarks [ tracer ]

scenario:BM_TeaSapiSpindown

scenario:ComposerTelemetryBench/benchTelemetryParsing

scenario:TraceSerializationBench/benchSerializeTrace

Uh oh!

morrisonlevi left a comment

Choose a reason for hiding this comment

Uh oh!

realFlowControl commented Dec 4, 2025

Uh oh!

realFlowControl commented Dec 4, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

morrisonlevi commented Dec 4, 2025

Uh oh!

morrisonlevi commented Dec 5, 2025

Uh oh!

morrisonlevi left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

realFlowControl commented Nov 28, 2025 •

edited

Loading

codecov-commenter commented Dec 2, 2025 •

edited

Loading

pr-commenter bot commented Dec 2, 2025 •

edited

Loading