I am running the prometheus_client_php code on servers with several million entries in APCu. The nested calls to `APCUIterator` in `collectCounters()`, `collectGauges()`, and `collectHistograms()` are O(N^2), with N being the number of keys stored in APCu. When N is large (several million keys), this slows the code to a crawl and makes the APC storage engine unusable.
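To make the cost concrete, here is a minimal sketch of the nested-scan pattern. It is not the library's actual code. It uses a plain PHP array in place of APCu so it runs without the extension. The `prom:counter:...` key layout and the `scanKeys()` and `collectCountersNested()` helpers are assumptions made up for illustration. `scanKeys()` plays the role of a regex-filtered `APCUIterator`, which has to look at every key in the cache to find its matches.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a regex-filtered APCUIterator: every key in the
 * cache is inspected to find the matching ones. $inspected counts those looks.
 */
function scanKeys(array $cache, string $pattern, int &$inspected): array
{
    $matches = [];
    foreach ($cache as $key => $value) {
        $inspected++;
        if (preg_match($pattern, (string) $key) === 1) {
            $matches[$key] = $value;
        }
    }
    return $matches;
}

/** Nested scan: one full pass to find metadata, then one full pass per metric. */
function collectCountersNested(array $cache, int &$inspected): array
{
    $samples = [];
    foreach (scanKeys($cache, '/^prom:counter:[^:]+:meta$/', $inspected) as $metaKey => $meta) {
        $name = explode(':', $metaKey)[2];
        $valuePattern = '/^prom:counter:' . preg_quote($name, '/') . ':.*:value$/';
        foreach (scanKeys($cache, $valuePattern, $inspected) as $valueKey => $value) {
            $samples[$name][$valueKey] = $value;
        }
    }
    return $samples;
}

// 1000 unrelated application keys plus 3 counters (1 meta key + 2 value keys each).
$cache = [];
for ($i = 0; $i < 1000; $i++) {
    $cache["app:session:$i"] = 'x';
}
foreach (['requests', 'errors', 'retries'] as $name) {
    $cache["prom:counter:$name:meta"] = ['help' => $name];
    $cache["prom:counter:$name:GET:value"] = 1;
    $cache["prom:counter:$name:POST:value"] = 2;
}

$inspected = 0;
$samples = collectCountersNested($cache, $inspected);
printf("%d keys in cache, %d key inspections\n", count($cache), $inspected);
```

In this toy setup the cache holds 1009 keys and the collection inspects 4036 of them: one full pass for the metadata plus one full pass per counter. The work grows with the total cache size, even though only 9 keys belong to the metrics.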
For example, in my staging environment with approximately 2.5 million APCu keys, a call to `getMetricFamilySamples()` takes approximately 9 seconds to return because of this issue. During that time, one CPU core is pegged by the nested pair of loops iterating across all the APCu keys!
I have forked the code and re-implemented portions of the APC storage engine, to use a tree-based storage pattern, essentially a directed acyclic graph, where starting at the root of the tree leads to discovery of all APCu keys that the storage engine is using. This presents a massive speed-up in our environment, since performance is now O(N), where N is the number of (metrics x labels x buckets) being tracked, and is unrelated to the number of keys stored in APCu. Usually N in this case is a couple dozen keys to maybe a few hundred max, so relatively small.
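The idea can be sketched roughly as follows. This is not the code from my fork, just a minimal illustration of a root-indexed layout under assumed names. `FakeCache` is an array-backed stand-in for APCu so the example runs without the extension. The `prom:index:...` and `prom:value:...` keys and the `registerSample()` and `collectFromIndex()` helpers are hypothetical. The sketch also ignores concurrent writers, which a real APCu implementation would have to handle.

```php
<?php
declare(strict_types=1);

/** Tiny array-backed stand-in for APCu (hypothetical, for illustration only). */
final class FakeCache
{
    /** @var array<string, mixed> */
    private array $data = [];
    public int $fetches = 0;

    public function store(string $key, $value): void
    {
        $this->data[$key] = $value;
    }

    public function fetch(string $key)
    {
        $this->fetches++;
        return $this->data[$key] ?? null;
    }
}

const ROOT_KEY = 'prom:index:root';

/** Register a value key under its metric node, and the metric node under the root. */
function registerSample(FakeCache $cache, string $metric, string $labels, $value): void
{
    $metricKey = "prom:index:metric:$metric";
    $valueKey  = "prom:value:$metric:$labels";

    $root = $cache->fetch(ROOT_KEY) ?? [];
    $root[$metricKey] = true;
    $cache->store(ROOT_KEY, $root);

    $children = $cache->fetch($metricKey) ?? [];
    $children[$valueKey] = true;
    $cache->store($metricKey, $children);

    $cache->store($valueKey, $value);
}

/** Walk root -> metric nodes -> value keys; unrelated keys are never touched. */
function collectFromIndex(FakeCache $cache): array
{
    $samples = [];
    foreach (array_keys($cache->fetch(ROOT_KEY) ?? []) as $metricKey) {
        foreach (array_keys($cache->fetch($metricKey) ?? []) as $valueKey) {
            $samples[$valueKey] = $cache->fetch($valueKey);
        }
    }
    return $samples;
}

$cache = new FakeCache();
for ($i = 0; $i < 1000; $i++) {
    $cache->store("app:session:$i", 'x'); // unrelated application keys
}
registerSample($cache, 'requests', 'GET', 10);
registerSample($cache, 'requests', 'POST', 4);
registerSample($cache, 'errors', 'GET', 1);

$cache->fetches = 0; // count only the reads done during collection
$samples = collectFromIndex($cache);
printf("%d samples collected with %d fetches\n", count($samples), $cache->fetches);
```

Here collection needs 6 fetches (1 root, 2 metric nodes, 3 values) regardless of how many unrelated keys sit in the cache, which is the property that makes the cost depend only on the number of tracked metric keys.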
Performance scales MUCH better with this change. In my same staging environment with 2.5M APCu keys, a test call to `getMetricFamilySamples()` decreased from 9 seconds to 33ms(!). This is fast enough that I can have Prometheus poll metrics every second, if I want.
Before dropping a PR here, I figured I'd open this issue and see if there is any interest in discussing the solution, or if you prefer to see the code right away. The current code passes all phpstan and phpunit tests. Feel free to reply, or if you just want to see the code, let me know and I'll drop a PR.