-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance regression during "Analyze" of NPM projects after upgrading from v34.0.0 to v37.0.0 #9950
Comments
This guess can easily be checked. Can you please take a look at the |
Hey @fviernau, thanks for taking a look at this. I did a new run with Here is the log from Logs
Is there anything else I should/could enable? |
Thanks. Yeah, my wording was a bit inaccurate. But these |
I've just made some experiment, see [1], trying to request all package details upfront in parallel. What do you think about filing an issue with [1] https://github.com/oss-review-toolkit/ort/tree/npm-view-concurrency |
Hey @fviernau, Because we upgraded from v34 to v37 directly, I re-did some testing with a public repository (for easier reproduction) using the versions in-between to pinpoint to moment it got slower. Setup / How to reproduce
Results
* These hosts had some other small tasks running at the time, more runs might result in a bit faster times. Based on the results it seems that the regression occurred in v35.0.0 - and definitely had greater impact on hosts / runs with more CPUs available (that could perform concurrent tasks). The release notes for that version mention a few refactorings for NPM, but nothing I would suspect of having such an impact. Regarding your question / idea about an upstream issue at npm: |
If you want to help, you could try / verify yourself, that my results were right and that it is really not possible to run multiple |
Hey @fviernau,
I did some testing with this and as far as I can tell, there can be multiple
Results for 5x runs each:
Based on this I would say that running multiple |
Hm,
Very helpful. This means it can be run in parallel, so I'd only need to figure out why it does not work in my experiment I shared above. |
I re-did my performance tests from earlier, this time looking for the exact commit when the analyze started to get slower for NPM packages. Setup:
Results (running on 6 vCPU host):
0092efa seems to have decreased the analyzer performance. Looking at the
Yeah, the npm info calls seem to be the only thing happening for most of that time, running through all the hundreds/thousands packages 👍 |
To me it's not obvious, without investigating further, why 0092efa could have caused this. Because, it seems to only reduce the number of packages getting parsed. It would still probably be interesting to understand that, but we could also go the other route: Would you be willing @malmor to try to patch up https://github.com/oss-review-toolkit/ort/tree/npm-view-concurrency, so that it runs in parallel? (This would also fix your performance issue I believe, and would be more straight forward todo imho compared to figuring out the root cause of 0092efa degradation) |
Hey @fviernau, |
Hey @fviernau, I had a look at https://github.com/oss-review-toolkit/ort/tree/npm-view-concurrency and think that it already (kinda) works. I did some performance tests using https://github.com/renovatebot/renovate as the repository:
Results:
I added some logging (trying to understand what is actually happening there), which shows that there are multiple threads running:
But this only seems to run for all direct (dev) dependencies from the I have no idea how to include transitive dependencies in this. |
Hey, do you have any insights in how to process the transitive dependencies upfront? Or should we discuss this on Slack? 🙂 |
This is what https://github.com/oss-review-toolkit/ort/tree/npm-view-concurrency I've shared before actually was about. edit: I just noticed you already found the issue, here:
Let me quickly fix that. |
Obtaining the package details viva `npm info` is a performance bottleneck of ORT's NPM package manager. Request the package details for all packages upfront, in parellel to reduce execution time. Experiments on a development machine show that execution of `NpmFunTest` now takes `1 min 13 sec` instead of `3 min 47 sec`. Fixes: #9950. Signed-off-by: Frank Viernau <[email protected]>
Done and PR opened. @malmor mind trying it out? |
Obtaining the package details via `npm info` is a performance bottleneck of ORT's NPM package manager. Request the package details for all packages upfront, in parellel to reduce execution time. Experiments on a development machine show that execution of `NpmFunTest` now takes `1 min 13 sec` instead of `3 min 47 sec`. Fixes: #9950. Signed-off-by: Frank Viernau <[email protected]>
Describe the bug
Since upgrading to ort v37.0.0 analyzing NPM projects takes much longer than before.
Our project contains 3x
package.json
files, with a total of about 1.400 dependencies (including dev and indirect dependencies). We are running analyze, evaluate and report phases in a GitLab pipeline.Durations in v34.0.0:
The analysis took 1m 10.252733108s.
The evaluation of 1 script(s) took 8.948435285s.
Created 3 of 3 report(s) in 3.345737593s.
Durations in v37.0.0:
The analysis took 9m 9.239482777s.
<= ⚠The evaluation of 1 script(s) took 12.074116529s.
Created 3 of 3 report(s) in 3.744549232s.
Durations in v52.0.0:
The analysis took 6m 55.184194790s.
The evaluation of 1 script(s) took 7.161464191s.
Created 3 of 3 report(s) in 1m 5.798031179s.
Tests with newer versions (including v52.0.0 which was released yesterday) do not really change the outcome.
After looking at the changes introduced in the relevant versions I assume the following refactoring/change might have caused the performance regression: #9316
Before, multiple concurrent workers have been used to process NPM dependencies (
DefaultDispatcher-worker-xxx
). But this no longer seems to be the case - which would explain the increased analyze duration. CPU usage was also way higher before, which seems logical when running multiple threads.To Reproduce
Running ort on (large) NPM-based projects should show the decreased performance.
Expected behavior
ort should be fast when analyzing NPM projects, as it was before on v34.0.0.
Console / log output
Part of v34.0.0 logs
Part of v37.0.0 logs
Environment
We are running ort inside GitLab pipelines using the official docker image.
Tested versions:
ghcr.io/oss-review-toolkit/ort:34.0.0
(fast)ghcr.io/oss-review-toolkit/ort:37.0.0
(slow)ghcr.io/oss-review-toolkit/ort:52.0.0
(slow)The text was updated successfully, but these errors were encountered: