Support parallel save_result with export_workspace #995
TODO: Estimate the amount of work this issue will require.
LCFM also benefits from this issue.
A quick attempt to distribute the
It might be needed to run the
Would something like collectAsync help in this case?
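On the collectAsync question: that API exists on the Scala side (`AsyncRDDActions`); PySpark does not expose it. A minimal sketch of an equivalent, assuming the driver can simply run the blocking action on a background thread (`collect_async` and the sample action are illustrative names, not part of the codebase):

```python
from concurrent.futures import ThreadPoolExecutor

# PySpark has no collectAsync (that is Scala's AsyncRDDActions), but any
# blocking action can be made non-blocking by running it on a driver
# thread. `action` is a zero-argument callable, e.g. rdd.collect.
_pool = ThreadPoolExecutor(max_workers=2)

def collect_async(action):
    """Return a Future for a blocking Spark action."""
    return _pool.submit(action)

# Stand-in for a real action such as `lambda: rdd.collect()`:
future = collect_async(lambda: [1, 2, 3])
print(future.result())  # [1, 2, 3]
```

The driver thread blocks only when `result()` is called, so several such futures can be outstanding at once.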
…ith parallel running save_results. Open-EO/openeo-geopyspark-driver#995
No huge differences found with a small job that has multiple save results. Launched with and without
This feature tries to minimize idle time of executors, right? Perhaps you can try to increase the max-executors to see if we can increase the speed without increasing the total cost?
Yes. On a large graph the max-executors is clearly limiting. I am running a large test with max-executors on 100. I'll report back when that one is finished. With a medium-sized job,
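For the record, one illustrative way to raise the executor cap for such a test run; whether the driver's own max-executors option maps onto exactly this Spark conf is an assumption here, so this is a config sketch rather than the driver's actual mechanism:

```python
# Illustrative config fragment: cap dynamic allocation at 100 executors.
# The driver's "max-executors" option presumably translates to this conf,
# but that mapping is an assumption, not confirmed by the issue.
from pyspark import SparkConf

conf = (
    SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.dynamicAllocation.maxExecutors", "100")
)
```

Both conf keys are standard Spark dynamic-allocation settings.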
Example Job: j-25011314204947baa2057fa6f64bae8b
More info: Spark is perfectly fine with running multiple jobs and stages concurrently. By doing so, we can keep the executor allocation rate high.
So in the presence of multiple 'save_result' nodes, this could really help to improve overall performance.
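A sketch of that idea: submit every save_result action from its own driver thread, so Spark can schedule their stages side by side instead of strictly one job after another (`save_all` and the stand-in jobs are hypothetical names, not the driver's API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def save_all(save_jobs, max_workers=4):
    """Run several blocking save_result actions concurrently.

    `save_jobs` maps a result name to a zero-argument callable; each
    callable would be a blocking Spark action in the real driver,
    e.g. `lambda: rdd.saveAsObjectFile(path)`.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job): name for name, job in save_jobs.items()}
        for f in as_completed(futures):
            results[futures[f]] = f.result()
    return results

# Stand-ins for two save_result nodes of one process graph:
out = save_all({"tiff": lambda: "ok-tiff", "netcdf": lambda: "ok-nc"})
```

Spark's scheduler accepts jobs from multiple driver threads; with the default FIFO mode later jobs can still fill idle executors, and `spark.scheduler.mode=FAIR` shares them more evenly.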