Recommended limits for created/running jobs #559

soxofaan · 2025-02-25T13:55:32Z

This is something that pops up regularly while working on client-side job managers: how many jobs can a user create, how many jobs can run in parallel, ... ?

At the moment, in VITO projects we have some ad hoc, per-user configuration in the backend and in user scripts to steer job managers that create and start tens to hundreds of jobs, but that involves poorly documented, non-standard alignment of various tools.

I think it makes sense to add something to the openEO API that allows backends to expose global or per-user capacity/limits for the number of created jobs, the number of concurrently running jobs, etc. That would allow clients to handle this in a cleaner and more transparent way. With the current API, the only official "UI" is basically: just try starting jobs until you get an error, and make sure to back off/retry properly in some sense.
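For reference, the current "try until you get an error" approach boils down to something like the sketch below. `CapacityError` is a hypothetical stand-in for whatever error a backend returns when a limit is hit, and `start` is a caller-supplied function that starts one job; the exponential backoff with jitter is just one common choice, not something prescribed by the openEO API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Hypothetical error raised when the backend refuses a job because a limit is reached."""


def start_with_backoff(
    start: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `start` until it succeeds, backing off exponentially (with jitter) on CapacityError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return start()
        except CapacityError:
            if attempt == max_attempts - 1:
                raise  # give up: propagate the last capacity error to the caller
            # Exponential backoff capped at max_delay, randomized ("full jitter").
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.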

To give a bit of an idea of what I think could be covered here, this is a non-exhaustive list of things that could be included:

maximum number of concurrently running batch jobs
currently remaining capacity for concurrently running batch jobs
maximum number and currently remaining capacity of created (not yet started) batch jobs
maximum number and currently remaining capacity of concurrent sync requests

These numbers would just be recommendations for clients/tools that support them. Going over the limits would simply trigger the errors we already have.
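As an illustration of how a client job manager could use such recommendations, here is a minimal sketch. The field names of `JobLimits` are hypothetical (nothing like this exists in the openEO API yet), `None` means "not advertised", and taking the minimum with the remaining "created" capacity is an assumption: a job has to be created before it can be started.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobLimits:
    """Hypothetical recommended limits as a backend might advertise them; None = not advertised."""
    max_running_jobs: Optional[int] = None
    remaining_running_jobs: Optional[int] = None
    max_created_jobs: Optional[int] = None
    remaining_created_jobs: Optional[int] = None
    max_sync_requests: Optional[int] = None
    remaining_sync_requests: Optional[int] = None


def jobs_to_start(limits: JobLimits, queued_locally: int, fallback: int = 1) -> int:
    """How many locally queued jobs to try to create and start right now.

    The limits are only hints: the backend's existing errors remain the real enforcement.
    """
    remaining = limits.remaining_running_jobs
    if remaining is None:
        remaining = fallback  # nothing advertised: be conservative
    if limits.remaining_created_jobs is not None:
        # Assumption: each job temporarily occupies a "created" slot before it is started.
        remaining = min(remaining, limits.remaining_created_jobs)
    return max(0, min(queued_locally, remaining))
```

Because the values are advisory, a stale or missing number only makes the client slightly too optimistic or too cautious; it never replaces handling the backend errors.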

I'm not sure yet what would be a good place to expose this:

a new endpoint
the main capabilities document (GET /)
part of the response of GET /jobs and related endpoints?
...?
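If the capabilities document (GET /) were chosen, clients would have to tolerate backends that advertise nothing. The sketch below assumes a hypothetical top-level "limits" object with hypothetical keys; missing or malformed values are simply treated as "not advertised".

```python
from typing import Any, Dict, Optional

# Hypothetical key names: nothing like this is defined in the openEO API today.
LIMIT_KEYS = (
    "max_running_jobs", "remaining_running_jobs",
    "max_created_jobs", "remaining_created_jobs",
    "max_sync_requests", "remaining_sync_requests",
)


def parse_limits(capabilities: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Extract advertised limits from a parsed GET / response, defaulting to None."""
    raw = capabilities.get("limits")
    if not isinstance(raw, dict):
        raw = {}  # backend does not advertise limits (or uses an unexpected format)
    limits: Dict[str, Optional[int]] = {}
    for key in LIMIT_KEYS:
        value = raw.get(key)
        # Accept only non-negative integers (bool is excluded explicitly, as it subclasses int).
        valid = isinstance(value, int) and not isinstance(value, bool) and value >= 0
        limits[key] = value if valid else None
    return limits
```

Since clients tend to fetch GET / rarely, the static maxima may fit there better than the volatile "remaining" values; that is an open design question rather than a settled choice.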

Note that this would also be interesting in a federation context to steer job distribution.

The text was updated successfully, but these errors were encountered:

m-mohr · 2025-02-26T14:01:22Z

I don't think this is a good idea. (But I'm also not a fan of the client-side approach of creating hundreds of jobs in the job manager. Shouldn't that be one job? It seems like a back-end limitation that is exposed to the user.)

For example, an implementation in the Web Editor that blocks a submission due to capacity limits would often be out of date because of the polling interval. So in theory you could already submit something, but the UI just hasn't received up-to-date data yet. Additionally, if pagination is active, the Editor may not even know how many jobs are active (assuming we just iterate through the jobs). Otherwise, you probably need separate statistics of active jobs as part of GET /jobs etc., but then how do you expose how many sync jobs are running?
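To make the pagination point concrete: without a dedicated statistic, a client has to walk every page of GET /jobs, following links with rel "next", and the count may already be stale when the last page arrives. A minimal sketch, assuming "active" means status "queued" or "running" and that `fetch` is a hypothetical caller-supplied function performing the authenticated request and returning the parsed JSON body:

```python
from typing import Any, Callable, Dict, Optional

# Assumption: batch jobs with these statuses count as "active".
ACTIVE_STATUSES = {"queued", "running"}


def count_active_jobs(
    fetch: Callable[[str], Dict[str, Any]],
    first_url: str,
    max_pages: int = 100,
) -> int:
    """Count active jobs by following rel="next" pagination links of GET /jobs."""
    count = 0
    url: Optional[str] = first_url
    pages_seen = 0
    while url is not None and pages_seen < max_pages:
        body = fetch(url)
        pages_seen += 1
        count += sum(1 for job in body.get("jobs", []) if job.get("status") in ACTIVE_STATUSES)
        # Move to the next page, if the response links to one.
        url = next(
            (link.get("href") for link in body.get("links", []) if link.get("rel") == "next"),
            None,
        )
    return count
```

Each page is a separate request, which is exactly why a UI-side count lags behind the backend's actual state.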

I think the trial-and-error approach is okay here. To get up-to-date limits you need to make a request anyway; we'd just move it to another endpoint. So I'm not sure what we'd gain.

Users could also just be informed about limits in other ways, e.g. in the backend description, and then configure the job manager manually with those limits. In general, we have tried to avoid defining limits too specifically, because backends can have limits in so many different forms that we probably can't think of all of them, and in the end it could become an endless list of options. For example, someone may combine the limits for sync and batch jobs, which would be yet another property to add...
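A minimal sketch of that manual alternative: the limit is copied by hand (e.g. from the backend description) into a toy scheduler, and anything beyond it is still caught by the backend's errors. `ManualLimitScheduler` and its methods are hypothetical, not the API of any existing job manager.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ManualLimitScheduler:
    """Toy scheduler that keeps at most `max_running` jobs active, with a hand-configured limit."""
    max_running: int
    pending: List[str] = field(default_factory=list)
    status: Dict[str, str] = field(default_factory=dict)

    def active(self) -> int:
        """Number of tracked jobs that are queued or running."""
        return sum(1 for s in self.status.values() if s in ("queued", "running"))

    def tick(self) -> List[str]:
        """Return job ids to start now; the caller starts them and updates `status` later."""
        to_start: List[str] = []
        free = self.max_running - self.active()
        while free > 0 and self.pending:
            job_id = self.pending.pop(0)
            self.status[job_id] = "queued"  # optimistic: assume the start request succeeds
            to_start.append(job_id)
            free -= 1
        return to_start
```

The caller polls job statuses, writes them into `status`, and calls `tick()` again; no limit discovery through the API is involved.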

soxofaan mentioned this issue Feb 25, 2025

job manager: automatically discover backend limits Open-EO/openeo-python-client#740 (Open)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Recommended limits for created/running jobs #559

Recommended limits for created/running jobs #559

soxofaan commented Feb 25, 2025

m-mohr commented Feb 26, 2025 •

edited

Loading

Recommended limits for created/running jobs #559

Recommended limits for created/running jobs #559

Comments

soxofaan commented Feb 25, 2025

m-mohr commented Feb 26, 2025 • edited Loading

m-mohr commented Feb 26, 2025 •

edited

Loading