Recommended limits for created/running jobs #559

Open
soxofaan opened this issue Feb 25, 2025 · 1 comment

Comments

@soxofaan
Member

This is something that pops up regularly while working on client-side job managers: how many jobs can a user create, how many jobs can run in parallel, ... ?

At the moment, in VITO projects we have some ad hoc, per-user configs in the backend and in user scripts to steer job managers that create and start tens or hundreds of jobs, but that involves aligning various tools in a poorly documented and non-standard way.

I think it makes sense to add something to the openEO API that allows backends to expose global or per-user capacity/limits for the number of created jobs, the number of concurrently running jobs, etc. That would allow clients to handle this in a cleaner and more transparent way. With the current API, the only official "UI" is basically: just try starting jobs until you get an error, and make sure to back off and retry properly in some sense.
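To make the current "try until you get an error" approach concrete, here is a minimal sketch of what a client has to do today. It is an illustration only: `CapacityError`, `start_with_backoff` and the `start_job` callable are hypothetical names and not part of the openEO API or any client library, and the exponential back-off policy is just one possible choice.

```python
import time
from typing import Callable


class CapacityError(Exception):
    """Hypothetical stand-in for whatever error a backend returns when a limit is hit."""


def start_with_backoff(
    start_job: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try to start a job, doubling the wait after each capacity error.

    Returns the number of attempts that were needed.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            start_job()
            return attempt
        except CapacityError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the capacity error.
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

The client learns about the limit only by running into it, which is the part that exposed limits could make more transparent.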

To give a bit of an idea of what I think could be covered here, a non-exhaustive list of things that could be included:

  • maximum number of concurrently running batch jobs
  • currently remaining capacity for concurrently running batch jobs
  • maximum number and currently remaining capacity for number of created (not started) batch jobs
  • maximum number and currently remaining capacity of concurrent sync requests

These numbers would be just recommendations to follow for clients/tools that support it. Going over limits would just trigger the errors we already have.
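As a rough sketch of how a client-side job manager could use such recommendations: the `JobLimits` structure and its field names below are purely hypothetical (nothing like this exists in the openEO API at the moment), and only the two "running jobs" numbers from the list above are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobLimits:
    # Hypothetical field names, for illustration only.
    max_running: Optional[int] = None        # maximum concurrently running batch jobs
    remaining_running: Optional[int] = None  # remaining capacity for running batch jobs


def jobs_to_start(limits: JobLimits, pending: int, running: int) -> int:
    """How many pending jobs a client could start while following the recommendation."""
    if limits.remaining_running is not None:
        # Prefer the backend's own view of the remaining capacity.
        allowed = limits.remaining_running
    elif limits.max_running is not None:
        # Otherwise derive it from the maximum and the client's own count.
        allowed = limits.max_running - running
    else:
        # No hint from the backend: fall back to today's behaviour (try and handle errors).
        return pending
    return max(0, min(pending, allowed))
```

Because the numbers are only recommendations, the client still has to handle the existing errors when a start request is rejected.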

I'm not sure yet what would be a good place to expose this:

  • new endpoint
  • main capabilities doc GET /
  • part of response on GET /jobs and related?
  • ...?

Note that this would also be interesting in a federation context to steer job distribution.

@m-mohr
Member

m-mohr commented Feb 26, 2025

I don't think this is a good idea. (But I'm also not a fan of the client-side approach of creating hundreds of jobs in the job manager. Shouldn't that be one job? It seems like a back-end limitation that is exposed to the user.)

For example, an implementation in the Web Editor that blocks a submission due to capacity limits would often be out of date because of the request interval: in theory you could already submit something, but the UI just hasn't received up-to-date data yet. Additionally, if pagination is active, the Editor may not even know how many jobs are active (assuming we just iterate through jobs). Otherwise, you probably need separate statistics of active jobs as part of GET /jobs etc., but then how do we expose how many sync jobs are running?

I think the trial-and-error approach is okay here. To ensure up-to-date limits you need to make a request anyway; we'd just move it to another endpoint. So I'm not sure what we gain.

Users could also just be informed about limits in other ways, e.g. the backend description, and then configure the job manager manually with those limits. Generally, we tried to avoid defining limits too specifically because backends can impose limits in so many different ways that we probably can't think of all of them, and in the end it could become an endless list of options. For example, someone may have combined limits for sync and batch jobs: yet another property to add...
