feat: add rest api server proposal #2517

aagumin · 2025-04-27T18:59:26Z

Purpose of this PR

Proposed changes:

If this looks good, I can start working on the implementation.

Change Category

Bugfix (non-breaking change which fixes an issue)
Feature (non-breaking change which adds functionality)
Breaking change (fix or feature that could affect existing functionality)
Documentation update

Rationale

Checklist

I have conducted a self-review of my own code.
I have updated documentation accordingly.
I have added tests that prove my changes are effective or that my feature works.
Existing unit tests pass locally with my changes.

Additional Notes

google-oss-prow · 2025-04-27T18:59:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vara-bonthu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

andreyvelich · 2025-04-29T19:20:05Z

proposals/000-rest-submit-server/proposal.md

+#### Story 1
+As a data engineer, I want to submit Spark jobs by sending a single HTTP request from my CI pipeline, so I don’t need to install or configure `kubectl` on my build agents.
+
+#### Story 2
+As a platform operator, I want to integrate Spark job submission into our internal web portal using REST calls, so that users can launch jobs without learning Kubernetes details.
+
+#### Story 3
+As a user without Kubernetes expertise, I want to use a familiar HTTP API to submit Spark jobs, so I don’t need direct cluster access or knowledge of `kubectl` commands.


Thank you for preparing this proposal @aagumin!

For Data Engineers and ML Engineers who would like to work with PySpark and interact with Spark Operator, but don't want to learn Kubernetes or CRDs, can't we integrate with the Kubeflow SDK?

Kubeflow SDK KEP: https://docs.google.com/document/d/1rX7ELAHRb_lvh0Y7BK1HBYAbA0zi9enB0F_358ZC58w/edit?tab=t.0
Repository: https://github.com/kubeflow/sdk

As we discussed in the proposal, we can create a dedicated SparkClient() for CRUD operations, so users can quickly create their Spark Applications and orchestrate them with Spark Operator without learning Kubernetes.

For example, folks are already working on it as part of this work: #2422
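To make the SparkClient() idea above more concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is an assumption for illustration: the class name follows the comment, but the method names, the `InMemoryBackend` stand-in, the image and file path, and the default resource sizes are hypothetical and are not the Kubeflow SDK API. Only the CRD group, version, and plural (`sparkoperator.k8s.io`, `v1beta2`, `sparkapplications`) and the top-level spec field names come from the Spark Operator's SparkApplication resource.

```python
from typing import Any, Dict, Tuple

# CRD coordinates of the Spark Operator's SparkApplication resource.
# PLURAL is unused offline; a real backend would pass it to the Kubernetes API.
GROUP, VERSION, PLURAL = "sparkoperator.k8s.io", "v1beta2", "sparkapplications"


class InMemoryBackend:
    """Hypothetical stand-in for the Kubernetes API so the sketch runs offline."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def create(self, namespace: str, body: Dict[str, Any]) -> Dict[str, Any]:
        key = (namespace, body["metadata"]["name"])
        if key in self._objects:
            raise ValueError(f"{key[1]!r} already exists in namespace {namespace!r}")
        self._objects[key] = body
        return body

    def get(self, namespace: str, name: str) -> Dict[str, Any]:
        return self._objects[(namespace, name)]

    def delete(self, namespace: str, name: str) -> None:
        del self._objects[(namespace, name)]


class SparkClient:
    """Hypothetical CRUD wrapper. Method names are assumptions, not SDK API."""

    def __init__(self, backend: InMemoryBackend, namespace: str = "default") -> None:
        self._backend = backend
        self._namespace = namespace

    def create_application(self, name: str, image: str, main_file: str,
                           spark_version: str = "3.5.0", executors: int = 1) -> Dict[str, Any]:
        # Build the SparkApplication manifest so callers never write YAML.
        manifest = {
            "apiVersion": f"{GROUP}/{VERSION}",
            "kind": "SparkApplication",
            "metadata": {"name": name, "namespace": self._namespace},
            "spec": {
                "type": "Python",
                "mode": "cluster",
                "image": image,
                "mainApplicationFile": main_file,
                "sparkVersion": spark_version,
                # Resource sizes below are illustrative defaults only.
                "driver": {"cores": 1, "memory": "512m"},
                "executor": {"instances": executors, "cores": 1, "memory": "512m"},
            },
        }
        return self._backend.create(self._namespace, manifest)

    def get_application(self, name: str) -> Dict[str, Any]:
        return self._backend.get(self._namespace, name)

    def delete_application(self, name: str) -> None:
        self._backend.delete(self._namespace, name)


# Placeholder image and path; substitute values that exist in your environment.
client = SparkClient(InMemoryBackend(), namespace="spark-jobs")
app = client.create_application(
    "pi",
    image="spark:3.5.0",
    main_file="local:///opt/spark/examples/src/main/python/pi.py",
    executors=2,
)
```

Against a real cluster, the backend could plausibly delegate to `kubernetes.client.CustomObjectsApi` (`create_namespaced_custom_object`, `get_namespaced_custom_object`, `delete_namespaced_custom_object`) with the same group, version, and plural, which is what keeps the user-facing surface free of Kubernetes details.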

It is a great topic to discuss at the next Spark Operator call: https://bit.ly/3VGzP4n

Would love to hear your feedback
@aagumin @Shekharrajak @lresende @shravan-achar @akshaychitneni @vara-bonthu @yuchaoran2011 @bigsur0 @jacobsalway @ChenYi015 @franciscojavierarceo @astefanutti @rimolive !

Shekharrajak · 2025-07-29

Thank you! This is pretty similar to what we had in mind, and we are trying to use existing solutions.

Some related work and discussions:
#2422
https://www.kubeflow.org/docs/components/spark-operator/user-guide/notebooks-spark-operator/

We need to think in terms of Spark application lifecycle management, distributing workloads across clusters, and debugging and maintaining long-running jobs.

aagumin · 2025-09-18

@andreyvelich Thank you for your feedback!
Developing an SDK is the right solution, but not a very fast one. For example, the Spark client has already been mostly implemented via the Airflow operator. It is capable of retrieving logs, handling errors and statuses. Also, management via Jupyter cannot be considered the only right way to interact, since there are DE workloads (long batch jobs and Spark streaming) that are managed via pipeline orchestrators (Prefect, Airflow, Flyte, etc). REST is a universal access interface and is understood by the majority of the IT community.
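Whichever interface ends up delivering the status (an SDK, a REST server, or the kube API directly), an orchestrator task usually waits on a batch job by polling it. The sketch below shows one way that loop might look. It assumes the job state is read from `status.applicationState.state` of the SparkApplication object and treats `COMPLETED` and `FAILED` as the terminal states; the function names, the `fetch` callback, and the default intervals are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# States treated as terminal in this sketch.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def application_state(app: Dict[str, Any]) -> str:
    """Read status.applicationState.state, or "" if the status is not set yet."""
    return app.get("status", {}).get("applicationState", {}).get("state", "")


def wait_for_completion(
    fetch: Callable[[], Dict[str, Any]],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch() until the application reaches a terminal state.

    fetch is any callable returning the current SparkApplication object as a
    dict, so the same loop works behind an Airflow, Prefect, or Flyte task.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = application_state(fetch())
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"application still in state {state!r} at timeout")
        sleep(poll_seconds)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; an orchestrator operator would normally leave both at their defaults.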

andreyvelich · 2025-09-22

We already have the Kubeflow SDK, whose goal is to cover all user-facing Kubeflow APIs: https://github.com/kubeflow/sdk.
I know that @Shekharrajak has some thoughts on how we can handle CRUD of SparkApplication CRD as part of SparkClient() in that SDK: https://docs.google.com/document/d/1l57bBlpxrW4gLgAGnoq9Bg7Shre7Cglv4OLCox7ER_s/edit?tab=t.0

> Also, management via Jupyter cannot be considered the only right way to interact, since there are DE workloads (long batch jobs and Spark streaming)

We can see two ways to integrate Spark with Jupyter:

1. With a long-running Spark kernel, which can be done with Jupyter Enterprise Gateway as described here (cc @lresende @fresende): https://www.kubeflow.org/docs/components/spark-operator/user-guide/notebooks-spark-operator/

2. With async batch Spark jobs that can be submitted with the Kubeflow SDK, similar to PyTorch training jobs.

Once we add support for the second, I believe it would be very simple to integrate with orchestrators like Airflow, KFP, or any other, since the Kubeflow SDK is not limited to Jupyter; it is just an abstraction layer on top of the SparkApplication CRD.

That might be a good topic to discuss in one of our upcoming ML Experience WG calls.

> REST is a universal access interface and is understood by the majority of the IT community.

I think we should discuss what the benefits of an additional REST API server are if we already have the kube API server. Platform admins should be able to simply understand the Spark Application APIs and build their internal services on top of the SparkApplication CRD.
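For reference when weighing that question, the kube API server already serves SparkApplication objects over plain HTTP: creating one is a POST to the custom-resource collection path. The sketch below only builds the request with the standard library and does not send it. The server URL, token, namespace, and manifest values are placeholders, and a real caller would also need CA/TLS configuration and RBAC permission to create `sparkapplications`.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_request(api_server: str, token: str, namespace: str,
                         manifest: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a SparkApplication."""
    url = (
        f"{api_server.rstrip('/')}/apis/sparkoperator.k8s.io/v1beta2"
        f"/namespaces/{namespace}/sparkapplications"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(manifest).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder manifest; image, file path, and sizes are illustrative only.
manifest = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "pi", "namespace": "default"},
    "spec": {
        "type": "Python",
        "mode": "cluster",
        "image": "spark:3.5.0",
        "mainApplicationFile": "local:///opt/spark/examples/src/main/python/pi.py",
        "sparkVersion": "3.5.0",
        "driver": {"cores": 1, "memory": "512m"},
        "executor": {"instances": 1, "cores": 1, "memory": "512m"},
    },
}

request = build_create_request(
    "https://kubernetes.example.invalid:6443",  # placeholder API server address
    "PLACEHOLDER_TOKEN",                        # placeholder bearer token
    "default",
    manifest,
)
# Sending would be urllib.request.urlopen(request, context=...) with a proper
# SSL context; it is left out so the sketch runs without a cluster.
```

Seen this way, a separate REST server would mostly differ in authentication, validation, and how much of the manifest it hides from the caller, which is the trade-off the thread is debating.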

aagumin · 2025-09-22

Okay, I understand the community’s position now. Thank you for your feedback! I think we can close this PR.

andreyvelich · 2025-09-22

@nabuskey @vara-bonthu @jacobsalway @yuchaoran2011 @ImpSy Any thoughts on the above points?

aagumin · 2025-09-23

@andreyvelich sorry)
I have another question. Are there any plans for clients in other languages? I know many companies that use Java for writing ML models. Or, for example, Golang? Many companies build infrastructure management services in Go. With a REST interface, it would be possible to generate clients for different languages.

andreyvelich · 2025-09-23

It is a good question.
We discussed it in the proposal design: https://docs.google.com/document/d/1rX7ELAHRb_lvh0Y7BK1HBYAbA0zi9enB0F_358ZC58w/edit?tab=t.0#heading=h.nu1vasakccpy

If we see strong user interest and active contributors, we can discuss potential support for other languages (e.g. Java).

github-actions · 2025-07-28T20:05:41Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

google-oss-prow bot requested review from ChenYi015, ImpSy and andreyvelich April 27, 2025 18:59

google-oss-prow bot added the size/L label Apr 27, 2025

feat: add rest api server proposal

6a0e9e1

Signed-off-by: Arsen Gumin <[email protected]> Signed-off-by: Arsen Gumin <[email protected]>

aagumin force-pushed the proposals/000-rest-submit branch from e807893 to 6a0e9e1 Compare April 28, 2025 12:47

aagumin added 2 commits April 28, 2025 15:50

chore: minor update

d05a9ba

Signed-off-by: Arsen Gumin <[email protected]>

docs: app uml with example

1c1eb99

Signed-off-by: Arsen Gumin <[email protected]>

andreyvelich reviewed Apr 29, 2025

github-actions bot added the lifecycle/stale label Jul 28, 2025

Shekharrajak mentioned this pull request Jul 29, 2025

Enhancing Kubeflow with Batch Processing Gateway for Efficient Multi-Cluster Spark Job Management #2422

Open

github-actions bot removed the lifecycle/stale label Jul 29, 2025

aagumin commented Apr 27, 2025 (edited)
google-oss-prow bot commented Apr 27, 2025
andreyvelich Apr 29, 2025 (edited)
Shekharrajak Jul 29, 2025
aagumin Sep 18, 2025
andreyvelich Sep 22, 2025 (edited)
aagumin Sep 22, 2025
andreyvelich Sep 22, 2025
aagumin Sep 23, 2025
andreyvelich Sep 23, 2025
github-actions bot commented Jul 28, 2025
