KEP-913: Add a KEP for the reusable KFP components repository #914
Conversation
@thesuperzapper @HumairAK @droctothorpe @zazulam @chensun @andreyvelich @franciscojavierarceo could you please review this proposal if interested? This was brought up on the last community call by @HumairAK and I can bring it up for discussion next week. Feel free to tag others for review as well.
| 1. Move reusable components and pipelines into a dedicated GitHub repository with clear structure and governance.
| 2. Provide standardized metadata, documentation, and testing requirements for every asset.
| 3. Ship an installable Python package for core (community-maintained) artifacts that is versioned to match Kubeflow |
Do you mean "components" not "artifacts"?
Whoops. You're right.
| 4. Maintain a parallel, clearly demarcated area for third-party contributions, shipped as its own Python package that
| tracks the same release cadence as the core catalog.
| 5. Automate maintenance (e.g. stale component detection, dependency validation) to keep the catalog healthy.
| 6. Provide developer onboarding materials and guidance for agents generating components/pipelines.
This will be fantastic training data for an LLM-driven pipeline generator.
| ## Summary
| Establish a dedicated Kubeflow Pipelines (KFP) repository\* that hosts reusable components and full pipelines under a |
We have a robust internal product offering that implements much of what this document describes. I will encourage the internal maintainers to explore the possibility of contributing.
For further context, Red Hat aims to have its own repo of Red Hat-supported components for OpenShift AI customers, but aims to use the same repo structure, CI, documentation style, etc. as what gets accepted in upstream Kubeflow (this proposal). So if Capital One can also align, then we'd have more resources for working on follow-up things, like potentially an API and UI for this catalog, to contribute to upstream Kubeflow and use downstream.
| │ │ │ └── test_component.py
| │ │ └── <supporting_files>
| │ └── ... (other categories: evaluation/, data_processing/, etc.)
| ├── pipelines |
Curious to hear more about the rationale behind sharable pipelines (as opposed to components).
@droctothorpe I see a few reasons:
- Nested pipelines are supported, so components that benefit from running in parallel or in a chain can be grouped and then used as if they were a single component.
- Provides good examples of how some of these components stitch together for common use cases (e.g. converting PDFs to markdown and inserting them into a vector database)
- It provides quick starts for tutorials and documentation.
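For illustration, a minimal sketch of a pipeline reused as a component; recent KFP v2 SDKs support invoking one pipeline from another, and the component names and bodies below are hypothetical placeholders for the PDF-to-vector-database example above:

```python
from kfp import dsl


@dsl.component
def convert_pdf_to_markdown(pdf_uri: str) -> str:
    # Hypothetical stand-in for real PDF-parsing logic.
    return pdf_uri.rsplit(".", 1)[0] + ".md"


@dsl.component
def insert_into_vector_db(markdown_uri: str) -> str:
    # Hypothetical stand-in for real vector-database ingestion.
    return f"indexed:{markdown_uri}"


# The inner pipeline chains the two components...
@dsl.pipeline(name="pdf-ingestion")
def pdf_ingestion(pdf_uri: str) -> str:
    md = convert_pdf_to_markdown(pdf_uri=pdf_uri)
    return insert_into_vector_db(markdown_uri=md.output).output


# ...and can then be invoked from a parent pipeline as if it were a component.
@dsl.pipeline(name="document-workflow")
def document_workflow(pdf_uri: str):
    pdf_ingestion(pdf_uri=pdf_uri)
```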
| - Every asset must include `component.py` or `pipeline.py`, `metadata.yaml`, `README.md`, `OWNERS`, and optional
| supporting files. The `OWNERS` file empowers the owning team to review changes, update metadata, and manage lifecycle
| tasks without central gatekeeping.
| - Internal unit tests are optional; when provided, they must live under a `tests/` subdirectory to avoid clutter.
Should unit tests be mandatory?
And what about integration tests to run the components on actual KFP infrastructure?
I'll add a section about enabling tests on Kind clusters in CI with KFP installed in standalone mode. My concern is that many components may depend on external services or be resource-intensive, so I didn't want to require it, but I think it's valid to have it as an option. I'll explicitly make it an opt-out in the metadata file.
I'll add a new file, such as test_pipeline.py, in the components/pipelines directories that gets automatically run in the CI environment if defined. In test_pipeline.py, there will be an optional verify_pipeline function that takes the completed pipeline and KFP client as input and is responsible for verifying the result.
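For concreteness, a sketch of what that hook might look like; the verify_pipeline signature is an assumption drawn from this comment, not a finalized interface:

```python
# test_pipeline.py -- sketch of the opt-in verification hook described above.
from kfp import Client


def verify_pipeline(run, client: Client) -> None:
    """Verify the result of the completed pipeline run.

    Args:
        run: Completed run object, e.g. as returned by client.get_run()
            after the CI harness waits for the run to finish.
        client: KFP client connected to the Kind test cluster, available
            for fetching task details or output artifacts.
    """
    # Hypothetical assertion; real checks would inspect the run's outputs.
    assert run.state == "SUCCEEDED", f"pipeline finished in state {run.state!r}"
```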
| lint/tests pass but does not guarantee functionality.
| - Third-party assets remain the responsibility of their listed owners; Kubeflow maintainers provide validation
| infrastructure only.
We may want to clarify a formal deprecation policy in case incompatibilities or CVEs surface in a contributed component.
NVM, I see you addressed this further down.
| ### Standardized README Templates
| Each component/pipeline directory includes a `README.md` generated from a template and auto-populated with docstring |
Might be nice to provide a CLI that handles boilerplate generation and validation in a consolidated way that can also be invoked in CI.
Good point, I'll add that!
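As a rough illustration, the README-generation step of such a CLI could be as small as the following, using only the stdlib; the template text and file layout are assumptions based on the structure this KEP describes:

```python
import ast
from pathlib import Path

# Hypothetical template; the real one would live in the repository.
TEMPLATE = """# {name}

{summary}

See `component.py` for the full interface and `metadata.yaml` for metadata.
"""


def generate_readme(asset_dir: Path) -> None:
    """Render README.md from the first function docstring in component.py.

    Assumes component.py defines at least one function with a docstring.
    """
    source = (asset_dir / "component.py").read_text()
    functions = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.FunctionDef)]
    docstring = ast.get_docstring(functions[0]) or ""
    summary = docstring.split("\n\n")[0]  # first paragraph of the docstring
    (asset_dir / "README.md").write_text(
        TEMPLATE.format(name=asset_dir.name, summary=summary)
    )
```

Parsing with `ast` rather than importing the module keeps the generator safe to run in CI even when a component's dependencies aren't installed.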
| 3. Black formatting (`black --check --line-length 120`).
| 4. Docstring lint verifying Google-style docstrings (e.g. `pydocstyle --convention=google`) and enforcing docstrings on
| every `dsl.component` or `dsl.pipeline`-decorated function.
| 5. Static import guard: ensure only stdlib imports appear at module top level; third-party imports must live inside the |
I wonder if people will want to author components that leverage the embedded artifact pattern you authored (since it greatly simplifies component logic testing), in which case, the static import guard may need to be refined.
Good point. I'll keep it local for now but we can adjust later.
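For reference, a minimal sketch of how the guard could work using the stdlib `ast` module; allowing `kfp` at the top level is an assumption, since component files need `from kfp import dsl`:

```python
import ast
import sys
from pathlib import Path

# Modules permitted at module top level. sys.stdlib_module_names requires
# Python 3.10+; allowing "kfp" itself is an assumption (see above).
ALLOWED_ROOTS = set(sys.stdlib_module_names) | {"kfp"}


def top_level_import_violations(path: Path) -> list[str]:
    """Return non-allowed modules imported at the module's top level."""
    violations = []
    for node in ast.parse(path.read_text()).body:  # top-level statements only
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]
        else:
            continue
        violations.extend(
            f"{path}: top-level import of {root!r}"
            for root in roots
            if root and root not in ALLOWED_ROOTS
        )
    return violations
```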
| ### Open Questions
| - Should we expose the catalog via an API/website in addition to GitHub? (Out of scope initially but worth tracking.) |
Something like https://operatorhub.io/ would be appreciated by end users I think. If we build a consolidated CLI, it would be cool if it provided some list / describe component capabilities as well.
Just to clarify, I think a static website makes more sense than an API; it could easily be generated in CI and served via GH Pages.
Agreed. I might play around with this and see how much effort it'd be to contribute one.
| ### Open Questions
| - Should we expose the catalog via an API/website in addition to GitHub? (Out of scope initially but worth tracking.)
| - Should the core components Python package be included in the Kubeflow SDK directly? |
I lean towards no, but maybe there are some benefits that I'm not taking into consideration.
IMO both are helpful for different reasons: one for SEO, and the other because developers are too lazy to google, and we could probably compile the examples directly into the docs. That would probably be better in general anyway.
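For what it's worth, the import ergonomics being weighed look roughly like this; both package and module paths below are hypothetical:

```python
# Standalone core-components package (hypothetical names):
from kfp_components.core.data_processing import convert_pdf_to_markdown

# Bundled into the Kubeflow SDK instead (also hypothetical):
from kubeflow.pipelines.components.data_processing import convert_pdf_to_markdown
```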
Left some minor comments and questions. Overall, looks really good! Appreciate all the thought that went into this. It's a massive step up from the components directory in its current state.
Force-pushed from f72f2fb to 6955a81
| - Third-party assets remain the responsibility of their listed owners; Kubeflow maintainers provide validation
| infrastructure only.
| ### Artifact Metadata Schema |
Hey Matt, as the catalog grows, having a structured approach to discoverability might be nice. Some thoughts:
- A CI job to build a consolidated catalog (catalog.json) from fields in metadata.yaml, from which a UI or SDK could be built. Publish the catalog for easy integration with external tools.
- A tags field might be useful.
- I would imagine a lot of the components will be related to other Kubeflow components (Trainer, Katib, etc.). Having more explicit fields for them, along with min_ versions, might be useful. Treat them as "core dependencies". For example, as a user I want to use Kubeflow Trainer, and I want to know all available components and their version compatibility, from the SDK or UI.
Thanks for the review @briangallagher! I addressed points 2 and 3. I put point 1 in the open questions section so we don't lose track of it.
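For point 1, a rough sketch of the catalog-build job; it assumes PyYAML, and the field names (name, tags, kubeflow_dependencies) are illustrative, pending the final metadata.yaml schema:

```python
import json
from pathlib import Path

import yaml  # PyYAML


def build_catalog(repo_root: Path) -> None:
    """Aggregate every metadata.yaml into a single catalog.json."""
    entries = []
    for meta_path in sorted(repo_root.glob("components/**/metadata.yaml")):
        meta = yaml.safe_load(meta_path.read_text()) or {}
        entries.append(
            {
                "path": str(meta_path.parent.relative_to(repo_root)),
                "name": meta.get("name"),
                "tags": meta.get("tags", []),
                "kubeflow_dependencies": meta.get("kubeflow_dependencies", {}),
            }
        )
    (repo_root / "catalog.json").write_text(json.dumps(entries, indent=2))
```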
LGTM
| project introduces standardized metadata, documentation, testing, and maintenance automation to make components
| discoverable, reliable, and safe to adopt.
| \*Working title `kubeflow/kfp-components`; the final repository name will be confirmed during implementation. |
I would like to propose we call it kubeflow/pipelines-components for consistency with kubeflow/pipelines.
Updated!
+1
| │ │ └── <component-name>/
| │ │ ├── __init__.py (exposes the component entrypoint for imports)
| │ │ ├── component.py
| │ │ ├── metadata.yaml |
As discussed in the community meeting, we need to consider how we will manage the Docker images (which are part of the component).
My preference is that we require components to ONLY use either:
- an approved "base" docker image
- an extension of one of the "approved base images" using a Dockerfile defined in this repo (possibly under the components folder).
@thesuperzapper the latest push adds content around this. Please let me know if that aligns with your thoughts.
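For illustration, components could pin an approved base image through the existing base_image parameter of dsl.component; the image reference below is hypothetical, and the approved list would be defined by this repository's governance:

```python
from kfp import dsl


# Hypothetical approved base image; the real registry and tag policy
# would come from this repo's governance docs.
@dsl.component(base_image="ghcr.io/kubeflow/pipelines-components-base:1.0")
def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens in the input text."""
    return len(text.split())
```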
Force-pushed from 6955a81 to acd4aac
Signed-off-by: mprahl <[email protected]>
Force-pushed from acd4aac to 63d2659
LGTM
This KEP proposes creating a dedicated kubeflow/kfp-components repository to host reusable Kubeflow Pipelines components and pipelines under a clear core vs. third_party split. It introduces standardized per-asset metadata and autogenerated READMEs; enforced CI (formatting, docstrings, a static import guard, compile checks, dependency probes, optional pytest, and example compilation); separate Python packages for core and third-party assets (kfp-components, kfp-components-third-party) with ergonomic imports and semver aligned to Kubeflow; and governance via OWNERS files plus scheduled automation to keep assets verified, dependencies current, and stale items removed. The rollout covers bootstrapping the repo, migrating curated assets with a deprecation window, onboarding third parties, and coordinating with the ongoing pipelines cleanup to reduce fragmentation and improve discoverability, reliability, and reuse.
Resolves: #913