[HUDI-9594] Allow Hudi to delegate catalog operations to Apache Polaris #13558

Merged
merged 5 commits into from
Jul 25, 2025

Conversation

rahil-c
Contributor

@rahil-c rahil-c commented Jul 15, 2025

Change Logs

JIRA: https://issues.apache.org/jira/browse/HUDI-9594

This PR allows Hudi to integrate with the Apache Polaris catalog by delegating createTable to the Polaris Spark client, so that Hudi tables can be registered in the Polaris catalog.
Polaris Spark client docs: https://polaris.apache.org/in-dev/unreleased/polaris-spark-client/

The key changes include:

  • Added Polaris detection logic in HoodieSqlCommonUtils.isUsingPolarisCatalog() to identify when a Polaris catalog is configured in the Spark session
  • Enhanced HoodieCatalog.createTable() to delegate table registration to Polaris after creating the Hudi table
  • Modified CreateHoodieTableCommand to skip Hive/SparkCatalog registration when Polaris is enabled
  • Introduced a configurable Polaris class name via the hoodie.datasource.polaris.catalog.class property
  • Added a test with a mock Polaris catalog to verify the delegation behavior
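
To make the detection step concrete, here is a minimal, dependency-free sketch of how such a check might look. It is not the code from this PR: the object name, the use of a plain Map in place of the Spark session conf, and the assumption that the hoodie.datasource.polaris.catalog.class property is read from the same conf are all hypothetical. The only facts taken from the description are the property name and its default value; the `spark.sql.catalog.<name>` key pattern is Spark's standard way of registering a catalog plugin.

```scala
// Hypothetical sketch; not the actual HoodieSqlCommonUtils implementation.
object PolarisDetectionSketch {
  // Property name and default value as stated in the PR description.
  val PolarisCatalogClassKey = "hoodie.datasource.polaris.catalog.class"
  val DefaultPolarisCatalogClass = "org.apache.polaris.spark.SparkCatalog"

  // Spark registers a catalog plugin as spark.sql.catalog.<name> = <implementation class>.
  private val CatalogKeyPrefix = "spark.sql.catalog."

  /** Returns true if any registered catalog uses the configured Polaris class. */
  def isUsingPolarisCatalog(conf: Map[String, String]): Boolean = {
    // Assumption: the override is read from the same conf map as the catalog settings.
    val polarisClass = conf.getOrElse(PolarisCatalogClassKey, DefaultPolarisCatalogClass)
    conf.exists { case (key, value) =>
      // Match only keys of the form spark.sql.catalog.<name>; nested options such as
      // spark.sql.catalog.<name>.uri contain a further dot and are skipped.
      key.startsWith(CatalogKeyPrefix) &&
        !key.stripPrefix(CatalogKeyPrefix).contains(".") &&
        value == polarisClass
    }
  }
}
```

With this shape, a session that never registers a catalog under the Polaris class takes the existing code path, which is consistent with the backward-compatibility claim in the Impact section.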

Impact

Public API Changes:

  • New configuration property: hoodie.datasource.polaris.catalog.class (default: org.apache.polaris.spark.SparkCatalog)
  • Enhanced catalog behavior: When Polaris is detected, table creation delegates to external catalog

User-facing Changes:

  • Backward compatible: Existing deployments are unaffected when Polaris is not configured
  • Polaris catalog experience: Hudi tables can be registered in the Polaris catalog

Performance Impact:

  • Minimal to no impact for create DDL

Risk level: Low

Verification done:

  • Added unit tests covering both delegation and non-delegation scenarios
  • Tested with the integration tests in the Polaris repo
  • Tested the integration end to end manually against a local Polaris catalog service

The risk is low because:

  • Delegation only occurs when the Polaris Spark catalog property is explicitly configured in the Spark session

Documentation Update

Required updates:

  • Configuration reference: Document new hoodie.datasource.polaris.catalog.class property
  • Integration guide: Add Apache Polaris setup and configuration instructions

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Jul 15, 2025
@rahil-c
Contributor Author

rahil-c commented Jul 15, 2025

@yihua can you review this when you get a chance?

@rahil-c rahil-c force-pushed the rahil/hoodie-catalog branch from 3786ba1 to 339e66c Compare July 16, 2025 01:09
@rahil-c
Contributor Author

rahil-c commented Jul 16, 2025

@Davis-Zhang-Onehouse it would be appreciated if you could also take a look.

Contributor

@yihua yihua left a comment

Production code changes LGTM.

@rahil-c rahil-c force-pushed the rahil/hoodie-catalog branch from b188880 to 83a8ea2 Compare July 23, 2025 06:37
@github-actions github-actions bot added size:M PR with lines of changes in (100, 300] and removed size:L PR with lines of changes in (300, 1000] labels Jul 23, 2025
@rahil-c rahil-c force-pushed the rahil/hoodie-catalog branch from 83a8ea2 to 462fd6f Compare July 23, 2025 07:44
@rahil-c
Contributor Author

rahil-c commented Jul 23, 2025

@yihua I have added a config for the Polaris Spark catalog class name in case the value changes in the future. Please take a look when you get a chance.

@rahil-c
Contributor Author

rahil-c commented Jul 23, 2025

@yihua the CI is failing on the flaky test testRLIWithMDTCleaning. When I ran it locally, it passed.

Wondering if we can just rerun this one failed job?

@yihua
Contributor

yihua commented Jul 23, 2025

> @yihua the CI is failing on the flaky test testRLIWithMDTCleaning. When I ran it locally, it passed.
>
> Wondering if we can just rerun this one failed job?

Yes. Only committers can retrigger a single failed job.

@rahil-c rahil-c requested a review from yihua July 23, 2025 23:40
Comment on lines 260 to 262
.markAdvanced()
.withDocumentation("Fully qualified class name of the catalog that is used by the Polaris spark client.")
Contributor

Suggested change
- .markAdvanced()
- .withDocumentation("Fully qualified class name of the catalog that is used by the Polaris spark client.")
+ .markAdvanced()
+ .sinceVersion("1.1.0")
+ .withDocumentation("Fully qualified class name of the catalog that is used by the Polaris spark client.")

Contributor Author

Would it be better to target the next point release instead of the major release? In that case it should be 1.0.3 instead of 1.1.0.

Contributor

Usually a new config is only added in a major release, so 1.1.0 is preferred.

* Mock Polaris Spark Catalog for testing delegation behavior.
* Only implements essential methods: createTable and loadTable.
*/
class MockPolarisSparkCatalog extends TableCatalog {
Contributor

Once the PR that adds Hudi support in Polaris is merged, it would be good to add integration tests using Docker in GitHub Actions. Let's add a JIRA ticket to track that.

Contributor Author

This is a good idea. I am planning to add integration tests for Hudi in the Polaris repo, but I think we should file a follow-up to do the same in the Hudi repo.

Filed the JIRA here: https://issues.apache.org/jira/browse/HUDI-9639

Comment on lines 79 to 81
if (enablePolaris) {
hoodieCatalog.setDelegateCatalog(mockPolarisDelegate)
}
Contributor

Does this mimic Polaris's Spark catalog behavior?

Contributor Author

Contributor

Sg, let's enhance the comment to mention that this mimics Polaris's Spark catalog behavior.

Contributor Author

Will do, thanks!
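
For readers following this thread, the test pattern under discussion (set a mock delegate only when enablePolaris is true, then check whether createTable reached it) can be sketched without any Spark dependency. Everything below is a hypothetical stand-in: TableRegistry, RecordingDelegate, and SketchHoodieCatalog are illustrative names, and the real test works with Spark's TableCatalog, HoodieCatalog, and MockPolarisSparkCatalog. The sketch also simplifies the gating: here delegation happens whenever a delegate is set, whereas the PR description says the real code delegates when Polaris is detected in the session configuration.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical minimal interface standing in for the catalog the table is registered with.
trait TableRegistry {
  def createTable(name: String): Unit
}

// Test double that records every table it is asked to register.
final class RecordingDelegate extends TableRegistry {
  val created: ListBuffer[String] = ListBuffer.empty
  override def createTable(name: String): Unit = created += name
}

// Simplified catalog: always creates the Hudi table, then registers it
// with the delegate only if one has been set.
final class SketchHoodieCatalog {
  private var delegate: Option[TableRegistry] = None
  val hudiTables: ListBuffer[String] = ListBuffer.empty

  def setDelegateCatalog(d: TableRegistry): Unit = {
    delegate = Some(d)
  }

  def createTable(name: String): Unit = {
    hudiTables += name                    // step 1: create the Hudi table itself
    delegate.foreach(_.createTable(name)) // step 2: register with the delegate, if any
  }
}

object DelegationSketch {
  // Mirrors the enablePolaris switch from the test snippet under review.
  // Returns (tables created by the catalog, tables seen by the mock delegate).
  def run(enablePolaris: Boolean): (List[String], List[String]) = {
    val mock = new RecordingDelegate
    val catalog = new SketchHoodieCatalog
    if (enablePolaris) {
      catalog.setDelegateCatalog(mock)
    }
    catalog.createTable("db.tbl")
    (catalog.hudiTables.toList, mock.created.toList)
  }
}
```

Calling DelegationSketch.run with true and then with false exercises the delegation and non-delegation scenarios listed under "Verification done": the Hudi table is created in both cases, and the mock sees it only in the first.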

@rahil-c rahil-c requested a review from yihua July 24, 2025 18:41
Contributor

@yihua yihua left a comment

LGTM

@rahil-c rahil-c force-pushed the rahil/hoodie-catalog branch from 40c455d to c41c691 Compare July 25, 2025 00:39
@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@rahil-c
Copy link
Contributor Author

rahil-c commented Jul 25, 2025

@yihua was wondering if you can merge this when you get a chance?

@yihua yihua merged commit 9f4ae4b into apache:master Jul 25, 2025
60 checks passed
Labels
size:M PR with lines of changes in (100, 300]
4 participants