[HUDI-9594] Allow Hudi to delegate catalog operations to Apache Polaris #13558
Conversation
@yihua can you review this when you get a chance?
@Davis-Zhang-Onehouse if you could also take a look, it would be appreciated.
Production code changes LGTM.
@yihua I have added a config for this Polaris Spark catalog class name in case the value changes in the future. Please take a look when you get a chance.
@yihua the CI is failing due to a flaky test; wondering if we can just rerun this one failed job?
Yes. Only committers can retrigger a single failed job.
.markAdvanced()
.withDocumentation("Fully qualified class name of the catalog that is used by the Polaris spark client.")

Suggested change:

.markAdvanced()
.sinceVersion("1.1.0")
.withDocumentation("Fully qualified class name of the catalog that is used by the Polaris spark client.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better for me to specify this for the next point release instead of the major release? So it would be 1.0.3 instead of 1.1.0.
Usually a new config is only added in a major release, so it's preferred to use 1.1.0.
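For context, the change being discussed slots a `sinceVersion` call into Hudi's fluent config-builder chain. The sketch below uses a hypothetical stand-in class (`ConfigSketch`) to illustrate the pattern; it is not the real Hudi `ConfigProperty` builder, just a self-contained model of where the suggested call fits.

```scala
// Hypothetical stand-in for Hudi's fluent config builder. Each call
// returns a new immutable copy, so the chain reads top to bottom.
case class ConfigSketch(
    key: String,
    default: String = "",
    advanced: Boolean = false,
    since: Option[String] = None,
    doc: String = "") {
  def defaultValue(v: String): ConfigSketch = copy(default = v)
  def markAdvanced(): ConfigSketch = copy(advanced = true)
  def sinceVersion(v: String): ConfigSketch = copy(since = Some(v))
  def withDocumentation(d: String): ConfigSketch = copy(doc = d)
}

// The config under review, with the suggested sinceVersion("1.1.0")
// inserted between markAdvanced() and withDocumentation(...).
object PolarisConfigSketch {
  val PolarisCatalogClass: ConfigSketch =
    ConfigSketch("hoodie.datasource.polaris.catalog.class")
      .defaultValue("org.apache.polaris.spark.SparkCatalog")
      .markAdvanced()
      .sinceVersion("1.1.0")
      .withDocumentation(
        "Fully qualified class name of the catalog that is used by the Polaris spark client.")
}
```

The immutable-copy style here is just one way to model a fluent builder; the real builder's internals may differ.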
/**
 * Mock Polaris Spark Catalog for testing delegation behavior.
 * Only implements essential methods: createTable and loadTable.
 */
class MockPolarisSparkCatalog extends TableCatalog {
Once the PR change to support Hudi in Polaris is merged, it would be good to add an integration test using Docker in GitHub Actions. Let's add a JIRA ticket to track that.
This is a good idea. I am planning on adding integration tests for Hudi in the Polaris repo, but I think we should file a follow-up to do the same in the Hudi repo.
Filed the JIRA here: https://issues.apache.org/jira/browse/HUDI-9639
if (enablePolaris) {
  hoodieCatalog.setDelegateCatalog(mockPolarisDelegate)
}
Does this mimic Polaris's Spark catalog behavior?
Yes, see this line here in the Polaris side changes https://github.com/apache/polaris/pull/1862/files#diff-4737e300174f8a433f92317317dc2cb6387148df6e8aee8e24742285e9feddc0R68
Sounds good; let's enhance the comment to mention that this mimics Polaris's Spark catalog behavior.
Will do, thanks!
LGTM
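For readers unfamiliar with the pattern under test, the delegation behavior discussed above can be modeled with a short self-contained sketch. The trait and class names below are simplified stand-ins (not Spark's real `TableCatalog` or Hudi's real `HoodieCatalog`), and the delegation logic is an assumption about the shape of the behavior, not the PR's exact code.

```scala
import scala.collection.mutable

// Simplified stand-in for a table catalog interface.
trait SimpleCatalog {
  def createTable(ident: String): Unit
}

// Records which tables it was asked to create, in the spirit of the
// MockPolarisSparkCatalog used in the tests.
class MockPolarisCatalog extends SimpleCatalog {
  val created: mutable.Buffer[String] = mutable.Buffer.empty
  override def createTable(ident: String): Unit = created += ident
}

// Sketch of the delegation behavior: after doing its own work, the
// Hudi-side catalog forwards table registration to the delegate when
// one is configured (i.e. when Polaris is enabled).
class HoodieCatalogSketch extends SimpleCatalog {
  private var delegate: Option[SimpleCatalog] = None

  def setDelegateCatalog(c: SimpleCatalog): Unit = delegate = Some(c)

  override def createTable(ident: String): Unit = {
    // ... create the Hudi table on storage (elided) ...
    delegate.foreach(_.createTable(ident)) // register in Polaris
  }
}
```

With no delegate set, `createTable` only does the Hudi-side work; once `setDelegateCatalog` is called, each create is also forwarded, which is what the mock in the test verifies.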
@yihua was wondering if you could merge this when you get a chance?
Change Logs

JIRA: https://issues.apache.org/jira/browse/HUDI-9594

This PR allows Hudi to integrate with the Apache Polaris catalog by delegating createTable to the Polaris Spark client, allowing Hudi tables to be registered in the Polaris catalog. See https://polaris.apache.org/in-dev/unreleased/polaris-spark-client/.
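One building block of this integration, detecting whether the Spark session catalog is Polaris, can be modeled with a self-contained sketch. The helper name mirrors `isUsingPolarisCatalog` from this PR, but the body, the config handling, and the assumption that only the `spark_catalog` slot is checked are illustrative, not the PR's exact implementation.

```scala
// Hypothetical model of Polaris detection: compare the class registered
// for the Spark session catalog against the expected Polaris catalog
// class (overridable via hoodie.datasource.polaris.catalog.class).
object PolarisDetectionSketch {
  val PolarisCatalogClassKey = "hoodie.datasource.polaris.catalog.class"
  val DefaultPolarisCatalogClass = "org.apache.polaris.spark.SparkCatalog"

  def isUsingPolarisCatalog(sparkConf: Map[String, String]): Boolean = {
    // The expected class name, falling back to the documented default.
    val expected =
      sparkConf.getOrElse(PolarisCatalogClassKey, DefaultPolarisCatalogClass)
    // Spark v2 catalogs are registered under spark.sql.catalog.<name>;
    // this sketch only checks the session catalog slot.
    sparkConf.get("spark.sql.catalog.spark_catalog").contains(expected)
  }
}
```

Making the class name configurable (rather than hard-coding it) is what lets the check keep working if the Polaris Spark catalog class ever moves, which is the motivation for the new property.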
The key changes include:
- `HoodieSqlCommonUtils.isUsingPolarisCatalog()` to identify when the Polaris catalog is configured in the Spark session.
- `HoodieCatalog.createTable()` to delegate table registration to Polaris after creating the Hudi table.
- `CreateHoodieTableCommand` to skip Hive/SparkCatalog registration when Polaris is enabled.
- A new `hoodie.datasource.polaris.catalog.class` property.

Impact

Public API Changes:
- New config `hoodie.datasource.polaris.catalog.class` (default: `org.apache.polaris.spark.SparkCatalog`)

User-facing Changes:

Performance Impact:

Risk level: Low

Verification done:

The risk is low because:

Documentation Update

Required updates:
- `hoodie.datasource.polaris.catalog.class` property

Contributor's checklist