[Kernel]Generalize the actions after commit(like checkpoint) by introducing post commit action to kernel #4115

Open: 24 commits to merge into base: master

Conversation

Collaborator

@huan233usc huan233usc commented Feb 1, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR makes no functional changes; it abstracts checkpointing into a post-commit action. This prepares for adding more post-commit actions, such as the CRC write (#4116).
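
To make the refactoring concrete, here is a minimal sketch of the shape described above: instead of exposing a checkpoint-specific flag, the commit result carries a list of generic post-commit hooks. `PostCommitHook` and `PostCommitHookType` are names taken from the diff in this PR; everything else (the `SketchEngine` stand-in, the `invoke` and `getType` signatures, the `CHECKPOINT` constant, and the `CommitResult` class) is a hypothetical simplification and should not be read as the Kernel API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Demo entry point; all types below are simplified stand-ins, not the Kernel classes.
class PostCommitHookSketch {
  public static void main(String[] args) throws IOException {
    List<String> log = new ArrayList<>();
    CommitResult result = CommitResult.of(10L, log, true);
    for (PostCommitHook hook : result.getPostCommitHooks()) {
      hook.invoke(new SketchEngine());
    }
    System.out.println(log); // prints [checkpoint@10]
  }
}

// Hypothetical stand-in for io.delta.kernel.engine.Engine.
class SketchEngine {}

// Interface name from the PR; the method signatures are assumptions.
interface PostCommitHook {
  enum PostCommitHookType { CHECKPOINT }

  void invoke(SketchEngine engine) throws IOException;

  PostCommitHookType getType();
}

// Hypothetical hook that records a checkpoint request instead of writing files.
class CheckpointHook implements PostCommitHook {
  private final long version;
  private final List<String> log;

  CheckpointHook(long version, List<String> log) {
    this.version = version;
    this.log = log;
  }

  @Override
  public void invoke(SketchEngine engine) throws IOException {
    log.add("checkpoint@" + version);
  }

  @Override
  public PostCommitHookType getType() {
    return PostCommitHookType.CHECKPOINT;
  }
}

// Hypothetical commit result: a list of hooks replaces a checkpoint-only boolean.
class CommitResult {
  private final long version;
  private final List<PostCommitHook> hooks;

  private CommitResult(long version, List<PostCommitHook> hooks) {
    this.version = version;
    this.hooks = Collections.unmodifiableList(hooks);
  }

  static CommitResult of(long version, List<String> log, boolean needsCheckpoint) {
    List<PostCommitHook> hooks = new ArrayList<>();
    if (needsCheckpoint) {
      hooks.add(new CheckpointHook(version, log));
    }
    return new CommitResult(version, hooks);
  }

  long getVersion() { return version; }

  List<PostCommitHook> getPostCommitHooks() { return hooks; }
}
```

With this shape, a later action such as the CRC write would be one more enum constant and one more hook class, with no change to how the caller iterates over the list.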

How was this patch tested?

Existing unit tests, plus a manual test using delta/kernel/examples/run-kernel-examples.py --use-local

Does this PR introduce any user-facing changes?

No

@huan233usc huan233usc marked this pull request as ready for review February 3, 2025 02:33
@huan233usc huan233usc changed the title from "[Kernel]Introduce post commit action to kernel" to "[Kernel]Generalize the actions after commit(like checkpoint) by introducing post commit action to kernel" Feb 3, 2025
@huan233usc huan233usc self-assigned this Feb 3, 2025
Collaborator

@scottsand-db scottsand-db left a comment

Looks great! Left some comments!

Collaborator

@scottsand-db scottsand-db left a comment

LGTM!

import io.delta.kernel.engine.Engine;
import java.io.IOException;

public interface PostCommitHook {
Collaborator

Should we add more docs here describing what type of work counts as a PostCommitHook and how an engine should treat these hooks? For example: are they required, how do they relate to the commit, etc.

Alternatively, maybe more thorough docs for each type?

Collaborator Author

Added some docs, PTAL. Thanks.

Collaborator

Also document what sort of operations these hooks perform and their expected latency, so that the connector can choose to run them asynchronously.

Collaborator Author

Updated the documentation to list the supported operations and to give a latency indication for checkpoint in the section below.
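
The concern in this thread is that a hook such as a checkpoint write may be slow, so a connector may prefer to run it off the commit path. A minimal sketch of that option follows, using only `java.util.concurrent`. The `Hook` interface and the `runAsync` helper are hypothetical stand-ins rather than Kernel APIs, and the sketch assumes the hook is safe to call from another thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncHookSketch {
  // Hypothetical stand-in for a post-commit hook; not the Kernel interface.
  interface Hook {
    void invoke() throws IOException;
  }

  // Submits every hook to the executor and returns one future per hook.
  static List<CompletableFuture<Void>> runAsync(List<Hook> hooks, ExecutorService executor) {
    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (Hook hook : hooks) {
      futures.add(CompletableFuture.runAsync(() -> {
        try {
          hook.invoke();
        } catch (IOException e) {
          // Surface the failure through the future instead of the commit path.
          throw new UncheckedIOException(e);
        }
      }, executor));
    }
    return futures;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Hook slowCheckpoint = () -> System.out.println("writing checkpoint in background");
      List<CompletableFuture<Void>> futures = runAsync(List.of(slowCheckpoint), executor);
      System.out.println("commit returned to caller");
      // Wait only so the demo exits cleanly; a connector could keep working instead.
      CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    } finally {
      executor.shutdown();
    }
  }
}
```

Returning the futures lets the caller decide whether to wait, log failures later, or ignore the outcome entirely.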

Collaborator

@allisonport-db allisonport-db left a comment

Just a few minor comments then lgtm

public interface PostCommitHook {

enum PostCommitHookType {
// Write a new checkpoint at the version committed by the txn if required.
Collaborator

This is maybe a little ambiguous as to when the hook is present (i.e. is it always present and only sometimes checkpoints?)

Maybe something like "Write a new checkpoint at the version committed by the txn. This hook is present when the table is ready for checkpoint according to its configured checkpoint interval"

Collaborator

You can use a /* */ comment for the enum.
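
The wording suggested in this thread implies that the checkpoint hook is created only when the committed version lands on the table's checkpoint interval, rather than always being present and deciding internally whether to do work. A minimal sketch of that rule follows. The modulo check, the exclusion of version 0, and every name here other than `isReadyForCheckpoint` are assumptions for illustration, not code from this PR.

```java
import java.util.ArrayList;
import java.util.List;

class CheckpointHookPresenceSketch {
  // Hypothetical hook type; stands in for the enum discussed in the review.
  enum HookType { CHECKPOINT }

  // Assumed rule: a checkpoint is due when the version is a positive multiple of the interval.
  static boolean isReadyForCheckpoint(long version, long checkpointInterval) {
    if (checkpointInterval <= 0) {
      throw new IllegalArgumentException("checkpointInterval must be positive");
    }
    return version > 0 && version % checkpointInterval == 0;
  }

  // Returns the hooks a commit at `version` would expose; empty when no checkpoint is due.
  static List<HookType> hooksFor(long version, long checkpointInterval) {
    List<HookType> hooks = new ArrayList<>();
    if (isReadyForCheckpoint(version, checkpointInterval)) {
      hooks.add(HookType.CHECKPOINT);
    }
    return hooks;
  }

  public static void main(String[] args) {
    System.out.println(hooksFor(9, 10));  // prints []
    System.out.println(hooksFor(10, 10)); // prints [CHECKPOINT]
  }
}
```

Under this reading, the presence of the hook is itself the signal that a checkpoint is due, which matches the later remark that a created hook is required.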

import io.delta.kernel.internal.fs.Path;
import java.io.IOException;

/** Write a new checkpoint at the version committed by the txn if required. */
Collaborator

Suggested change
- /** Write a new checkpoint at the version committed by the txn if required. */
+ /** Write a new checkpoint at the version committed by the txn. */

Collaborator

If this hook is created, it is required

Collaborator

@vkorukanti vkorukanti left a comment

LGTM overall.

*/
public boolean isReadyForCheckpoint() {
return isReadyForCheckpoint;
/** @return list of operations to trigger after commit. */
Collaborator

Add more details, such as how the connector can choose to run these operations and how they may affect query performance.

Collaborator Author

Added a usage section to cover these. Does "query performance" mean commit latency in this context?
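
One way a connector might act on this, sketched under assumptions: run the hooks inline after a successful commit, measure the latency each one adds, and treat a hook failure as non-fatal because the commit itself has already succeeded. The `Hook` interface, the `runAll` helper, and the hook names are hypothetical, and whether a given connector should swallow hook failures is a policy choice this thread does not settle.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class InlineHookSketch {
  // Hypothetical stand-in for a post-commit hook; not the Kernel interface.
  interface Hook {
    String name();

    void invoke() throws IOException;
  }

  // Runs hooks one by one; returns the names of hooks that failed instead of rethrowing.
  static List<String> runAll(List<Hook> hooks) {
    List<String> failed = new ArrayList<>();
    for (Hook hook : hooks) {
      long start = System.nanoTime();
      try {
        hook.invoke();
      } catch (IOException e) {
        // The commit is already durable, so record the failure and continue.
        failed.add(hook.name());
      }
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      System.out.println(hook.name() + " took " + elapsedMs + " ms");
    }
    return failed;
  }

  // Builds a demo hook that optionally fails with an IOException.
  static Hook hook(String name, boolean fails) {
    return new Hook() {
      @Override
      public String name() {
        return name;
      }

      @Override
      public void invoke() throws IOException {
        if (fails) {
          throw new IOException(name + " failed");
        }
      }
    };
  }

  public static void main(String[] args) {
    List<String> failed = runAll(List.of(hook("checkpoint", false), hook("crc", true)));
    System.out.println("failed hooks: " + failed);
  }
}
```

Running inline keeps the control flow simple but adds the hooks' duration to the time the caller waits after the commit, which is the latency cost the question above refers to.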

4 participants