Support commit retry #964

Open
3 tasks
ZENOTME opened this issue Feb 12, 2025 · 5 comments

Comments

@ZENOTME
Contributor

ZENOTME commented Feb 12, 2025

I would like to separate this task into multiple steps:

  1. Identify the RetryableCommitError type.
    We can introduce a new ErrorKind::RetryableCommitError to abstract over the kinds of catalog errors that indicate a commit can be retried.
  2. Support storing the update actions and reapplying them to the table when the commit fails.
  3. Add commit retry; this requires a retry library.
    For the retry library, I think https://github.com/Xuanwo/backon could be a good candidate. Its maintainer is @Xuanwo. (Thanks for the great work!)
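A minimal sketch of steps 1 and 3, assuming a hypothetical `ErrorKind::RetryableCommitError` variant (the name proposed above, not an existing iceberg-rust API) and a hypothetical `Error` type. The hand-rolled `retry_with_backoff` helper uses only the standard library; backon would take its place in practice, and its actual API is not shown here.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error kinds; only `RetryableCommitError` triggers a retry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    /// The catalog rejected the commit because the base metadata was stale.
    RetryableCommitError,
    /// Any other failure, e.g. I/O errors or a failed conflict check.
    Unexpected,
}

/// Hypothetical error type carrying a kind and a message.
#[derive(Debug)]
struct Error {
    kind: ErrorKind,
    message: String,
}

/// Calls `op` until it succeeds, fails with a non-retryable error, or has been
/// attempted `max_attempts` times, doubling the sleep between attempts.
fn retry_with_backoff<T, F>(mut op: F, max_attempts: u32, initial_delay: Duration) -> Result<T, Error>
where
    F: FnMut() -> Result<T, Error>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.kind == ErrorKind::RetryableCommitError && attempt < max_attempts => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            // Non-retryable error, or retry budget exhausted: surface it.
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 {
                Err(Error {
                    kind: ErrorKind::RetryableCommitError,
                    message: "stale metadata".to_string(),
                })
            } else {
                Ok("committed")
            }
        },
        5,
        Duration::from_millis(10),
    );
    // Prints: Ok("committed") after 3 calls
    println!("{:?} after {} calls", result.map_err(|e| e.message), calls);
}
```

Classifying errors up front keeps the retry policy simple: only errors tagged as retryable trigger another attempt, and everything else is returned immediately.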

More suggestions and elaborations are welcome. cc @Fokko @Xuanwo @liurenjie1024 @sdd

@liurenjie1024
Contributor

Thanks @ZENOTME for raising this. The core part of commit is conflict detection for the different isolation levels, which is quite hard to implement correctly. Retry itself is not a big problem; it's just a matter of reloading the table metadata and running conflict detection again. I'm not familiar with backon, so I'm not sure whether it's a good fit here.
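To illustrate why conflict detection, rather than the retry loop, is the hard part: after reloading metadata, an action has to re-check its assumptions against the refreshed state. The sketch below uses a hypothetical `DeleteFiles` action and a simplified `TableMetadata` (just a set of live file paths); neither is iceberg-rust's API, and the single rule shown does not model isolation levels.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified table state: only the set of live data file paths.
struct TableMetadata {
    data_files: HashSet<String>,
}

/// Hypothetical delete-style action that removes specific data files.
struct DeleteFiles {
    files: Vec<String>,
}

impl DeleteFiles {
    /// Run after reloading metadata on retry. Every file we planned to delete
    /// must still be live; if a concurrent commit already removed one, the
    /// action was planned against state that no longer holds, so the missing
    /// paths are reported as a conflict and the commit should abort.
    fn check_conflicts(&self, refreshed: &TableMetadata) -> Result<(), Vec<String>> {
        let missing: Vec<String> = self
            .files
            .iter()
            .filter(|path| !refreshed.data_files.contains(*path))
            .cloned()
            .collect();
        if missing.is_empty() {
            Ok(())
        } else {
            Err(missing)
        }
    }
}

fn main() {
    let refreshed = TableMetadata {
        data_files: HashSet::from(["a.parquet".to_string(), "c.parquet".to_string()]),
    };
    let still_valid = DeleteFiles { files: vec!["a.parquet".to_string()] };
    let stale = DeleteFiles { files: vec!["a.parquet".to_string(), "b.parquet".to_string()] };
    println!("{:?}", still_valid.check_conflicts(&refreshed)); // Ok(())
    println!("{:?}", stale.check_conflicts(&refreshed)); // Err(["b.parquet"])
}
```

Each action type needs its own rules of this kind, which is where the real complexity lives.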

@ZENOTME
Contributor Author

ZENOTME commented Feb 12, 2025


Thanks @liurenjie1024. I will spend some time investigating conflict detection further.

@ZENOTME
Contributor Author

ZENOTME commented Feb 12, 2025

Hi, I took some time to work through the whole commit process. It can be described as follows:

  1. Load the current metadata from the catalog.
  2. Create UpdateActions and apply them to the metadata.
    • When applying, a conflict detection process runs against the local metadata loaded in step 1.
    • Conflict detection is specific to each UpdateAction type. For example, FastAppend only appends data files, so it needs no conflict detection.
    • If conflict detection in the apply function fails, the table has a conflict and we can't commit. The process aborts.
  3. If conflict detection passes, we can send the commit request to the catalog.
    • If the commit fails and the catalog returns CommitFailedException, we committed against stale metadata, so we can jump back to step 1 and try to commit again.
  4. If the commit succeeds, we are done.
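The four steps above can be sketched as a loop. Everything here is hypothetical and simplified: `TableMetadata` is just a version counter plus a file list, `InMemoryCatalog` stands in for a real catalog with compare-and-swap commit semantics (its `CommitFailed` plays the role of `CommitFailedException`), and `concurrent_writers` is a test hook that simulates other writers winning the race.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical, simplified table metadata: a version counter and data files.
#[derive(Clone, Debug)]
struct TableMetadata {
    version: u64,
    data_files: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum CommitError {
    CommitFailed,     // stale base metadata (the CommitFailedException case): retry
    Conflict(String), // conflict detection in `apply` failed: abort, no retry
    RetriesExhausted, // every attempt hit `CommitFailed`
}

/// Hypothetical update action; `apply` runs conflict detection against `base`.
trait UpdateAction {
    fn apply(&self, base: &TableMetadata) -> Result<TableMetadata, CommitError>;
}

/// FastAppend only adds files, so it never reports a conflict.
struct FastAppend {
    files: Vec<String>,
}

impl UpdateAction for FastAppend {
    fn apply(&self, base: &TableMetadata) -> Result<TableMetadata, CommitError> {
        let mut next = base.clone();
        next.version += 1;
        next.data_files.extend(self.files.iter().cloned());
        Ok(next)
    }
}

/// In-memory stand-in for a catalog with compare-and-swap commits.
struct InMemoryCatalog {
    current: RefCell<TableMetadata>,
    /// Test hook: how many other writers sneak in a commit before ours.
    concurrent_writers: Cell<u32>,
}

impl InMemoryCatalog {
    fn load(&self) -> TableMetadata { self.current.borrow().clone() }

    /// Accepts `new` only if the catalog is still at `base_version`.
    fn commit(&self, base_version: u64, new: TableMetadata) -> Result<(), CommitError> {
        let mut current = self.current.borrow_mut();
        if self.concurrent_writers.get() > 0 {
            // Simulate another writer committing just before us.
            self.concurrent_writers.set(self.concurrent_writers.get() - 1);
            current.version += 1;
            let v = current.version;
            current.data_files.push(format!("other-{v}.parquet"));
        }
        if current.version != base_version {
            return Err(CommitError::CommitFailed);
        }
        *current = new;
        Ok(())
    }
}

fn commit_with_retry(
    catalog: &InMemoryCatalog,
    actions: &[Box<dyn UpdateAction>],
    max_attempts: u32,
) -> Result<TableMetadata, CommitError> {
    for _ in 0..max_attempts {
        // Step 1: load the current metadata from the catalog.
        let base = catalog.load();
        // Step 2: reapply the stored actions; `?` aborts on a conflict.
        let mut updated = base.clone();
        for action in actions {
            updated = action.apply(&updated)?;
        }
        // Step 3: send the commit; a stale base sends us back to step 1.
        match catalog.commit(base.version, updated.clone()) {
            Ok(()) => return Ok(updated), // Step 4: done.
            Err(CommitError::CommitFailed) => continue,
            Err(other) => return Err(other),
        }
    }
    Err(CommitError::RetriesExhausted)
}

fn main() {
    let catalog = InMemoryCatalog {
        current: RefCell::new(TableMetadata { version: 0, data_files: Vec::new() }),
        concurrent_writers: Cell::new(2),
    };
    let actions: Vec<Box<dyn UpdateAction>> =
        vec![Box::new(FastAppend { files: vec!["mine.parquet".to_string()] })];
    let committed = commit_with_retry(&catalog, &actions, 5).expect("commit should succeed");
    // Prints: ["other-1.parquet", "other-2.parquet", "mine.parquet"]
    println!("{:?}", committed.data_files);
}
```

Because the actions are stored and reapplied on each attempt, a stale-metadata failure only costs a reload, while a conflict reported by `apply` aborts immediately. With only FastAppend, the conflict path is never taken, which matches the plan to add conflict detection later.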

For now, we only support FastAppend. So we can complete the whole process based on the FastAppend first and complete conflict detection when we introduce other update actions. What do you think? @liurenjie1024

@liurenjie1024
Contributor

For now, we only support FastAppend. So we can complete the whole process based on the FastAppend first and complete conflict detection when we introduce other update actions.

This sounds reasonable to me.

@ZENOTME
Contributor Author

ZENOTME commented Feb 13, 2025


Let's move forward with this. I will work on it later.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants