Skip to content

Added code for Pageview Update with Single Transaction #810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

harsh-0409
Copy link

Single Commit:
Remove the batch commit strategy (i.e., committing after every 50,000 rows) and instead commit only once after processing all rows. This guarantees that either all pageview updates are applied or none are, maintaining data integrity.

…unction

- Added explicit transaction handling using wp10db.begin() to ensure all updates are treated as a single unit.
- Implemented error handling with rollback to maintain data integrity in case of errors during the update process.
- Introduced a batch commit strategy to commit changes after processing a specified number of rows (commit_after) to optimize performance and reduce overhead.
- Updated logging to reflect the number of committed updates.
- Atomic Update:
The transaction is started with wp10db.begin() before processing any rows and committed only once after the loop completes. This ensures that the entire update is applied as a single atomic unit.

- Error Handling & Rollback:
If an error occurs during processing, the except block rolls back the entire transaction, maintaining data integrity.

- Removal of Batch Commits:
The previous batch commit strategy (committing after every 50,000 rows) has been removed. As noted by @audiodude, committing in batches would break the requirement to update the entire dataset in one transaction.
@kelson42 kelson42 requested a review from audiodude March 18, 2025 19:02
Copy link
Member

@audiodude audiodude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you restore the 2 spaces indentation so that I can see a proper diff?

@harsh-0409
Copy link
Author

Hey @audiodude I restored the 2-space indentation. Really sorry for the delay.
Kindly review.

Copy link
Member

@audiodude audiodude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but please install the pre-commit hooks (https://github.com/openzim/wp1?tab=readme-ov-file#pre-commit-hooks) or otherwise run yapf to fix the spacing. I see lots of unrelated whitespace changes in this PR based on what looks like manually reducing spacing to two spaces even when the formatter inserts more.

EDIT: fixed wrong link

@audiodude
Copy link
Member

audiodude commented Mar 23, 2025

Hi @harsh-0409:

I'm going to close this PR in favor of #815. No offense to you, this is a decent approach, but I feel the other PR is more sound overall.

As for GSoC, I consider that you are still eligible having gone through this process, even though your PR wasn't merged. To be clear, you have submitted (for review) a well thought out PR that solves a specific bug, and provided good and thoughtful communication. I consider the Kiwix PR requirement satisfied and will advocate for you as such.

EDIT: Remember, GSoC participants are selected based on the strength of their project proposals, not their success in merging PRs in the pre-period.

@audiodude audiodude closed this Mar 23, 2025
@harsh-0409
Copy link
Author

Hi @audiodude :
I reviewed #815, and I have to say, the work done is truly impressive! Honestly, if you had accepted my PR, I might have been a little offended (just kidding!).

Regarding GSoC, I’m particularly interested in the idea "Project B, Scheduled selections/ZIMs" from the idea list. I’d love to explore this further—what’s the best way to get in touch with you to discuss it?

@audiodude
Copy link
Member

Regarding GSoC, I’m particularly interested in the idea "Project B, Scheduled selections/ZIMs" from the idea list. I’d love to explore this further—what’s the best way to get in touch with you to discuss it?

Mention me in the Kiwix #gsoc channel on Slack and we can discuss there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants