(DOCSP-49695): Handle moved pages #54

Open: wants to merge 13 commits into base: main

Conversation

dacharyc (Collaborator) commented Jun 25, 2025

Moved pages

There are two scenarios where the same code examples can show up as "new" and "removed" when they're really just moved:

  • When a page is moved, the page ID doesn't match the old one, so we delete the old page and create a new page. Really, though, they're the same page.
  • When a section of a page is moved out to a new page, some of the page content is split to a new ID. Code examples that are removed from the old page would be counted as "removed", and code examples on the newly-created page would be considered "new" - when they're really just the same examples.

After consideration, this PR handles only the first scenario: a page moved in its entirety. That should cover the majority of cases. Tracking partially moved content could be a much bigger performance hit, because we would have to compare every "new" code example against every "removed" code example in a docs set, rather than comparing only new and removed pages. Both the number of comparisons and the number of comparison candidates would likely be much higher.
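To illustrate why page-level matching stays cheap, here is a minimal sketch of one way a moved page could be recognized. It is not necessarily how this PR does it: the `Page` type, the `fingerprint` helper, and `FindMovedPages` are all hypothetical names, and matching on an order-independent hash of a page's code examples is an assumption. The point is that each removed or new page is hashed once, so the work grows with the number of pages rather than with every pairing of "new" and "removed" examples.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Page is a hypothetical, simplified page record. The real type in this
// project likely carries more fields.
type Page struct {
	ID    string
	Codes []string // raw text of each code example on the page
}

// fingerprint returns an order-independent hash of a page's code examples.
func fingerprint(p Page) string {
	hashes := make([]string, 0, len(p.Codes))
	for _, c := range p.Codes {
		sum := sha256.Sum256([]byte(c))
		hashes = append(hashes, hex.EncodeToString(sum[:]))
	}
	sort.Strings(hashes)
	all := sha256.Sum256([]byte(strings.Join(hashes, ",")))
	return hex.EncodeToString(all[:])
}

// FindMovedPages pairs each removed page with a new page that has the same
// code-example fingerprint. It returns a map of old page ID to new page ID.
// Pages without code examples are skipped, since they cannot be told apart.
func FindMovedPages(removed, added []Page) map[string]string {
	byPrint := make(map[string][]string)
	for _, p := range added {
		if len(p.Codes) == 0 {
			continue
		}
		fp := fingerprint(p)
		byPrint[fp] = append(byPrint[fp], p.ID)
	}
	moved := make(map[string]string)
	for _, p := range removed {
		if len(p.Codes) == 0 {
			continue
		}
		fp := fingerprint(p)
		if ids := byPrint[fp]; len(ids) > 0 {
			moved[p.ID] = ids[0]  // claim the first unmatched candidate
			byPrint[fp] = ids[1:] // so it is not matched twice
		}
	}
	return moved
}

func main() {
	removed := []Page{
		{ID: "old/install", Codes: []string{"pip install pymongo", "import pymongo"}},
		{ID: "old/gone", Codes: []string{"db.drop()"}},
	}
	added := []Page{
		{ID: "get-started/install", Codes: []string{"import pymongo", "pip install pymongo"}},
	}
	fmt.Println(FindMovedPages(removed, added)) // map[old/install:get-started/install]
}
```

With this approach, code examples on a matched pair would be treated as unchanged, while `old/gone` would still be reported as removed.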

For now, let's make this change and see what happens. We can consider tracking moved code examples as a future unit of work if we continue to see a lot of duplicated "new" and "removed" content.

Duplicate code examples

My work in the recent PR #48 introduced a case where we no longer write duplicate code examples to the nodes array, which caused a discrepancy between counts. I've added a new InstancesOnPage field, which is set only when a code example appears more than once on a page; otherwise, it is omitted. Go gives an omitted int field its zero value, 0. I've incorporated the InstancesOnPage numbers into the various code example counts.
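As a rough sketch of how an optional InstancesOnPage field can feed into a page-level count: the `CodeNode` type, its tag names, and `CountCodeExamples` below are hypothetical stand-ins, and the example uses `encoding/json` from the standard library only to stay self-contained (the project may serialize differently). A node whose field is absent decodes to 0 and is counted as a single occurrence.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CodeNode is a hypothetical, trimmed-down stand-in for the project's
// code node type. The tag names are assumptions.
type CodeNode struct {
	Code            string `json:"code"`
	InstancesOnPage int    `json:"instances_on_page,omitempty"`
}

// CountCodeExamples returns the total number of code example occurrences
// on a page. A node with InstancesOnPage unset (0) counts as one occurrence.
func CountCodeExamples(nodes []CodeNode) int {
	total := 0
	for _, n := range nodes {
		if n.InstancesOnPage > 1 {
			total += n.InstancesOnPage
		} else {
			total++
		}
	}
	return total
}

func main() {
	// The second node omits the field entirely.
	raw := `[{"code":"a","instances_on_page":3},{"code":"b"}]`
	var nodes []CodeNode
	if err := json.Unmarshal([]byte(raw), &nodes); err != nil {
		panic(err)
	}
	fmt.Println(nodes[1].InstancesOnPage) // 0
	fmt.Println(CountCodeExamples(nodes)) // 4
}
```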

I am now correctly matching code example counts at the individual page level, taking the duplicate code examples on each page into account. However, the counts still don't match at the project level, as this log message shows:

2025/06/26 17:03:29 Code node count issue: Project pymongo: expected 847 code nodes, got 870

Because the counts are correct at the page level, and the project-level count I write to the summaries document is the number I get from Snooty, I believe all the counts I'm recording in the database are correct. I believe the discrepancy is only in the internal logic I use to try to catch issues with the counts.
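For context, a project-level check of this kind might look roughly like the sketch below. The `SumPageCounts` helper and the sample numbers are hypothetical, and treating "expected" as the externally reported number and "got" as the sum of page-level counts is an assumption about what the log message means. It does not reproduce or explain the pymongo discrepancy.

```go
package main

import "log"

// SumPageCounts adds up per-page code example counts for one project.
// The map key is a page ID; the value is that page's count.
func SumPageCounts(pageCounts map[string]int) int {
	total := 0
	for _, c := range pageCounts {
		total += c
	}
	return total
}

func main() {
	project := "example-project" // illustrative values only
	expected := 9                // assumed: the count reported for the project
	pages := map[string]int{"install": 4, "connect": 6}

	if got := SumPageCounts(pages); got != expected {
		log.Printf("Code node count issue: Project %s: expected %d code nodes, got %d",
			project, expected, got)
	}
}
```

Logging the per-page counts alongside the total would be one way to narrow down where the two numbers diverge.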

I've spent basically all day debugging this, and it doesn't seem like a good use of my time to carry on, so I'm putting this up as-is for now.

netlify bot commented Jun 25, 2025

Deploy Preview for ask-cal canceled.

Name Link
🔨 Latest commit 34f60e7
🔍 Latest deploy log https://app.netlify.com/projects/ask-cal/deploys/685db7983df18a0008d9ad96
