(DOCSP-49695): Handle moved pages #54
Moved pages
There are two scenarios where the same code examples can show up as "new" and "removed" when they've really just moved:

- A page is moved in its entirety.
- Part of a page's content is moved to another page.
After consideration, this PR handles only the first scenario: a page moved in its entirety. That should cover most cases. Tracking partial content that moves between pages could be a much bigger performance hit, because we would have to compare every "new" code example against every "removed" code example in a docs set, rather than comparing only new and removed pages. Both the number of comparisons and the number of comparison candidates would likely be much higher.
For now, let's make this change and see what happens. We can consider attempting to track moved code examples as a future unit of work if we find we continue to see a lot of duplicated "new" and "removed" content.
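A minimal sketch of page-level matching follows, assuming a moved page can be recognized because a new page contains exactly the same code examples (compared by content hash) as a removed page. The `Page` type, its `ExampleHashes` field, and `findMovedPages` are hypothetical; the PR does not describe how a move is detected.

```go
package main

import (
	"fmt"
	"sort"
)

// Page is a hypothetical, simplified view of a docs page: its ID and the
// content hashes of the code examples on it.
type Page struct {
	ID            string
	ExampleHashes []string
}

// sameExamples reports whether two pages contain the same multiset of
// code example hashes, ignoring order.
func sameExamples(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// findMovedPages pairs each new page with an unused removed page whose code
// examples match. Candidates are pages, so there are at most
// len(newPages) * len(removedPages) page comparisons, instead of comparing
// every new code example against every removed one.
func findMovedPages(newPages, removedPages []Page) map[string]string {
	moved := make(map[string]string) // new page ID -> removed page ID
	used := make(map[string]bool)
	for _, n := range newPages {
		for _, r := range removedPages {
			if used[r.ID] {
				continue
			}
			if sameExamples(n.ExampleHashes, r.ExampleHashes) {
				moved[n.ID] = r.ID
				used[r.ID] = true
				break
			}
		}
	}
	return moved
}

func main() {
	removed := []Page{{ID: "old/install", ExampleHashes: []string{"a1", "b2"}}}
	added := []Page{
		{ID: "new/install", ExampleHashes: []string{"b2", "a1"}},
		{ID: "new/tutorial", ExampleHashes: []string{"c3"}},
	}
	fmt.Println(findMovedPages(added, removed)) // map[new/install:old/install]
}
```

Matched pairs could then be reported as moved rather than as "new" plus "removed". Keeping the candidates at the page level is what keeps the comparison count small.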
Duplicate code examples
My work in the recent PR #48 introduced a new case where we don't write duplicate code examples to the `nodes` array, and that has caused a discrepancy between counts. I've added a new `InstancesOnPage` field, which gets a value if a code example appears more than once on a page. Otherwise, it is omitted, and Go initializes the omitted int field to its zero value, 0. I've incorporated the `InstancesOnPage` numbers into the various code example counts.

I am now correctly matching code example counts at the individual page level, accounting for duplicate code examples on the page. However, I am still missing some code example counts at the project level, as demonstrated by this log message:
Because the counts are correct at the page level, and I write the count I get from Snooty to the summaries document as the project-level count, I believe all the counts I'm recording in the database are correct. The discrepancy is most likely in the internal logic I use to catch issues with the counts.
I've spent basically all day debugging this, and it doesn't seem like a good use of my time to carry on, so I'm putting this up as-is for now.