-
Notifications
You must be signed in to change notification settings - Fork 16
REP-6882 Track mismatch duration #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
autarch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very big PR, and I'm really struggling to follow it.
I think this would be a lot easier to review if split this PR up a bit. I think there are a few things it could be broken into:
- Add the aggregation stuff and use it.
- Rename "mismatch" -> "problem".
- Add the mismatch time stuff.
- Maybe separate out the changes in output too, but if 2 and 3 were split up, this may not be a big deal.
I'm not sure if it makes sense to split out 1, since I don't think you want to write new aggregations by hand and then use the new aggregation code.
In particular, combining 2 & 3 makes this much harder to review, since there's a lot of places where the name of a thing changes and/or there's a logic change. And it's hard to see whether there is a logic change.
This would be a good use of Graphite. I'd note that you could probably make full use of Graphite and merge stacks all together, since this repo doesn't use Evergreen.
Anyway, overall, this looks reasonable, inasmuch as I can follow it. Is there anyone else on the team who knows this code better than me? Maybe they can take a look? If not, I really think this needs to be broken up.
| return problems, docCount, byteCount, err | ||
| } | ||
|
|
||
| func (verifier *Verifier) compareDocsFromChannels( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is very hard to follow. It's a massive method that mostly consists of closures. It would be great to break this up. I know you didn't write this originally, but you are adding yet another closure here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t write the tool itself, but the code in question probably is, in fact, my work. (In my defense, this tool always seems to be “maintenance-only … until uh-oh it now needs to be better.”)
I split the new stuff out into a new struct, in a separate file. I think a broad-scale refactor would help, but it’d be more ideal as a follow-up.
internal/verifier/compare.go
Outdated
| var idToMismatch map[string]recheck.MismatchTimes | ||
| getPriorMismatchTimes := func(doc bson.Raw) option.Option[recheck.MismatchTimes] { | ||
| if idToMismatch == nil { | ||
| idToMismatch = createIdToMismatchTimes(task) | ||
| } | ||
|
|
||
| mapKeyBytes := rvToMapKey( | ||
| nil, | ||
| getDocIdFromComparison(verifier.docCompareMethod, doc), | ||
| ) | ||
|
|
||
| return option.IfNotZero(idToMismatch[string(mapKeyBytes)]) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a closure? It seems like you could make this a separate func without too much trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s a separate struct now.
internal/verifier/compare.go
Outdated
| pool.Put(srcDoc.doc) | ||
| pool.Put(dstDoc.doc) | ||
| var mismatchTimes option.Option[recheck.MismatchTimes] | ||
| if len(curProblems) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if len(curProblems) == 0 you could just short-circuit and return nil instead of checking it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
internal/verifier/summary.go
Outdated
| // TODO | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you forget to add or delete something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to delete the comment. :-P fixed
2ba4f93 to
0f5079a
Compare
2cb8576 to
4ac6c1e
Compare
|
@autarch Thank you for your notes. I’ve done what I think I reasonably can here to derisk the review:
… but it’s still a large changeset, regrettably. (I hoped also to separate the logging changes, but there didn’t seem a viable “middle ground”.) I did make other changes since you looked:
|
Verifier historically considered all mismatches coequally significant. For logging this is unhelpful: ideally users should see those mismatches that are recurring in successive generations. This changeset takes a step toward that by tracking how long a given mismatch has appeared in Verifier checks and surfacing the “longest-lived” mismatches in the log.
To do this, two pieces of information are added when tracking a mismatch:
These get saved into both the mismatch (for logging purposes) and the recheck entry.
When Verifier creates a new task from recheck entries, that task also contains the relevant document’s mismatch times (first-seen time & duration). If any non-mismatch rechecks exist for the document, the mismatch times are omitted from the task. This is because those non-mismatch rechecks will have come from change events, which means the document has changed, so any further mismatch will be a “new” one. (In this case, of course, hopefully the change event means the mismatch is now fixed!)
If, on recheck, the document is still mismatched, the mismatch’s duration is recomputed (i.e., the time since the mismatch’s first-seen time), and that new duration is saved with the new recheck entry. (If there’s no mismatch, then there’s no recheck.)
Finally: when logging mismatches, Verifier now also shows the mismatch’s duration and sorts the displayed mismatches longest-duration-first. To offset this additional “noise” in the logs, Verifier no longer shows destination namespaces in document mismatch logs.
This changeset also includes some tangential-but-opportune logging improvements:
Ancillary, implementation-level changes: