Generate a proper diff-like report, and a patch tool #2
I tried a slightly different approach in my PhD dissertation: I developed a Java tool that created an XML file storing a series of taxonomic checklists, each of which included:
The tool would diff the lists of species between the checklists and then tell the user about any unannotated changes. The user could then annotate those changes by classifying each one as a "split", "lump", "split/lump", "rename", "addition" or "deletion" (I think). This recorded cases where, say, a single species is split into multiple species, or a subspecies is moved into another species (a "split/lump"). The tool could then confirm that every change had been annotated in one way or another. This format is probably more comprehensive than a diff-like report needs, but I wanted to mention it here in case it gives you any ideas!
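The annotate-then-verify loop described above could be sketched roughly as follows. This is not the original Java tool; it is a minimal illustration assuming each checklist is a flat list of names and each annotation (the hypothetical `Change` record) maps a set of old names to a set of new names. A raw name diff is computed, and every name an annotation accounts for is removed from the "unexplained" pile.

```python
from dataclasses import dataclass

# Change categories as listed in the comment (the author was unsure of the exact set).
CHANGE_TYPES = {"split", "lump", "split/lump", "rename", "addition", "deletion"}


@dataclass(frozen=True)
class Change:
    """Hypothetical annotation: `old` names in one checklist become `new` names in the next."""
    kind: str
    old: frozenset = frozenset()
    new: frozenset = frozenset()


def raw_diff(old_list, new_list):
    """Names that disappear and names that appear between two checklists."""
    old, new = set(old_list), set(new_list)
    return old - new, new - old


def unannotated(old_list, new_list, changes):
    """Return the (removed, added) names not yet explained by any annotation."""
    removed, added = raw_diff(old_list, new_list)
    for change in changes:
        if change.kind not in CHANGE_TYPES:
            raise ValueError(f"unknown change type: {change.kind!r}")
        removed -= change.old
        added -= change.new
    return removed, added


def all_annotated(old_list, new_list, changes):
    """True once every removed and added name is covered by some annotation."""
    removed, added = unannotated(old_list, new_list, changes)
    return not removed and not added


if __name__ == "__main__":
    # Made-up names for illustration only.
    old = ["Aus bus", "Aus cus", "Dus eus"]
    new = ["Aus bus", "Aus dus", "Aus cus", "Fus eus"]
    notes = [Change("split", frozenset({"Aus bus"}), frozenset({"Aus bus", "Aus dus"}))]
    print(unannotated(old, new, notes))  # "Dus eus" and "Fus eus" still unexplained
    notes.append(Change("rename", frozenset({"Dus eus"}), frozenset({"Fus eus"})))
    print(all_annotated(old, new, notes))
```

Note that a split like "Aus bus" into "Aus bus" + "Aus dus" keeps the original name, so only "Aus dus" shows up in the raw diff; the annotation still lists both so the record is complete.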
Cool. I was just musing about the idea of second derivatives (differences between difference sets), which sounds like what you describe. I'm not sure how iterative alignment development will work out in practice; I did it in an ad hoc way for Open Tree and it was a mess. I'll look at what you've written about this. Checking whether changes have been annotated is a good idea: with Open Tree I just used the issues report as a to-do list, and this became unworkable because so many issues were really non-issues and I had no way to annotate them as such. Because all these reconciliation tools (Open Tree, GBIF, EOL, etc.) are at bottom attacking the same problem, I bet the same techniques have been reinvented repeatedly. The hard part is figuring out how to do this in a reusable way, so that if A is similar to A', an alignment of A to B can be transformed into an alignment of A' to B with as little effort as possible. A way to talk about "the same change" in both A->B and A'->B would be a step forward, since annotations of changes could then be reused across alignment tasks.
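One way to make the "second derivative" and annotation-reuse ideas concrete is a toy sketch like the one below. It rests on a strong, hypothetical simplification: a change is identified only by `(kind, name)`, where `kind` is `"deletion"` or `"addition"`. Under that assumption, "the same change" in A->B and A'->B is simply an equal tuple, so annotations keyed by change can be carried over, and the symmetric difference of the two change sets is the difference between difference sets. Real alignments would need richer change identities (splits, lumps, moves).

```python
# Hypothetical simplification: a change is (kind, name), kind in {"deletion", "addition"}.
def diff(a, b):
    """First derivative: the set of changes turning checklist a into checklist b."""
    a, b = set(a), set(b)
    return {("deletion", n) for n in a - b} | {("addition", n) for n in b - a}


def second_derivative(a, a_prime, b):
    """Changes present in exactly one of A->B and A'->B."""
    return diff(a, b) ^ diff(a_prime, b)


def transfer_annotations(annotations_ab, a_prime, b):
    """Reuse A->B annotations for A'->B; return (reused, still-to-annotate)."""
    target = diff(a_prime, b)
    reused = {ch: note for ch, note in annotations_ab.items() if ch in target}
    todo = target - set(reused)
    return reused, todo


if __name__ == "__main__":
    # Illustrative checklists and notes, not real data.
    A, B, A2 = {"x", "y", "z"}, {"x", "y", "w"}, {"x", "y", "z", "v"}
    notes = {("deletion", "z"): "lumped into x", ("addition", "w"): "newly described"}
    print(second_derivative(A, A2, B))
    print(transfer_annotations(notes, A2, B))
```

Here only the deletion of "v" is new work when moving from A to A'; the other two annotations transfer unchanged.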
That means that if all children of a node are unchanged, the node is completely unchanged, and the node and its children shouldn't be displayed.
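The display rule above amounts to a bottom-up pruning pass over the tree. A minimal sketch, assuming a hypothetical `Node` type whose `own_change` flag records whether the node's own record differs between checklists: a node is dropped when it has no change of its own and no changed descendants, so unchanged subtrees disappear from the report in a single linear traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    own_change: bool = False  # assumed flag: did this node's own record differ?
    children: list = field(default_factory=list)


def prune(node):
    """Return a copy of `node` keeping only changed subtrees, or None if
    the node and all of its descendants are unchanged."""
    kept = [p for p in map(prune, node.children) if p is not None]
    if not node.own_change and not kept:
        return None  # completely unchanged: hide the node and its children
    return Node(node.name, node.own_change, kept)


def render(node, depth=0):
    """Indented outline of a tree; changed nodes are marked with '*'."""
    lines = ["  " * depth + node.name + (" *" if node.own_change else "")]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = Node("root", children=[
        Node("Aus", children=[Node("Aus bus"), Node("Aus cus", own_change=True)]),
        Node("Bus", children=[Node("Bus dus")]),
    ])
    print("\n".join(render(prune(tree))))
```

With this tree, "Bus" and "Aus bus" are hidden, while "Aus" stays visible only as the path to the changed "Aus cus".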
By the same token, the way to demonstrate that the diff is logically meaningful is to write a 'patch' tool that combines one checklist with a diff report to reproduce the other checklist.
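A minimal sketch of the proposed round-trip check, assuming (hypothetically) that a checklist is a flat set of names and a diff report is just the names to remove and add. `patch` refuses reports that do not fit the checklist it is applied to, and `round_trips` verifies that `patch(old, diff(old, new))` reproduces `new`. For flat name sets this holds by construction; the check earns its keep once the report carries hierarchy or split/lump semantics.

```python
def make_diff(old, new):
    """A flat diff report: names to remove and names to add (hierarchy ignored)."""
    old, new = set(old), set(new)
    return {"remove": sorted(old - new), "add": sorted(new - old)}


def patch(checklist, report):
    """Apply a diff report to a checklist, refusing reports that don't fit it."""
    result = set(checklist)
    missing = [n for n in report["remove"] if n not in result]
    clashing = [n for n in report["add"] if n in result]
    if missing or clashing:
        raise ValueError(
            f"report does not apply: missing={missing}, already present={clashing}"
        )
    result.difference_update(report["remove"])
    result.update(report["add"])
    return result


def round_trips(old, new):
    """Patching `old` with the diff from `old` to `new` must give back `new`."""
    return patch(old, make_diff(old, new)) == set(new)


if __name__ == "__main__":
    # Made-up names for illustration only.
    old = ["Aus bus", "Aus cus", "Dus eus"]
    new = ["Aus bus", "Aus dus", "Fus eus"]
    print(make_diff(old, new))
    print(round_trips(old, new))
```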