Generate a proper diff-like report, and a patch tool #2

Open
jar398 opened this issue Apr 28, 2020 · 2 comments

jar398 commented Apr 28, 2020

That means that if all children of a node are unchanged, the node is completely unchanged, and the node and its children shouldn't be displayed.
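
The pruning rule above could be sketched roughly as follows. This is only an illustration, not code from this project: the `children` mapping, the `changed` set, and the function name `changed_subtrees` are all assumptions, and the sketch additionally assumes a node may be marked as changed in its own right.

```python
from typing import Dict, List, Set


def changed_subtrees(children: Dict[str, List[str]], root: str,
                     changed: Set[str]) -> Set[str]:
    """Return the nodes worth displaying: those that changed themselves
    or that have at least one changed descendant."""
    visible: Set[str] = set()

    def visit(node: str) -> bool:
        # Visit every child (no short-circuit) so all changed branches are found.
        child_flags = [visit(child) for child in children.get(node, [])]
        show = node in changed or any(child_flags)
        if show:
            visible.add(node)
        return show

    visit(root)
    return visible


# Hypothetical toy hierarchy, for illustration only.
tree = {
    "Aves": ["Passer", "Corvus"],
    "Passer": ["Passer domesticus"],
    "Corvus": ["Corvus corax"],
}
to_display = changed_subtrees(tree, "Aves", {"Passer domesticus"})
```

Here `Corvus` and `Corvus corax` are left out of `to_display` because nothing beneath `Corvus` changed.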

By the same token, proof that the diff is logically meaningful should be obtained by writing a 'patch' tool that combines one checklist with a diff report to produce the other checklist.
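
A minimal sketch of that round-trip check, under strong simplifying assumptions: both checklists are modelled as dictionaries keyed by a taxon id that is stable across the two lists (which real checklists often do not provide), and each record is just a `(name, parent id)` pair. The names `Checklist`, `Change`, `diff`, and `patch` are hypothetical and do not refer to anything in this repository.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: taxon id -> (name, parent id or None).
Record = Tuple[str, Optional[str]]
Checklist = Dict[str, Record]
# A diff entry: (operation, taxon id, old record or None, new record or None).
Change = Tuple[str, str, Optional[Record], Optional[Record]]


def diff(a: Checklist, b: Checklist) -> List[Change]:
    """List only the records that differ between a and b."""
    changes: List[Change] = []
    for tid in sorted(a.keys() | b.keys()):
        old, new = a.get(tid), b.get(tid)
        if old == new:
            continue  # unchanged records are left out of the report
        if old is None:
            changes.append(("add", tid, None, new))
        elif new is None:
            changes.append(("remove", tid, old, None))
        else:
            changes.append(("update", tid, old, new))
    return changes


def patch(a: Checklist, changes: List[Change]) -> Checklist:
    """Apply a diff to a, refusing if the diff was made for another checklist."""
    result = dict(a)
    for op, tid, old, new in changes:
        if result.get(tid) != old:
            raise ValueError(f"diff does not apply cleanly at {tid!r}")
        if op == "remove":
            del result[tid]
        else:
            result[tid] = new
    return result


# Hypothetical example data.
a = {"t1": ("Aves", None), "t2": ("Passer", "t1"), "t3": ("Corvus", "t1")}
b = {"t1": ("Aves", None), "t2": ("Passer", "t1"), "t4": ("Pica", "t1")}
assert patch(a, diff(a, b)) == b  # the property the patch tool is meant to prove
```

If the assertion holds for every pair of checklists tried, the diff report carries enough information to reconstruct the second checklist from the first.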

gaurav commented Oct 23, 2020

I tried a slightly different approach in my PhD dissertation -- I developed a Java tool that created an XML file storing a series of taxonomic checklists, each of which included:

  • A list of all the species (and subspecies) recognized in that checklist, and
  • A list of changes annotated between that checklist and the previous one

The tool would diff the list of species between the checklists and then tell the user about unannotated changes. The user could then annotate those changes -- by classifying them as a "split", "lump", "split/lump", "rename", "addition" or "deletion", I think -- and record cases where, say, a single species is split into multiple species, or a subspecies is moved into another species (a "split/lump"), or what have you. The tool could then confirm that every change had been annotated in one way or another.

This format is probably unnecessarily comprehensive for a diff-like report, but I just wanted to mention it here in case it gave you any ideas!
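
The check described above (diff the species lists, then report whatever no annotation accounts for) might look something like this in outline. It is a Python sketch written for this thread, not the Java tool itself: the `Annotation` shape and the function name `unannotated_changes` are assumptions, and the category list is the one recalled in the comment.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Categories as recalled in the comment above; the real tool may differ.
KINDS = {"split", "lump", "split/lump", "rename", "addition", "deletion"}

# Hypothetical annotation: (kind, names before the change, names after it).
Annotation = Tuple[str, FrozenSet[str], FrozenSet[str]]


def unannotated_changes(previous: Set[str], current: Set[str],
                        annotations: Iterable[Annotation]
                        ) -> Tuple[Set[str], Set[str]]:
    """Return (removed, added) names that no annotation accounts for."""
    removed = previous - current
    added = current - previous
    explained_removed: Set[str] = set()
    explained_added: Set[str] = set()
    for kind, before, after in annotations:
        if kind not in KINDS:
            raise ValueError(f"unknown annotation kind: {kind!r}")
        explained_removed |= before
        explained_added |= after
    return removed - explained_removed, added - explained_added


# Hypothetical example with placeholder names.
previous = {"Aus bus", "Aus cus"}
current = {"Aus bus", "Aus dus", "Aus eus"}
notes = [("split", frozenset({"Aus cus"}), frozenset({"Aus dus"}))]
missing_removed, missing_added = unannotated_changes(previous, current, notes)
```

In this example the split accounts for `Aus cus` and `Aus dus`, so only `Aus eus` is left in `missing_added` as a change still waiting for an annotation.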

jar398 commented Oct 23, 2020

Cool. I was just musing about the idea of second derivatives - differences between difference sets - which sounds like what you describe. I'm not sure how iterative alignment development will work out in practice; I did it in an ad hoc way for Open Tree and it was a mess. I'll look at what you've written about this. The idea of checking whether changes have been annotated is a good one. With Open Tree I just used the issues report as a to-do list, and this became unworkable because so many issues were really non-issues and I had no way to annotate them as such.

Because all these reconciliation tools (open tree, GBIF, EOL, etc.) are at bottom attacking the same problem, I bet the same techniques have been reinvented repeatedly. The hard part is figuring out how to do this in a reusable way so that if A is like A' then an alignment of A to B can be transformed into an alignment of A' to B as easily as possible. If there were a way to talk about "the same change" in both A->B and A'->B that would be a step forward, since then annotations of changes could be reused in different alignment tasks.
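
One very rough way to make "the same change" concrete is to key each change by its content rather than by where it sits in a particular diff. The sketch below assumes, purely for illustration, that a change can be identified by the set of names before and the set of names after; that is almost certainly too naive for real alignments, and `change_key` and `reuse_annotations` are hypothetical names.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical identity for a change: (names before, names after).
ChangeKey = Tuple[FrozenSet[str], FrozenSet[str]]


def change_key(before: Iterable[str], after: Iterable[str]) -> ChangeKey:
    """Build a key that does not depend on which alignment produced the change."""
    return (frozenset(before), frozenset(after))


def reuse_annotations(old_notes: Dict[ChangeKey, str],
                      new_changes: List[ChangeKey]
                      ) -> Tuple[Dict[ChangeKey, str], List[ChangeKey]]:
    """Carry over notes for changes already seen; list the rest as to-do."""
    carried = {key: old_notes[key] for key in new_changes if key in old_notes}
    todo = [key for key in new_changes if key not in old_notes]
    return carried, todo


# Notes made while aligning A to B (placeholder names and note text).
notes_ab = {change_key(["Aus cus"], ["Aus dus"]): "rename, not a real issue"}
# Changes found when aligning A' to B.
changes_a_prime_b = [
    change_key(["Aus cus"], ["Aus dus"]),
    change_key(["Aus fus"], []),
]
carried, todo = reuse_annotations(notes_ab, changes_a_prime_b)
```

With keys like these, the annotation made for A->B is picked up again in A'->B, and only the change that is new to A' lands in `todo`.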
