-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IQSS/11242 ExternalIdentifier fix #11246
base: develop
Are you sure you want to change the base?
IQSS/11242 ExternalIdentifier fix #11246
Conversation
@@ -4,7 +4,7 @@ | |||
import java.util.regex.Pattern; | |||
|
|||
public enum ExternalIdentifier { | |||
ORCID("ORCID", "https://orcid.org/%s", "^\\d{4}-\\d{4}-\\d{4}-(\\d{4}|\\d{3}X)$"), | |||
ORCID("ORCID", "https://orcid.org/%s", "^(https:\\/\\/orcid.org\\/)?\\d{4}-\\d{4}-\\d{4}-(\\d{4}|\\d{3}X)$"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of b9d0146 we say the following:
When entering author identifiers, select the type from the dropdown (e.g. "ORCID") and under "Identifier" enter just the unique identifier (e.g. "0000-0002-1825-0097") rather than the full URL (e.g. "https://orcid.org/0000-0002-1825-0097").
Should this be re-written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code will support both forms, but if you aren't using the external vocab scripts, the short form may still be preferable? That's why I didn't change it, but happy to change it if we want to allow a choice for manual entry. (AFAIK, there is no validation for input so people can still enter whatever they want.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey all. @pdurbin pinged me about this.
The short form, or the non-URL form, is preferable for ORCIDs that users type in or paste in the Author Identifier field, because that's what makes the link clickable, right? So the same will hold true for RORs that users type in or paste in the Author Identifier field, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The earlier fix changed ROR so the short form would be a link. This PR makes it so that either form of ORCID or ROR should result in a valid link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay. Also, when you wrote "happy to change it if we want to allow a choice for manual entry", I thought we can already allow a choice for manual entry of identifiers. That's how it's set up on Demo Dataverse and planned for Harvard Dataverse.
Or did you mean when installations configure an external vocab script to allow for manual entry, that is that users can enter the identifier in the Identifier field that appears when they enter their own name instead of selecting one from the suggestions?
About the user guide text being added in b9d0146, it's written to apply to all identifiers, so I'm testing on Demo Dataverse to review how the other identifiers that users enter in the Author Identifier field are displayed, especially when the identifier has a URL form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the code so that all identifiers will handle both URL and short form (when no scripts are running).
I expanded the tests so that all identifier types are actually checked and verified that both the URL and short forms get format()ed to the URL form (which is specifically relevant to this issue)
I also fixed the ISNI matcher which didn't support ISNIs ending in X instead of a digit.
I've left the text alone.
I'll look for the issue in the ext-vocab repo - guessing we'll need to have similar regex detection to decide when to turn the values into links.
With that, I think this is ready to continue forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! So "The exception is the ISNI identifier type; it's never displayed as a link" isn't true. It is displayed as a link for ISNIs that end in digits, but not for ISNIs that end in X. Thanks for looking into that and for fixing it!
I'm not sure what you mean by "all identifiers will handle both URL and short form". Does that mean that for an identifier that has a URL form and a short form, when a user enters either form into the Author Identifier field, it's displayed as a link on the dataset page and it's included in DataCite exports and sent to DataCite (for repositories that register DOIs)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created that GitHub issue in the ext-vocab repo. It's at gdcc/dataverse-external-vocab-support#33. There are some details I couldn't be sure of because:
- I'm testing on Demo Dataverse and the change won't happen on Demo Dataverse's Author fields until v6.6 is released and applied to Demo Dataverse, like how RORs are displayed on the dataset page
- the Author fields on Demo Dataverse are behaving in ways I didn't expect, like how the dataset page displays an ORCID logo even when the user enters the ORCID by using the Author Identifier Type and Author Identifier fields
- I'm not sure about what changes were made so that the dataset page displays IDs as links when users enter IDs in Author fields where the cvoc script isn't applied. This is what I'm trying to clarify in my last comment.
I can help update that GitHub issue as I understand more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the identifier types have both a short and URI form (FWIW DAI was not handled before - I've added it). Entering either as the author identifier will cause it to display as a link in the metadata display (except DAI which is not a URL). The DataCite XML generator adds the identifier, in URI form, regardless of the identifier type, so yes they are all included regardless of the short or URI input form (including DAI).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay thanks @qqmyers!
So @pdurbin, so should we rewrite what's written in the User Guides, at b9d0146, about how to enter identifiers in the Author Identifier field?
Or maybe that addition can be removed, since identifiers that Dataverse can display as link will be displayed as links when the cvoc script is used and when it's not used, and when users enter the short forms of the IDs and when they enter the URL forms. If all of the ways that we expect users to enter identifiers in that field will be supported - displayed as links and included in the expected metadata exports - then we don't need to guidance about how to enter identifiers in that field, right?
This PR is "Ready for review" but should it be moved back to "In progress" until we agree on what to do about that addition to the User Guides?
What this PR does / why we need it: This PR fixes a bug in the ExternalIdentifier class that caused ORCIDs in URL form to not be recognized, breaking part of the external vocab functionality w.r.t. ORCID.
Which issue(s) this PR closes:
Special notes for your reviewer: As the fix is generic, I reverted changes in the ROR recognition to avoid having two separate versions. Also added tests for formatting or ORCID and ROR.
Also - I noted that it looks like some tests in DatasetFieldValueValidatorTest duplicate those in ExternalIdentifier. If a reviewer confirms that, I'd be happy to remove them (and the isValidAuthorIdentifier() method that is only used in those tests).
Suggestions on how to test this: Verify that publishing a dataset with an ORCID in numeric form or URL form as an author ID, the ORCID appears in the DataCite XML. Verify that either form of ORCID/ROR show up as valid links in the page display (without external vocab scripts turned on).
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: included.
Additional documentation: