-
Notifications
You must be signed in to change notification settings - Fork 241
Validate property names #7284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate property names #7284
Conversation
| expand!( | ||
| ( | ||
| icu::properties::props::CanonicalCombiningClass, | ||
| PropertyEnumCanonicalCombiningClassV1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried doing something like icu::properties::props::CanonicalCombiningClass::DataMarker here, but unfortunately that implementation is not allowed as the associated type is from a different crate
|
/gemini review It seems fine so I'm happy to stamp it, but I didn't take the time to learn the macro :). I think @Manishearth knows this code better than me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a large refactoring to validate Unicode property names against the data files. It introduces validation logic in the data providers and updates property name definitions to match the standard. The changes improve code quality, robustness, and type safety. I have one suggestion for improving robustness in the data generation code.
Manishearth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
I found this PR a bit tedious to review: I didn't know where the actual important changes were vs changes downstream of that. It would be nice to have some context in PR bodies or a comment to help review.
| make_binary_property! { | ||
| name: "Alnum"; | ||
| short_name: "Alnum"; | ||
| name: "alnum"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
observation: these don't follow the property naming convention, but that might be because they're compat properties defined by the regex spec.
https://www.unicode.org/reports/tr18/#Compatibility_Properties
nit: please document the source to use for these names on the macro. There are multiple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what's in icuexportdata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't trust those names, icuexportdata is a hodgepodge. We should follow and check against something concrete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: I'm not saying we should download the spec or UCD and check against it, I'm saying our code source of truth should be something that is not "this makes the tests pass", and if the tests fail we can see if things need to be fixed in icuexportdata.
Otherwise I don't really see the point of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't trust those names, icuexportdata is a hodgepodge
afaict icuexportdata uses the names that ICU4C uses. If not, that's a bug, but from the changes I made here it seems that they're all correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well neither of those defines short names. It would take significant extra work to chase down all the standards and compatible libraries of these names. You have already approved this PR, so I assume this is not blocking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this isn't blocking, but I also don't consider this PR to really be an improvement without that. Otherwise we're just shuffling things around, since Unicode is not very consistent about these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this isn't blocking, but I also don't consider this PR to really be an improvement without that. Otherwise we're just shuffling things around, since Unicode is not very consistent about these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I've seen Unicode is very consistent, I haven't seen alternative spellings anywhere expect for here.
This actually fixes new_for_ecma262, because that actually documents it uses the names from https://tc39.es/ecma262/#table-binary-unicode-properties, but some of them were not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like icuexportdata to be something we consider trustworthy.
|
|
||
|
|
||
| $prop_name:literal)),+,) => { | ||
| macro_rules! expand { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: please document this macro (specifically, the convert stuff). It's way harder to review changes to a complex macro when you don't know what it's supposed to do.
(Yes, I know it's previously undocumented)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trust? there's no data diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's fine. I would prefer if macros gained documentation over time especially when they have weird input syntax.
| let bpt_trie = CodePointTrie::try_from(bpt_data).map_err(|e| { | ||
| DataError::custom("Could not parse CodePointTrie TOML").with_display_context(&e) | ||
| })?; | ||
| let bidi_m_cpinvlist = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
observation: this diff mostly looks like a cleanup, but it also switches get_binary_prop_for_code_point_set/etc over to accepting multiple names
| .ok_or_else(|| DataErrorKind::MarkerNotFound.into_error()) | ||
| .ok_or_else(|| DataErrorKind::MarkerNotFound.into_error())?; | ||
|
|
||
| if name != data.long_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
observation: this is where the actual check happens
| } | ||
|
|
||
| impl super::uprops_serde::binary::BinaryProperty { | ||
| pub(crate) fn build_inversion_list(&self) -> CodePointInversionList<'static> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: good refactor
| (PropertyBinaryXdigitV1, "xdigit"), | ||
| (PropertyBinaryXidContinueV1, "XIDC"), | ||
| (PropertyBinaryXidStartV1, "XIDS") | ||
| ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: can we rustfmt::skip this to avoid wrapping?
alternatively perhaps the syntax of expand can be turned into something like PropType => PropMarker. Or we can have it imply icu::properties::props::. Or something.
I should have called this out I guess. If you agree with the changes in |
|
Merging because I want to pick this up for the patch release |
No description provided.