Merged
2 changes: 1 addition & 1 deletion components/properties/src/bidi.rs
@@ -28,7 +28,7 @@ impl EnumeratedProperty for BidiMirroringGlyph {
const SINGLETON: &'static crate::provider::PropertyCodePointMap<'static, Self> =
crate::provider::Baked::SINGLETON_PROPERTY_ENUM_BIDI_MIRRORING_GLYPH_V1;
const NAME: &'static [u8] = b"Bidi_Mirroring_Glyph";
-const SHORT_NAME: &'static [u8] = b"Bidi_Mirroring_Glyph";
+const SHORT_NAME: &'static [u8] = b"bmg";
}

impl crate::private::Sealed for BidiMirroringGlyph {}
4 changes: 4 additions & 0 deletions components/properties/src/emoji.rs
@@ -168,4 +168,8 @@ pub trait EmojiSet: crate::private::Sealed {
#[doc(hidden)]
#[cfg(feature = "compiled_data")]
const SINGLETON: &'static PropertyUnicodeSet<'static>;
+/// The name of this property
+const NAME: &'static [u8];
+/// The abbreviated name of this property, if it exists, otherwise the name
+const SHORT_NAME: &'static [u8];
}
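
The `NAME` and `SHORT_NAME` constants added to `EmojiSet` here have the same shape as the ones already present on `EnumeratedProperty`. As a rough illustration of how generic code can read them, here is a minimal sketch. The `NamedProperty` trait, the two unit structs, and the `names` helper are hypothetical stand-ins, not the real ICU4X API; only the name strings are taken from this diff.

```rust
/// Hypothetical stand-in for the property traits touched by this PR;
/// it is not the real ICU4X API.
pub trait NamedProperty {
    /// The long name of the property.
    const NAME: &'static [u8];
    /// The abbreviated name if one exists, otherwise the long name.
    const SHORT_NAME: &'static [u8];
}

pub struct WhiteSpace;

impl NamedProperty for WhiteSpace {
    const NAME: &'static [u8] = b"White_Space";
    const SHORT_NAME: &'static [u8] = b"WSpace";
}

pub struct BasicEmoji;

impl NamedProperty for BasicEmoji {
    const NAME: &'static [u8] = b"Basic_Emoji";
    // No abbreviation is given in the diff, so the long name is repeated.
    const SHORT_NAME: &'static [u8] = b"Basic_Emoji";
}

/// Returns `(long, short)` as strings, or `None` if either name is not valid UTF-8.
pub fn names<P: NamedProperty>() -> Option<(&'static str, &'static str)> {
    let long = core::str::from_utf8(P::NAME).ok()?;
    let short = core::str::from_utf8(P::SHORT_NAME).ok()?;
    Some((long, short))
}
```

Returning `Option` instead of calling `unwrap()` is a choice made for this sketch only; the provider code later in this diff unwraps because the names are known to be ASCII.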
48 changes: 27 additions & 21 deletions components/properties/src/props.rs
@@ -1904,8 +1904,8 @@ make_binary_property! {
}

make_binary_property! {
-name: "Alnum";
-short_name: "Alnum";
+name: "alnum";
Member

observation: these don't follow the property naming convention, but that might be because they're compat properties defined by the regex spec.

https://www.unicode.org/reports/tr18/#Compatibility_Properties

nit: please document the source to use for these names on the macro. There are multiple.

Member Author

this is what's in icuexportdata

Member

I don't trust those names, icuexportdata is a hodgepodge. We should follow and check against something concrete.

Member

To be clear: I'm not saying we should download the spec or UCD and check against it, I'm saying our code source of truth should be something that is not "this makes the tests pass", and if the tests fail we can see if things need to be fixed in icuexportdata.

Otherwise I don't really see the point of this PR.

Member Author

> I don't trust those names, icuexportdata is a hodgepodge

afaict icuexportdata uses the names that ICU4C uses. If not, that's a bug, but from the changes I made here it seems that they're all correct.

Member Author

Well neither of those defines short names. It would take significant extra work to chase down all the standards and compatible libraries of these names. You have already approved this PR, so I assume this is not blocking?

Member

Yes, this isn't blocking, but I also don't consider this PR to really be an improvement without that. Otherwise we're just shuffling things around, since Unicode is not very consistent about these.

Member Author (@robertbastian, Dec 7, 2025)

From what I've seen, Unicode is very consistent; I haven't seen alternative spellings anywhere except for here.

This actually fixes new_for_ecma262, which documents that it uses the names from https://tc39.es/ecma262/#table-binary-unicode-properties, although some of the names it used were not correct.

Member

I'd like icuexportdata to be something we consider trustworthy.

+short_name: "alnum";
ident: Alnum;
data_marker: crate::provider::PropertyBinaryAlnumV1;
singleton: SINGLETON_PROPERTY_BINARY_ALNUM_V1;
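
The review thread above turns on exact spellings: the author notes that `new_for_ecma262` is documented as using the names from the ECMA-262 table, and that some spellings were wrong before this change. A minimal sketch of why spelling matters under exact, case-sensitive matching follows. The `BINARY_PROPERTY_NAMES` table and the `resolve` function are hypothetical illustrations, not ICU4X code; the name pairs are copied from the updated `make_binary_property!` invocations in this diff.

```rust
/// (long name, short name) pairs copied from the updated invocations in this diff.
/// The table itself is a hypothetical illustration, not part of ICU4X.
pub const BINARY_PROPERTY_NAMES: &[(&str, &str)] = &[
    ("ID_Continue", "IDC"),
    ("ID_Start", "IDS"),
    ("IDS_Binary_Operator", "IDSB"),
    ("IDS_Trinary_Operator", "IDST"),
    ("White_Space", "WSpace"),
];

/// Resolves either spelling to the long name using exact, case-sensitive comparison.
pub fn resolve(name: &str) -> Option<&'static str> {
    BINARY_PROPERTY_NAMES
        .iter()
        .find(|(long, short)| *long == name || *short == name)
        .map(|(long, _)| *long)
}
```

With a lookup like this, the pre-change spellings such as `Id_Continue` or the short name `space` would simply fail to resolve, which is one way a test against an external name list could surface mismatches.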
@@ -1986,8 +1986,8 @@ make_binary_property! {
}

make_binary_property! {
-name: "Blank";
-short_name: "Blank";
+name: "blank";
+short_name: "blank";
ident: Blank;
data_marker: crate::provider::PropertyBinaryBlankV1;
singleton: SINGLETON_PROPERTY_BINARY_BLANK_V1;
@@ -2430,8 +2430,8 @@ make_binary_property! {
}

make_binary_property! {
-name: "Graph";
-short_name: "Graph";
+name: "graph";
+short_name: "graph";
ident: Graph;
data_marker: crate::provider::PropertyBinaryGraphV1;
singleton: SINGLETON_PROPERTY_BINARY_GRAPH_V1;
@@ -2562,7 +2562,7 @@ make_binary_property! {
}

make_binary_property! {
-name: "Id_Continue";
+name: "ID_Continue";
short_name: "IDC";
ident: IdContinue;
data_marker: crate::provider::PropertyBinaryIdContinueV1;
@@ -2614,7 +2614,7 @@ make_binary_property! {
}

make_binary_property! {
-name: "Id_Start";
+name: "ID_Start";
short_name: "IDS";
ident: IdStart;
data_marker: crate::provider::PropertyBinaryIdStartV1;
@@ -2643,7 +2643,7 @@ make_binary_property! {
}

make_binary_property! {
-name: "Ids_Binary_Operator";
+name: "IDS_Binary_Operator";
short_name: "IDSB";
ident: IdsBinaryOperator;
data_marker: crate::provider::PropertyBinaryIdsBinaryOperatorV1;
@@ -2664,7 +2664,7 @@ make_binary_property! {
}

make_binary_property! {
-name: "Ids_Trinary_Operator";
+name: "IDS_Trinary_Operator";
short_name: "IDST";
ident: IdsTrinaryOperator;
data_marker: crate::provider::PropertyBinaryIdsTrinaryOperatorV1;
@@ -2819,7 +2819,7 @@ make_binary_property! {

make_binary_property! {
name: "NFC_Inert";
-short_name: "NFC_Inert";
+short_name: "nfcinert";
ident: NfcInert;
data_marker: crate::provider::PropertyBinaryNfcInertV1;
singleton: SINGLETON_PROPERTY_BINARY_NFC_INERT_V1;
@@ -2828,7 +2828,7 @@ make_binary_property! {

make_binary_property! {
name: "NFD_Inert";
-short_name: "NFD_Inert";
+short_name: "nfdinert";
ident: NfdInert;
data_marker: crate::provider::PropertyBinaryNfdInertV1;
singleton: SINGLETON_PROPERTY_BINARY_NFD_INERT_V1;
@@ -2837,7 +2837,7 @@ make_binary_property! {

make_binary_property! {
name: "NFKC_Inert";
-short_name: "NFKC_Inert";
+short_name: "nfkcinert";
ident: NfkcInert;
data_marker: crate::provider::PropertyBinaryNfkcInertV1;
singleton: SINGLETON_PROPERTY_BINARY_NFKC_INERT_V1;
@@ -2846,7 +2846,7 @@ make_binary_property! {

make_binary_property! {
name: "NFKD_Inert";
-short_name: "NFKD_Inert";
+short_name: "nfkdinert";
ident: NfkdInert;
data_marker: crate::provider::PropertyBinaryNfkdInertV1;
singleton: SINGLETON_PROPERTY_BINARY_NFKD_INERT_V1;
@@ -2917,8 +2917,8 @@ make_binary_property! {
}

make_binary_property! {
-name: "Print";
-short_name: "Print";
+name: "print";
+short_name: "print";
ident: Print;
data_marker: crate::provider::PropertyBinaryPrintV1;
singleton: SINGLETON_PROPERTY_BINARY_PRINT_V1;
@@ -3018,7 +3018,7 @@ make_binary_property! {

make_binary_property! {
name: "Segment_Starter";
-short_name: "Segment_Starter";
+short_name: "segstart";
ident: SegmentStarter;
data_marker: crate::provider::PropertyBinarySegmentStarterV1;
singleton: SINGLETON_PROPERTY_BINARY_SEGMENT_STARTER_V1;
@@ -3028,7 +3028,7 @@ make_binary_property! {

make_binary_property! {
name: "Case_Sensitive";
-short_name: "Case_Sensitive";
+short_name: "Sensitive";
ident: CaseSensitive;
data_marker: crate::provider::PropertyBinaryCaseSensitiveV1;
singleton: SINGLETON_PROPERTY_BINARY_CASE_SENSITIVE_V1;
@@ -3153,7 +3153,7 @@ make_binary_property! {

make_binary_property! {
name: "White_Space";
-short_name: "space";
+short_name: "WSpace";
ident: WhiteSpace;
data_marker: crate::provider::PropertyBinaryWhiteSpaceV1;
singleton: SINGLETON_PROPERTY_BINARY_WHITE_SPACE_V1;
@@ -3176,8 +3176,8 @@ make_binary_property! {
}

make_binary_property! {
-name: "Xdigit";
-short_name: "Xdigit";
+name: "xdigit";
+short_name: "xdigit";
ident: Xdigit;
data_marker: crate::provider::PropertyBinaryXdigitV1;
singleton: SINGLETON_PROPERTY_BINARY_XDIGIT_V1;
@@ -3247,6 +3247,8 @@ pub use crate::emoji::EmojiSet;

macro_rules! make_emoji_set {
(
+name: $name:literal;
+short_name: $short_name:literal;
ident: $ident:ident;
data_marker: $data_marker:ty;
singleton: $singleton:ident;
@@ -3264,11 +3266,15 @@ macro_rules! make_emoji_set {
#[cfg(feature = "compiled_data")]
const SINGLETON: &'static crate::provider::PropertyUnicodeSet<'static> =
&crate::provider::Baked::$singleton;
+const NAME: &'static [u8] = $name.as_bytes();
+const SHORT_NAME: &'static [u8] = $short_name.as_bytes();
}
}
}

make_emoji_set! {
+name: "Basic_Emoji";
+short_name: "Basic_Emoji";
ident: BasicEmoji;
data_marker: crate::provider::PropertyBinaryBasicEmojiV1;
singleton: SINGLETON_PROPERTY_BINARY_BASIC_EMOJI_V1;
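
The `make_emoji_set!` change above threads two string literals through the macro and turns them into byte-slice constants with `as_bytes()`, which is usable in a `const` context. A reduced sketch of that mechanism follows, assuming a hypothetical `NamedSet` trait and a `make_named_set!` macro that omit the data marker and singleton arguments of the real macro:

```rust
/// Hypothetical reduced trait; the real `EmojiSet` also carries a data marker
/// and a compiled-data singleton.
pub trait NamedSet {
    const NAME: &'static [u8];
    const SHORT_NAME: &'static [u8];
}

/// Hypothetical reduced macro mirroring the argument style of `make_emoji_set!`.
macro_rules! make_named_set {
    (
        name: $name:literal;
        short_name: $short_name:literal;
        ident: $ident:ident;
    ) => {
        pub struct $ident;

        impl NamedSet for $ident {
            // `str::as_bytes` is a `const fn`, so it can initialize constants.
            const NAME: &'static [u8] = $name.as_bytes();
            const SHORT_NAME: &'static [u8] = $short_name.as_bytes();
        }
    };
}

make_named_set! {
    name: "Basic_Emoji";
    short_name: "Basic_Emoji";
    ident: BasicEmoji;
}
```

Taking the names as `literal` fragments keeps every invocation self-describing, at the cost of having to repeat the long name when no abbreviation exists.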
70 changes: 23 additions & 47 deletions provider/source/src/properties/bidi.rs
@@ -8,24 +8,6 @@ use crate::SourceDataProvider;
use icu::properties::provider::PropertyEnumBidiMirroringGlyphV1;
use icu_provider::prelude::*;

-#[cfg(any(feature = "use_wasm", feature = "use_icu4c"))]
-impl SourceDataProvider {
-fn get_code_point_prop_map<'a>(
-&'a self,
-key: &str,
-) -> Result<&'a super::uprops_serde::code_point_prop::CodePointPropertyMap, DataError> {
-self.icuexport()?
-.read_and_parse_toml::<super::uprops_serde::code_point_prop::Main>(&format!(
-"uprops/{}/{}.toml",
-self.trie_type(),
-key
-))?
-.enum_property
-.first()
-.ok_or_else(|| DataErrorKind::MarkerNotFound.into_error())
-}
-}
-
// implement data provider 2 different ways, based on whether or not
// features exist that enable the use of CPT Builder (ex: `use_wasm` or `use_icu4c`)
impl DataProvider<PropertyEnumBidiMirroringGlyphV1> for SourceDataProvider {
@@ -34,46 +16,40 @@ impl DataProvider<PropertyEnumBidiMirroringGlyphV1> for SourceDataProvider {
&self,
req: DataRequest,
) -> Result<DataResponse<PropertyEnumBidiMirroringGlyphV1>, DataError> {
-use icu::collections::codepointinvlist::CodePointInversionListBuilder;
-use icu::collections::codepointtrie::CodePointTrie;
use icu::collections::codepointtrie::TrieType;
use icu::collections::codepointtrie::TrieValue;
use icu::properties::props::BidiMirroringGlyph;
use icu::properties::props::BidiPairedBracketType;
+use icu::properties::props::EnumeratedProperty;
use icu_codepointtrie_builder::{CodePointTrieBuilder, CodePointTrieBuilderData};

self.check_req::<PropertyEnumBidiMirroringGlyphV1>(req)?;

-// Bidi_M / Bidi_Mirrored
-let bidi_m_data = self.get_binary_prop_for_code_point_set("Bidi_M")?;
-let mut bidi_m_builder = CodePointInversionListBuilder::new();
-for (start, end) in &bidi_m_data.ranges {
-bidi_m_builder.add_range32(start..=end);
-}
-let bidi_m_cpinvlist = bidi_m_builder.build();
-
-// bmg / Bidi_Mirroring_Glyph
-let bmg_data = &self.get_code_point_prop_map("bmg")?.code_point_trie;
-let bmg_trie = CodePointTrie::try_from(bmg_data).map_err(|e| {
-DataError::custom("Could not parse CodePointTrie TOML").with_display_context(&e)
-})?;
-
-// bpt / Bidi_Paired_Bracket_Type
-let bpt_data = &self.get_enumerated_prop("bpt")?.code_point_trie;
-let bpt_trie = CodePointTrie::try_from(bpt_data).map_err(|e| {
-DataError::custom("Could not parse CodePointTrie TOML").with_display_context(&e)
-})?;
+let bidi_m_cpinvlist = self
Member

observation: this diff mostly looks like a cleanup, but it also switches get_binary_prop_for_code_point_set/etc over to accepting multiple names

+.get_binary_prop_for_code_point_set("Bidi_Mirrored", "Bidi_M")?
+.build_inversion_list();
+
+let bmg_trie = self
+.get_enumerated_prop(
+core::str::from_utf8(BidiMirroringGlyph::NAME).unwrap(),
+core::str::from_utf8(BidiMirroringGlyph::SHORT_NAME).unwrap(),
+)?
+.build_codepointtrie()?;
+
+let bpt = self.get_enumerated_prop("Bidi_Paired_Bracket_Type", "bpt")?;
+let bpt_trie = bpt.build_codepointtrie::<u16>()?;
+let bpt_lookup = bpt.values_to_names_long();

let trie_vals = (0..=(char::MAX as u32)).map(|cp| {
let mut r = BidiMirroringGlyph::default();
-r.mirrored = bidi_m_cpinvlist.contains32(cp);
-r.mirroring_glyph = r
-.mirrored
-.then_some(bmg_trie.get32(cp))
-.filter(|&cp| cp as u32 != 0);
-r.paired_bracket_type = match bpt_trie.get32(cp) {
-1 => BidiPairedBracketType::Open,
-2 => BidiPairedBracketType::Close,
+if !bidi_m_cpinvlist.contains32(cp) {
+return r;
+}
+r.mirrored = true;
+r.mirroring_glyph = Some(bmg_trie.get32(cp)).filter(|&cp| cp as u32 != 0);
+r.paired_bracket_type = match bpt_lookup[&(bpt_trie.get32(cp))] {
+"Open" => BidiPairedBracketType::Open,
+"Close" => BidiPairedBracketType::Close,
_ => BidiPairedBracketType::None,
};
if r.mirrored && r.mirroring_glyph.is_none() {
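
The per-code-point logic in this hunk now returns early for code points that are not mirrored, and it matches the bracket type by value name ("Open", "Close") instead of the hard-coded integers 1 and 2. A minimal sketch of that shape follows. The `Tables`, `MirroringInfo`, and `PairedBracketType` types and the `classify` method are hypothetical, and plain hash maps stand in for the real `CodePointTrie` and inversion-list types.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PairedBracketType {
    Open,
    Close,
    None,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MirroringInfo {
    pub mirrored: bool,
    pub mirroring_glyph: Option<char>,
    pub paired_bracket_type: PairedBracketType,
}

impl Default for MirroringInfo {
    fn default() -> Self {
        Self {
            mirrored: false,
            mirroring_glyph: None,
            paired_bracket_type: PairedBracketType::None,
        }
    }
}

/// Hypothetical source tables; hash maps replace the tries used by the real provider.
#[derive(Default)]
pub struct Tables {
    pub mirrored: HashSet<u32>,
    pub mirroring_glyph: HashMap<u32, char>,
    pub bracket_values: HashMap<u32, u16>,
    /// Maps a numeric enum value to its long value name, e.g. "Open".
    pub bracket_value_names: HashMap<u16, &'static str>,
}

impl Tables {
    pub fn classify(&self, cp: u32) -> MirroringInfo {
        let mut r = MirroringInfo::default();
        // Early return: non-mirrored code points keep every default field.
        if !self.mirrored.contains(&cp) {
            return r;
        }
        r.mirrored = true;
        // A missing entry plays the role of the zero value filtered out in the diff.
        r.mirroring_glyph = self.mirroring_glyph.get(&cp).copied();
        // Match on the value name so the numeric encoding can change freely.
        let name = self
            .bracket_values
            .get(&cp)
            .and_then(|value| self.bracket_value_names.get(value))
            .copied();
        r.paired_bracket_type = match name {
            Some("Open") => PairedBracketType::Open,
            Some("Close") => PairedBracketType::Close,
            _ => PairedBracketType::None,
        };
        r
    }
}
```

Matching on names keeps the mapping correct even if the numeric values in the exported data are renumbered, which the integer match in the removed lines silently depended on.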