
markup5ever non-deterministic generated.rs #573

Closed
bmwiedemann opened this issue Feb 12, 2025 · 7 comments · Fixed by #577

Comments

@bmwiedemann

Originally filed in lycheeverse/lychee#1632 - check out the details there.

Please ensure that markup5ever's generated.rs is created in a deterministic way, to allow for reproducible builds.

@Ygg01
Contributor

Ygg01 commented Feb 16, 2025

This will be a bit trickier than advertised.

The source of the non-determinism seems to be phf turning https://github.com/servo/html5ever/blob/main/markup5ever/entities.rs into a perfect hash map.

```rust
named_entities_to_phf(&Path::new(&env::var("OUT_DIR").unwrap()).join("named_entities.rs"));
```

The problem is that I don't know exactly where the non-determinism is coming from. It could be as simple as changing this HashMap into a BTreeMap:

```rust
let mut entities: HashMap<&str, (u32, u32)> = entities::NAMED_ENTITIES
    .iter()
    .map(|(name, cp1, cp2)| {
        assert!(name.starts_with('&'));
        (&name[1..], (*cp1, *cp2))
    })
    .collect();
```

or as hard as getting phf or string_cache_codegen to emit a stable perfect hash map.

@bmwiedemann
Author

Rust HashMaps have random iteration order. osa1/tiny#438 solved that by using indexmap. Though it would be nice if Rust had some built-in way to do that.
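The indexmap approach keeps entries in insertion order instead of sorted order. The std-only sketch below illustrates the idea; `InsertionOrderedMap` is a hypothetical type, not indexmap's actual API or implementation, and it omits removal. Entries live in a `Vec`, which fixes iteration order, while a `HashMap` only maps each key to its position.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical minimal insertion-ordered map (illustrative only).
struct InsertionOrderedMap<K, V> {
    entries: Vec<(K, V)>,     // iteration order = insertion order
    index: HashMap<K, usize>, // key -> position in `entries`
}

impl<K: Hash + Eq + Clone, V> InsertionOrderedMap<K, V> {
    fn new() -> Self {
        Self { entries: Vec::new(), index: HashMap::new() }
    }

    /// Inserts or overwrites; an overwritten key keeps its original position.
    fn insert(&mut self, key: K, value: V) {
        if let Some(&pos) = self.index.get(&key) {
            self.entries[pos].1 = value;
        } else {
            self.index.insert(key.clone(), self.entries.len());
            self.entries.push((key, value));
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.index.get(key).map(|&pos| &self.entries[pos].1)
    }

    /// Walks the Vec, so the order never depends on the HashMap's random seed.
    fn iter(&self) -> impl Iterator<Item = &(K, V)> {
        self.entries.iter()
    }
}

fn main() {
    let mut m = InsertionOrderedMap::new();
    for (name, cp) in [("lt", 60u32), ("amp", 38), ("gt", 62)] {
        m.insert(name, cp);
    }
    for (name, cp) in m.iter() {
        println!("{name} {cp}");
    }
    println!("{:?}", m.get(&"amp"));
}
```

Because the order depends only on the sequence of `insert` calls, a fixed input such as a static entity table produces the same output on every run, even though the inner HashMap is randomly seeded.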

@nicoburns
Contributor

> Rust HashMaps have random iteration order

Yeah, there's a strong chance that it's that HashMap causing this non-determinism.

> though it would be nice, if rust had some builtin way for that.

There is std::collections::BTreeMap, which would likely work well in this case.

@Ygg01
Contributor

Ygg01 commented Feb 16, 2025

> that HashMap causing this non-determinism.

The big question is how we reproduce this non-determinism. Without a test, it will be hard to avoid a regression somewhere down the line.
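One possible in-process check, assuming the generator can be called as a plain function that returns its output (hypothetical here), is to run it several times and compare the results. The standard library documents that `RandomState::new()` is initialized with random keys, so HashMaps built separately in one process can iterate in different orders; a generator that leaks HashMap order will therefore usually fail this check. `generate_unstable`, `generate_stable`, and `is_deterministic` are hypothetical names, and this only catches hash-seed non-determinism, not other sources such as embedded paths or timestamps.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical input table standing in for the entity list.
const ENTRIES: &[(&str, u32)] = &[("lt", 60), ("amp", 38), ("gt", 62), ("nbsp", 160), ("quot", 34)];

/// Leaky generator: output order follows HashMap iteration order,
/// which depends on the map's randomly keyed hasher.
fn generate_unstable() -> String {
    let map: HashMap<&str, u32> = ENTRIES.iter().copied().collect();
    map.iter().map(|(k, v)| format!("{k}={v}\n")).collect()
}

/// Fixed generator: BTreeMap iterates in sorted key order.
fn generate_stable() -> String {
    let map: BTreeMap<&str, u32> = ENTRIES.iter().copied().collect();
    map.iter().map(|(k, v)| format!("{k}={v}\n")).collect()
}

/// Runs `generate` `runs` times and reports whether every output matched the first.
fn is_deterministic(generate: fn() -> String, runs: usize) -> bool {
    let first = generate();
    (1..runs).all(|_| generate() == first)
}

fn main() {
    println!("stable:   {}", is_deterministic(generate_stable, 20));
    // Usually prints false, but the runs can match by chance.
    println!("unstable: {}", is_deterministic(generate_unstable, 20));
}
```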

@bmwiedemann
Author

In openSUSE, I build our packages with

```sh
osc checkout openSUSE:Factory/lowfi && cd $_
for N in 1 2 ; do
  osc build --debuginfo --vm-type=kvm --clean --noservice --keep-pkgs=RPMS.$N --release=1.1 standard
done
```

but it will probably be easier for you to do a cargo build of https://github.com/talwat/lowfi or, more slowly, of https://github.com/lycheeverse/lychee.
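With two plain cargo builds, a quick way to confirm that generated.rs diverges is to compare the two copies; cargo places build-script output in the crate's OUT_DIR under `target/<profile>/build/`. The helper below is a hypothetical sketch, not part of markup5ever or the openSUSE tooling: it reports the byte offset of the first difference between two files.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the offset of the first byte where `a` and `b` differ, or `None`
/// if they are identical. A length mismatch counts as a difference at the
/// end of the shorter input.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

/// Compares two files, e.g. the generated.rs from two separate builds.
fn compare_files(a: &Path, b: &Path) -> io::Result<Option<usize>> {
    Ok(first_difference(&fs::read(a)?, &fs::read(b)?))
}

fn main() -> io::Result<()> {
    // Hypothetical usage: pass the two copies of generated.rs as arguments.
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: compare <build1/generated.rs> <build2/generated.rs>");
        std::process::exit(2);
    }
    match compare_files(Path::new(&args[1]), Path::new(&args[2]))? {
        None => println!("identical"),
        Some(offset) => println!("first difference at byte {offset}"),
    }
    Ok(())
}
```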

@bmwiedemann
Author

So when can we expect a new version with servo/string-cache#290 integrated? With nushell, the third affected package is just now arriving in openSUSE:Factory.

@Ygg01
Copy link
Contributor

Ygg01 commented Feb 20, 2025

I don't know the process, but I can ask around on Zulip for details.
