Skip to content

Thread safety #574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 26, 2025
Merged

Thread safety #574

merged 7 commits into from
Jul 26, 2025

Conversation

vouillon
Copy link
Member

@vouillon vouillon commented May 13, 2025

This PR makes RE thread-safe with OCaml 5.

We want the overhead to be minimal while matching a string, and allow concurrent string matching. Basically, RE works by traversing an automaton which is built lazily. So, there is no locking while traversing the automaton, but only when it is updated.

For that, we take advantage of the fact that double-checked locking is sound under the OCaml memory-model. When we reach a part of the automaton that has not been initialized yet, we acquire a mutex to update it. All the datastructures used to build this automaton are protected by this mutex.

@vouillon vouillon force-pushed the thread-safety branch 2 times, most recently from b6b13c8 to cb63cdf Compare May 16, 2025 23:01
@vouillon vouillon force-pushed the thread-safety branch 6 times, most recently from 0a1e998 to b90b799 Compare May 27, 2025 18:05
vouillon added 7 commits May 28, 2025 16:21
Since other thread can create automaton states with an index larger the
size of the position array, we compare at each step that the current
index fits when running the automaton.
This is slow, but we can check that TSan reports no data race.
Basically, we are building an automaton lazily. We use double-checked
locking to avoid acquiring a mutex when traversing a part of the
automaton which has already been computed. The memory model ensures that
we see either an uninitialized state or the initialized state. If we see
the initialized state, we can just proceed. Otherwise, we acquire a
mutex and update the state after checking this has not been done by
another thread.
Use a fake implementation of mutexes and domains in this case to avoid a
dependency on the threads library
@rgrinberg
Copy link
Member

@vouillon is this waiting on anything? Seems ready to me

@OlivierNicole
Copy link

IIUC we are lacking a reviewer.

@glittershark
Copy link
Contributor

Just as a data point: I ported this PR to OxCaml and we've been using it internally at Jane Street in production for a little over a month, and things seem to be working pretty well.

My one qualm about this PR in its current state is the switching of Str to use DLS: I think this gives a false sense of security to this module for little benefit, as it's still unsafe to use with systhreads. I'd weakly advocate for omitting that module in this PR entirely, and leaving it as completely thread-unsafe.

@rgrinberg
Copy link
Member

The real str uses DLS AFAIK. I think we should stay as compatible as possible with str.

In any case, thanks for testing. I'll merge this PR and @vouillon is always free to adjust anything later.

@rgrinberg rgrinberg merged commit 57d40d4 into master Jul 26, 2025
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants