Thread safety #574

vouillon · 2025-05-13T13:21:33Z

This PR makes RE thread-safe with OCaml 5.

We want the overhead to be minimal while matching a string, and we want to allow concurrent string matching. RE works by traversing an automaton which is built lazily. So, no lock is taken while traversing the automaton, only when it is updated.

For that, we take advantage of the fact that double-checked locking is sound under the OCaml memory model. When we reach a part of the automaton that has not been initialized yet, we acquire a mutex to update it. All the data structures used to build this automaton are protected by this mutex.

Since other threads can create automaton states with an index larger than the size of the position array, we check at each step that the current index fits when running the automaton.
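The per-step check can be pictured with a small sketch. This is only an illustration under assumptions: the `info` record, `ensure_fits` and `record` are hypothetical names, not RE's actual code, and the doubling growth policy is a choice made for the example.

```ocaml
(* Hypothetical per-match buffer: one slot per automaton state index. *)
type info = { mutable positions : int array }

(* Grow the buffer (by doubling, an assumption of this sketch) until
   [idx] is a valid index, keeping the existing contents. *)
let ensure_fits info idx =
  let len = Array.length info.positions in
  if idx >= len then begin
    let new_len = ref (max 1 len) in
    while idx >= !new_len do
      new_len := 2 * !new_len
    done;
    let bigger = Array.make !new_len 0 in
    Array.blit info.positions 0 bigger 0 len;
    info.positions <- bigger
  end

(* Called at each step: the state index may have been allocated by
   another thread after this buffer was sized, so check before writing. *)
let record info ~idx ~pos =
  ensure_fits info idx;
  info.positions.(idx) <- pos
```

The buffer here belongs to a single match, so the resize itself needs no lock in this sketch; only the shared automaton does.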

Using a mutex to ensure thread-safety is slow, but we can check that TSan reports no data race.

Basically, we are building an automaton lazily. We use double-checked locking to avoid acquiring a mutex when traversing a part of the automaton which has already been computed. The memory model ensures that we see either an uninitialized state or the initialized state. If we see the initialized state, we can just proceed. Otherwise, we acquire a mutex and update the state after checking this has not been done by another thread.
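A minimal sketch of this double-checked locking pattern, assuming OCaml 5 (where `Mutex` and `Domain` are in the standard library). The `table` type, `expensive` and `get` are hypothetical and stand in for the automaton states. The sketch stores each cell in an `Atomic.t` to keep the example simple, which is an assumption of the sketch and not necessarily how RE stores its states.

```ocaml
(* Hypothetical lazily filled table standing in for the automaton. *)
type table = {
  cells : int option Atomic.t array; (* None = not computed yet *)
  lock : Mutex.t;
  mutable computed : int; (* slow-path count, protected by [lock] *)
}

let create n =
  { cells = Array.init n (fun _ -> Atomic.make None);
    lock = Mutex.create ();
    computed = 0 }

(* Placeholder for the costly state construction. *)
let expensive i = i * i

let get t i =
  match Atomic.get t.cells.(i) with
  | Some v -> v (* first check: already initialized, no lock taken *)
  | None ->
    Mutex.lock t.lock;
    Fun.protect
      ~finally:(fun () -> Mutex.unlock t.lock)
      (fun () ->
        (* second check: another thread may have filled the cell
           while we were waiting for the mutex *)
        match Atomic.get t.cells.(i) with
        | Some v -> v
        | None ->
          let v = expensive i in
          t.computed <- t.computed + 1;
          Atomic.set t.cells.(i) (Some v);
          v)

(* Four domains read every cell repeatedly; returns the number of
   times the slow path actually computed a value. *)
let run_demo () =
  let t = create 64 in
  let worker () =
    for _ = 1 to 100 do
      for i = 0 to 63 do
        assert (get t i = i * i)
      done
    done
  in
  let domains = List.init 4 (fun _ -> Domain.spawn worker) in
  List.iter Domain.join domains;
  t.computed
```

Because the second check happens under the mutex, each cell is computed exactly once even when several domains reach it at the same time.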

For OCaml < 5, we use a fake implementation of mutexes and domains to avoid a dependency on the threads library.
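As a rough illustration of what such a fake implementation might look like, here is a sketch with no-op mutexes and a single-slot stand-in for domain-local storage. The module names `Fake_mutex` and `Fake_dls` and their contents are assumptions for this example, not the modules added by the PR.

```ocaml
(* No-op mutex: sound only if the program is single-threaded,
   i.e. neither domains nor systhreads are in use. *)
module Fake_mutex = struct
  type t = unit

  let create () = ()
  let lock () = ()
  let unlock () = ()
end

(* Stand-in for domain-local storage when there is a single domain:
   each key holds one lazily initialized value. *)
module Fake_dls = struct
  type 'a key = { init : unit -> 'a; mutable value : 'a option }

  let new_key init = { init; value = None }

  let get k =
    match k.value with
    | Some v -> v
    | None ->
      let v = k.init () in
      k.value <- Some v;
      v

  let set k v = k.value <- Some v
end
```

Selecting between the real and fake modules would then be a build-time decision based on the compiler version; how the PR does that is not shown here.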

rgrinberg · 2025-07-20T13:16:10Z

@vouillon is this waiting on anything? Seems ready to me

OlivierNicole · 2025-07-23T13:28:23Z

IIUC we are lacking a reviewer.

glittershark · 2025-07-23T17:37:43Z

Just as a data point: I ported this PR to OxCaml and we've been using it internally at Jane Street in production for a little over a month, and things seem to be working pretty well.

My one qualm about this PR in its current state is the switching of Str to use DLS: I think this gives a false sense of security to this module for little benefit, as it's still unsafe to use with systhreads. I'd weakly advocate for omitting that module in this PR entirely, and leaving it as completely thread-unsafe.
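For readers unfamiliar with DLS, a small sketch of what it isolates, assuming OCaml 5's `Domain.DLS`. The `last_match` key and the helper functions are hypothetical, not the Str module's code. Each domain gets its own copy of the value, while systhreads running inside the same domain share that copy, which is the concern raised above.

```ocaml
(* Hypothetical per-domain global state, initialized lazily per domain. *)
let last_match : string option ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref None)

let set_last s = Domain.DLS.get last_match := Some s
let get_last () = !(Domain.DLS.get last_match)

(* Returns the main domain's view and the child domain's
   (before, after) views of the state. *)
let demo () =
  set_last "main";
  let child =
    Domain.spawn (fun () ->
      let before = get_last () in
      set_last "child";
      (before, get_last ()))
  in
  let child_view = Domain.join child in
  (get_last (), child_view)
```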

rgrinberg · 2025-07-26T12:34:56Z

The real str uses DLS AFAIK. I think we should stay as compatible as possible with str.

In any case, thanks for testing. I'll merge this PR and @vouillon is always free to adjust anything later.

vouillon force-pushed the thread-safety branch 2 times, most recently from b6b13c8 to cb63cdf May 16, 2025 23:01

vouillon force-pushed the thread-safety branch 6 times, most recently from 0a1e998 to b90b799 May 27, 2025 18:05

vouillon added 7 commits May 28, 2025 16:21

Add a thread-safety test

969e782

Resize position array in a thread-safe way

6a6699a

Since other threads can create automaton states with an index larger than the size of the position array, we check at each step that the current index fits when running the automaton.

Use a mutex to ensure thread-safety

b5cfcbd

This is slow, but we can check that TSan reports no data race.

Str: use a domain local value to store the global state

fac5551

Compatibility with OCaml < 5

a35565d

Use a fake implementation of mutexes and domains in this case to avoid a dependency on the threads library

CI: run the tests using TSan

070c9a8

vouillon force-pushed the thread-safety branch from b90b799 to 070c9a8 May 28, 2025 14:21

rgrinberg merged commit 57d40d4 into master Jul 26, 2025
10 of 12 checks passed
