Skip to content

Verify that the process is alive in lookup/2 #83

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

urmastalimaa
Copy link

Fixes a data-race between an ets read from lookup/2 and the ets write from the 'DOWN' handler of a registered process which causes intermittent failures in supervision strategies.

In practice, the problem can occur quite often in tests which assert the init behaviour of gen_server's. In production, the already_started errors can cause a single failure to cascade and restart the parent supervisor.

A similar approach is taken in Elixir's Registry (also backed by ets)

Fixes #48

@@ -104,6 +104,8 @@ groups() ->
]},
{four_nodes_registry, [shuffle], [
four_nodes_concurrency
one_node_strict_mode,
Copy link
Owner

@ostinelli ostinelli Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was moved here, maybe by mistake?

@@ -104,6 +104,8 @@ groups() ->
]},
{four_nodes_registry, [shuffle], [
four_nodes_concurrency
one_node_strict_mode,
one_node_repeated_restart
Copy link
Owner

@ostinelli ostinelli Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if inside of this test group, we have a four nodes setup, can you please rename this with the convention of {number_of_nodes}_{test_name}? Unless this really is a one node, because I don't see a setup for 4 nodes in the test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urmastalimaa urmastalimaa force-pushed the fix_already_started_when_registered_process_restarted branch 3 times, most recently from 1953cfd to 4e1b194 Compare March 25, 2025 17:17
@urmastalimaa
Copy link
Author

I added a check to avoid a rpc, as the aliveness check is necessary for only local processes:
https://github.com/ostinelli/syn/compare/872f137c5da0fc4978e1e6fe545b33c058ca09f6..4e1b194012e8be6b0253375c2dd9a4b4196cda41

Fixes a data-race between an ets read from lookup/2 and the ets write
from the 'DOWN' handler of a registered process which causes
intermittent failures in supervision strategies.

In practice, the problem can occur quite often in tests which assert the
init behaviour of gen_server's. In production, the `already_started`
errors can cause a single failure to cascade and restart the parent
supervisor.

A similar approach is taken in [Elixir's Registry](https://github.com/elixir-lang/elixir/blob/e35ffc5a903bff3b595e323eb1ac12c4ecd515ad/lib/elixir/lib/registry.ex#L243) (also backed by ets).

Fixes ostinelli#48
@urmastalimaa urmastalimaa force-pushed the fix_already_started_when_registered_process_restarted branch from 4e1b194 to a3f5c66 Compare March 28, 2025 06:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

syn doesn't seem to deregister a process in some cases
2 participants