Skip to content

Hold the GIL when incrementing None's refcount to prevent race conditions when there are multiple Python threads #2334

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
May 2, 2025

Conversation

vasil-pashov
Copy link
Collaborator

@vasil-pashov vasil-pashov commented Apr 16, 2025

Reference Issues/PRs

None is a global static object in Python which is also refcounted. When ArcticDB creates None objects it must increase their refcount. It must acquire the GIL when the refcount is increased. Currently we don't acquire the GIL when we do this, we only hold a SpinLock protecting other ArcticDB threads from racing on the GIL refcount. With this change we add an atomic variable in the PythonHandler data which will accumulate the refcount. Then at the end of the operation when we reacquire the GIL we will increase the refcount. The same is done for the NaN refcount, note that we don't really need the GIL to increase NaN's refcount as we create it internally and don't handle it to Python until the read operation is done. Currently only read operations need to work with the None object.

apply_global_refcounts must be called at the very end before passing the dataframe to python to prevent something raising an exception in after the refcount is applied but before python receives the data. Increasing None's refcount but never decreasing it doesn't seem to be fatal but we're trying to be good citizens. The best place for that is adapt_read_df or adapt_read_dfs as they are called at the end of all read functions. The code is changed so that the type handler data is created always in the python bindings file as it's easier to track.

What does this implement or fix?

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@vasil-pashov vasil-pashov added the patch Small change, should increase patch version label Apr 16, 2025
Base automatically changed from vasil.pashov/fix-lock-tacking-for-none-incref to master April 17, 2025 13:04
@vasil-pashov vasil-pashov marked this pull request as ready for review April 17, 2025 14:28
@vasil-pashov vasil-pashov changed the title Fix handling of None refcount when reads from multiple threads Hold the GIL when incrementing None's refcount to prevent race conditions when there are multiple Python threads Apr 23, 2025
@@ -797,6 +797,7 @@ std::vector<std::variant<ReadResult, DataError>> PythonVersionStore::batch_read(
}
);
}
apply_global_refcounts(handler_data, read_options.output_format());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this and the couple of places in python_bindings.cpp the only places where we create Python None objects in the native layer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, read methods are affected: read, read_batch, batch_read_keys, read_index, read_column_stats_version, LibraryTool::read

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have tests for all our APIs that can allocate Nones - like batch_read and the things in python_bindings.cpp that you touched as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, batch_read is easy to test. I'm not sure how the other methods like: read_index, batch_read_keys, read_column_stats_version and LibraryTool::read can end up with None values. I suspect that only read and batch_read can contain user defined strings that can be None.

Copy link
Collaborator

@IvoDD IvoDD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only important comments are with regards to when to call apply_global_refcounts

@vasil-pashov vasil-pashov force-pushed the vasil.pashov/hold-gil-on-none-incref branch from e845e46 to f36045a Compare May 2, 2025 10:46
call to apply_global_refcounts to a single place in adapt_read_df(s).
Create all handler data in the python bindings.
@vasil-pashov vasil-pashov force-pushed the vasil.pashov/hold-gil-on-none-incref branch from f36045a to a72f055 Compare May 2, 2025 11:10
@vasil-pashov vasil-pashov merged commit 9b93303 into master May 2, 2025
139 checks passed
@vasil-pashov vasil-pashov deleted the vasil.pashov/hold-gil-on-none-incref branch May 2, 2025 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Small change, should increase patch version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants