-
Notifications
You must be signed in to change notification settings - Fork 126
Hold the GIL when incrementing None's refcount to prevent race conditions when there are multiple Python threads #2334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -797,6 +797,7 @@ std::vector<std::variant<ReadResult, DataError>> PythonVersionStore::batch_read( | |||
} | |||
); | |||
} | |||
apply_global_refcounts(handler_data, read_options.output_format()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this and the couple of places in python_bindings.cpp
the only places where we create Python None
objects in the native layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, read methods are affected: read
, read_batch
, batch_read_keys
, read_index
, read_column_stats_version
, LibraryTool::read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have tests for all our APIs that can allocate None
s - like batch_read
and the things in python_bindings.cpp
that you touched as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, batch_read
is easy to test. I'm not sure how the other methods like: read_index
, batch_read_keys
, read_column_stats_version
and LibraryTool::read
can end up with None values. I suspect that only read
and batch_read
can contain user defined strings that can be None.
…and custom normalizers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only important comments are with regards to when to call apply_global_refcounts
e845e46
to
f36045a
Compare
call to apply_global_refcounts to a single place in adapt_read_df(s). Create all handler data in the python bindings.
f36045a
to
a72f055
Compare
Reference Issues/PRs
None is a global static object in Python which is also refcounted. When ArcticDB creates
None
objects it must increase their refcount. It must acquire the GIL when the refcount is increased. Currently we don't acquire the GIL when we do this, we only hold a SpinLock protecting other ArcticDB threads from racing on the GIL refcount. With this change we add an atomic variable in the PythonHandler data which will accumulate the refcount. Then at the end of the operation when we reacquire the GIL we will increase the refcount. The same is done for the NaN refcount, note that we don't really need the GIL to increase NaN's refcount as we create it internally and don't handle it to Python until the read operation is done. Currently only read operations need to work with theNone
object.apply_global_refcounts
must be called at the very end before passing the dataframe to python to prevent something raising an exception in after the refcount is applied but before python receives the data. Increasing None's refcount but never decreasing it doesn't seem to be fatal but we're trying to be good citizens. The best place for that isadapt_read_df
oradapt_read_dfs
as they are called at the end of all read functions. The code is changed so that the type handler data is created always in the python bindings file as it's easier to track.What does this implement or fix?
Any other comments?
Checklist
Checklist for code changes...