AdrS commented Mar 27, 2025

Cloudpickle's Pickler class inherits from pickle.Pickler, which is either the C implementation of the CPython pickler or the pure-Python pickler. Only the pure-Python pickler supports customizing how built-in types are pickled. This change introduces a PurePythonPickler class, which inherits from pickle._Pickler and supports that customization. The Pickler class continues to inherit from the faster C implementation when it is available. The implementation uses multiple inheritance and delegates calls to the proxy object for the superclass that comes second in the MRO. This preserves most of the behavior of the stock pickler while minimizing changes to cloudpickle.

Providing a means of customizing how built-in types are pickled will enable Apache Beam to implement (mostly) deterministic pickling for set and frozenset and increase the cache hit rate for workflows. See: #453
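
For illustration only, here is a minimal sketch of the kind of customization the pure-Python pickler permits. It is not the code in this PR: `SortedSetPickler` and `dumps_sorted` are hypothetical names, the sketch relies on the private `pickle._Pickler.dispatch` table (a CPython implementation detail), and it assumes the set elements are mutually orderable.

```python
import io
import pickle


class SortedSetPickler(pickle._Pickler):
    """Hypothetical pickler that writes set elements in sorted order."""

    # Copy the class-level dispatch table so the stock pickler is untouched.
    dispatch = pickle._Pickler.dispatch.copy()

    def _save_sorted(self, obj):
        # Assumption: elements can be compared with one another.
        # Pickle the set as set(sorted_list) or frozenset(sorted_list).
        self.save_reduce(type(obj), (sorted(obj),), obj=obj)

    # The pure-Python pickler looks up type(obj) in this table.
    dispatch[set] = _save_sorted
    dispatch[frozenset] = _save_sorted


def dumps_sorted(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Hypothetical helper mirroring pickle.dumps for the class above."""
    buf = io.BytesIO()
    SortedSetPickler(buf, protocol).dump(obj)
    return buf.getvalue()
```

With this sketch, two sets holding the same elements serialize to the same bytes regardless of insertion order, and the result still loads with the regular `pickle.loads`. The same override has no effect on the C pickler, which handles built-in types before consulting any user hook.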

Cloudpickle's Pickler class inherits from pickle.Pickler. pickle.Pickler
is either the C implementation of the CPython pickler or a pure-Python pickler.
Only the pure-Python pickler supports customizing how built-in types are
pickled. This change introduces a PurePythonPickler class which inherits from
pickle._Pickler and supports customizing how built-in types are pickled. The
Pickler class continues to inherit from the faster C implementation when it
is available.

Providing a means of customizing how built-in types are pickled enables users
to implement deterministic pickling for set and frozenset.
See: cloudpipe#453

Author

AdrS commented Mar 31, 2025

@ogrisel is this a reasonable change?

Contributor

tvalentyn commented Apr 3, 2025

Thanks @AdrS, the changes look reasonable to me and will make it easier to use cloudpickle in Apache Beam.

Hi @ogrisel! Would you be able to help us find a reviewer for this change, or take a look at this contribution yourself? Thank you so much!

Please let us know if you have any questions or concerns.

Contributor

@ogrisel just a friendly reminder that we are waiting for your feedback on the course of action here. Thanks!

AdrS added a commit to AdrS/beam that referenced this pull request Apr 21, 2025
This is to enable customizing how sets are serialized to increase the
pickling determinism. I'm modifying the vendored cloudpickle as a
stop-gap measure until the cloudpickle maintainers review
cloudpipe/cloudpickle#563.

Issue: apache#34410

Contributor

tvalentyn commented Apr 29, 2025

It looks like @ogrisel might not be available for review right now.
@pierreglaser, any chance you could take a look at this PR? Thank you so much!
