Refactor Gluon parameter serialization format #18749
Hey @leezu, thanks for submitting the PR
CI supported jobs: [centos-gpu, clang, sanity, unix-cpu, windows-cpu, website, centos-cpu, windows-gpu, miscellaneous, unix-gpu, edge] Note:
b19e2df to ea1b6bb
ea1b6bb to 21fcf6f
python/mxnet/gluon/block.py
Outdated
params[names[0]] = param._reduce().asnumpy()
for name in names[1:]:
    # Shared parameters are known under multiple names. We save the
    # parameter according to it's first name and save the mapping
Typo nit: its
Please add a test for backward compatibility. The deserialization test should not rely on mxnet serialization, to avoid the situation where backward compatibility is broken in serialization and deserialization simultaneously.
Also, I'd like to see backend serialization/deserialization added, since it will likely be needed. If we merge this for Python only first and there's any issue in adding that support later, we won't be able to roll back.
There's no code in the backend that interacts with parameters. You can search for "MXNDArrayLoad" and you will see that all code invoking this API has been removed. As the backend does not interact with the format, there is no point in adding APIs here, as their use is not yet defined.
The old format is faulty and I don't see a strong reason to provide backwards compatibility throughout the 2.x series. We may remove the C API for loading the old format in a later PR (for example when removing the ndarray operators). As of this PR, backwards compatibility is provided, as there is no change to the C APIs and the Python code falls back to using the C API if the input is not in the new format. It's tested already via unittests and integration tests (via the model-zoo API). Unittest:
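For readers following the fallback argument, here is a minimal sketch of how a loader could tell the two formats apart. It is separate from the unit test referenced above and is not the code in this PR. It assumes the new format is a plain npz (zip) archive, and `load_parameters` and its `load_legacy` argument are hypothetical names, with `load_legacy` standing in for whatever calls the existing C API.

```python
import zipfile

import numpy as np


def load_parameters(path, load_legacy):
    """Load parameters from `path`, preferring the npz format.

    `load_legacy` is a hypothetical callable standing in for the
    C-API-based loader of the old format; it receives the same path.
    """
    # An npz file is a zip archive, so a zip check is enough to
    # recognise the new format in this sketch.
    if zipfile.is_zipfile(path):
        with np.load(path) as data:
            return {name: data[name] for name in data.files}
    # Anything else is handed to the legacy loader unchanged.
    return load_legacy(path)
```

Checking the container type rather than the file extension keeps existing file names working, which matters if old and new files share the same suffix.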
Yes, this PR does solve #18717. We might also need to refactor
I think we can delete
Frontend for JVM is planned in #17783. We will at least need to support inference with backend API which will require loading the parameters. MXNet will not be a Python-only framework and the design decision now on the serialization affects other frontends, so it must be taken with care.
Discussed offline and the conclusions are:
I'll close this and proceed with a PR of larger scope later.
Switch to the npz serialization format (https://numpy.org/devdocs/reference/generated/numpy.lib.format.html) and preserve the duplicate names of shared parameters, while storing each shared parameter only once.
At this point in time, only Python bindings exist and no functionality is introduced to load the npz files in the backend (as it's not needed). Once the need arises, bindings can be introduced with ease via https://github.com/leezu/cnpy/tree/libzip
Fixes #18717 #18667
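As an illustration of the idea in the description, the sketch below stores each shared array once in an npz file and records the extra names separately. It uses plain NumPy only. The reserved key `__aliases__`, its layout (rows of alias and first name), and the helper names `save_shared` and `load_shared` are assumptions made for this example and may differ from what the PR actually writes.

```python
import io

import numpy as np

ALIAS_KEY = "__aliases__"  # hypothetical reserved key, not taken from the PR


def save_shared(file, params_by_name):
    """Save arrays to npz, writing each shared array only once.

    Two names are treated as shared when they refer to the same
    array object.
    """
    arrays = {}
    first_name_by_id = {}
    aliases = {}  # alias name -> first name of the same array
    for name, arr in params_by_name.items():
        first = first_name_by_id.setdefault(id(arr), name)
        if first == name:
            arrays[name] = arr
        else:
            aliases[name] = first
    # Store the mapping as an (n, 2) string array so no pickling is needed.
    arrays[ALIAS_KEY] = np.array(sorted(aliases.items()), dtype=str).reshape(-1, 2)
    np.savez(file, **arrays)


def load_shared(file):
    """Load arrays and re-attach every alias to its shared array."""
    with np.load(file) as data:
        params = {k: data[k] for k in data.files if k != ALIAS_KEY}
        for alias, first in data[ALIAS_KEY]:
            params[str(alias)] = params[str(first)]
    return params
```

A round trip through an in-memory buffer, for example `buf = io.BytesIO()` followed by `save_shared(buf, {...})` and `load_shared(io.BytesIO(buf.getvalue()))`, returns every original name, with aliased names pointing at the same loaded array.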