Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH lazy load modules in the root __init__ #874

Merged
merged 9 commits into from
May 18, 2022
Merged

ENH lazy load modules in the root __init__ #874

merged 9 commits into from
May 18, 2022

Conversation

adrinjalali
Copy link
Contributor

@adrinjalali adrinjalali commented May 16, 2022

#824 requires modifying __getattr__ and __dir__ on the module level (available from py3.7: https://peps.python.org/pep-0562/)

This PR makes loading modules on the root __init__ of the repo to be lazy loaded, which also gives us a nice intro to implementing our own __getattr__ and __dir__ for futher changes.

As a side effect, this makes import huggingface_hub much faster:

on main:

$ time python -c "import huggingface_hub"
python -c "import huggingface_hub"  1.98s user 0.19s system 98% cpu 2.206 total

this PR:

$ time python -c "import huggingface_hub"
python -c "import huggingface_hub"  0.04s user 0.02s system 98% cpu 0.063 total

For more info: https://scientific-python.org/specs/spec-0001/

cc @LysandreJik @osanseviero @nateraw

also closes #875

"save_pretrained_keras",
],
"repository": ["Repository"],
"snapshot_download": ["snapshot_download"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failing CI due to this. Corner case @tupui , @stefanv ? (Trying lazy loading in on this project, but this makes our doc builder to fail since it resolves this to the module rather than the function).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce this behavior, although I will need to dig a bit into the cause.

snapshot_download is a function after import, but then gets overwritten by the module of the same name upon first access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so the problem here is that the module and the function have the same name, so when you try and access huggingface_hub.snapshot_download it is not clear what it should refer to. I can add a warning to our code for this situation, but you probably want to rename the module to _snapshot_download.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably add something like this: https://github.com/adrinjalali/huggingface_hub/pull/1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we agree with renaming the file (ref: #875), but if lazy loading is to be adopted by more major libraries, it probably should try to not change the behavior I think.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented May 17, 2022

The documentation is not available anymore as the PR was closed or merged.

@adrinjalali
Copy link
Contributor Author

CI is green now.

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice, thanks for working on it @adrinjalali!

Very clean to rely on the Python 3.7 feature, it's simpler to complete than what we have in transformers as we only need to add it to the resulting dependency directory.

Copy link
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool 🔥 thanks for the PR!

adrinjalali and others added 3 commits May 18, 2022 17:11
Co-authored-by: Lysandre Debut <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
@adrinjalali
Copy link
Contributor Author

I updated as you suggested @LysandreJik , but in general I would be more in favor of not changing them since it makes it harder for us to copy new versions from upstream.

This should be ready for merge then :) I'm excited for the next steps.

@LysandreJik LysandreJik merged commit 064ae04 into huggingface:main May 18, 2022
@adrinjalali adrinjalali deleted the lazy_load branch May 18, 2022 15:58
@LysandreJik
Copy link
Member

I understand that changing them leads to more friction when wanting to update. Do you expect this code to evolve over time? I would prefer that all code in the codebase is formatted the same and documented the same, but I understand the pushback.

@stefanv
Copy link
Contributor

stefanv commented May 18, 2022

The code is changing in response to this issue, in fact :) It highlighted a small bug when functions are implemented inside of modules with the same name.

But, in general, we don't expect many big changes.

@adrinjalali
Copy link
Contributor Author

@LysandreJik the point of marking verdored code to me is to also make it clear that it's not our code, therefore it doesn't follow our standards or style, and it should not be exposed publicly.

Saying this, I now realize we're exposing this on the top level as a public function, I'll submit a PR making it private.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC rename snapshot_download.py
5 participants