[bnb] Small improvements on utils #18646
Conversation
- replace `modules_to_not_convert` with `module_to_not_convert`
The documentation is not available anymore as the PR was closed or merged.
Can confirm the tests pass!
So will there always be just one module not to convert? Wouldn't it be safer to keep modules plural and work with the list?
I have proposed a small refactoring that includes:
- changed the variable name
- now outputs a list
- changed the error message

The bnb slow tests are passing with this fix!
From #18660, I also just added a commit to support having a custom list of the keys to ignore.
sgugger
left a comment
Thanks for working on this, I left some comments.
src/transformers/modeling_utils.py (outdated)
offload_state_dict = kwargs.pop("offload_state_dict", False)
load_in_8bit = kwargs.pop("load_in_8bit", False)
int8_threshold = kwargs.pop("int8_threshold", 6.0)
no_load_in_8bit_modules = kwargs.pop("no_load_in_8bit_modules", None)
Would it make more sense to have this be a class variable of PreTrainedModel (like the no_split variable used for big model inference)? I'm afraid the user won't know what to set this to, and it looks like something we should handle automatically.
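A hypothetical sketch of what that class-variable approach could look like, mirroring how the no-split list is declared for big model inference (the `_no_load_in_8bit_modules` attribute and the class name are illustrative only, not an API from this PR):

```python
from transformers import PreTrainedModel


class MyPreTrainedModel(PreTrainedModel):
    # existing class variable consulted by big model inference to avoid
    # splitting these blocks across devices
    _no_split_modules = ["MyDecoderBlock"]
    # hypothetical analogue for int8 loading: modules to keep in their
    # native precision instead of converting to 8-bit
    _no_load_in_8bit_modules = ["lm_head"]
```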
I don't have a strong opinion on that, but this argument is optional because the function `get_keys_not_to_convert` should automatically take care of it, except for some models such as Jukebox where it is a bit trickier due to their architecture.
In that case the user just has to manually specify which modules should be kept in their native precision and pass them in the kwargs, so I feel this is a bit easier than having it as a class attribute of PreTrainedModel, because then you would need to open a PR to add the feature for each model.
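For illustration, a minimal sketch of how a user could pass such a custom list through `from_pretrained`, assuming the kwarg name from the diff excerpt above (the checkpoint name is only an example, and the final API may differ from this):

```python
from transformers import AutoModelForCausalLM

# Load the checkpoint in 8-bit but keep the listed modules (e.g. the lm_head)
# in their native precision; the kwarg name follows the diff excerpt above.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    device_map="auto",
    load_in_8bit=True,
    no_load_in_8bit_modules=["lm_head"],
)
```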
Co-authored-by: stas00 <[email protected]>
sgugger
left a comment
Still good for me. I'll let @stas00 have a second look since merging is blocked by his change request.
Co-authored-by: Sylvain Gugger <[email protected]>
stas00
left a comment
thank you for addressing the suggestions, @younesbelkada
Can confirm the slow tests pass after rebasing!
What does this PR do?
Fixes a small typo in `bitsandbytes.py` and should address huggingface/blog#463 (comment). I will have to test it first before marking it as ready for review!