Fix speculative decoding example #214
Conversation
Signed-off-by: Martin Gubri <[email protected]>
…ception Signed-off-by: Martin Gubri <[email protected]>
@yeyu-nvidia In addition to fixing your comment above, I've added additional commits that fix other issues.
This is good on my side. Please let me know if there are any issues.
```python
}
mtsp.convert(model, [(mode, config)])

for name, param in model.named_parameters():
```
This is not needed. `freeze_base_model` is handled in `forward` and defaults to `True`: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/speculative/plugins/transformers.py#L102
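To illustrate the point, here is a minimal sketch of what the removed loop did, using a toy `torch.nn` module rather than the example's actual model: it froze every parameter by hand, which the plugin's `forward` already guarantees when `freeze_base_model=True` (the default).

```python
import torch.nn as nn

# Toy stand-in for the base model (the real example converts a HF model).
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# What the removed snippet did manually:
for name, param in model.named_parameters():
    param.requires_grad = False  # freeze the base weights

# Every parameter is now frozen, which forward() with
# freeze_base_model=True already ensures, so the loop is redundant.
assert all(not p.requires_grad for p in model.parameters())
```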
```shell
if [[ "$1" != *=* ]]; then shift; fi
DO_EVAL="${1#*=}"
;;
--freeze_base_model*)
```
This is not needed.
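For context, the `--flag value` / `--flag=value` parsing idiom the quoted `launch.sh` snippet uses can be sketched as follows. The `parse_do_eval` wrapper is hypothetical; the real script inlines this logic in a `case` statement.

```shell
#!/usr/bin/env bash
# Sketch of launch.sh's argument parsing: a flag may arrive as
# "--do_eval value" or "--do_eval=value". If $1 contains no "=", the
# value is the next token, so shift past the flag word; ${1#*=} then
# strips any leading "--do_eval=" prefix (a no-op on a bare value).
parse_do_eval() {
  if [[ "$1" != *=* ]]; then shift; fi
  DO_EVAL="${1#*=}"
}

parse_do_eval --do_eval=true
echo "$DO_EVAL"   # prints: true
parse_do_eval --do_eval false
echo "$DO_EVAL"   # prints: false
```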
```python
else:
    raise Exception(f"{training_args.mode} is not supported!")

if training_args.freeze_base_model:
```
Again, this part is not needed.
Hi @Framartin, can you please address the feedback and update your PR?
What does this PR do?
Bug fix: Fix #211
Fix several bugs related to the speculative decoding example.
Overview:
- Fix `main.py`, `launch.sh` and the `README`
- The `--chat` flag in `generate_server.py` calls `client.chat.completions.create()` instead of `client.completions.create()`; also fix vLLM-specific args
- Fix the `--gradient_accumulation_steps` `assert` to handle the exception
- `--train_bs` in the README to ease adaptation to multiple GPUs while keeping the effective batch size constant
Testing
I've run the modified scripts.
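To illustrate why the `--chat` fix matters, here is a minimal sketch of the two payload shapes involved (request-body shapes of the OpenAI-compatible API only, not code from this repo): `client.chat.completions.create()` posts role-tagged messages so the server applies the model's chat template, while `client.completions.create()` posts a raw prompt.

```python
def build_request(prompt: str, chat: bool) -> dict:
    """Build the JSON body for an OpenAI-compatible server (sketch)."""
    if chat:
        # /v1/chat/completions: role-tagged messages; the server applies
        # the model's chat template before generation.
        return {"messages": [{"role": "user", "content": prompt}]}
    # /v1/completions: raw prompt string, no chat template applied.
    return {"prompt": prompt}

assert "messages" in build_request("Hello", chat=True)
assert "prompt" in build_request("Hello", chat=False)
```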
Before your PR is "Ready for review"
Additional Information
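On the `--train_bs` point in the overview: one way to keep the effective batch size constant when scaling to multiple GPUs is to shrink gradient accumulation as the GPU count grows. A hypothetical helper (names are illustrative, not from the PR):

```python
def grad_accum_steps(effective_bs: int, train_bs: int, num_gpus: int) -> int:
    """Gradient accumulation steps so that
    train_bs * num_gpus * steps == effective_bs."""
    per_step = train_bs * num_gpus
    assert effective_bs % per_step == 0, (
        f"effective batch size {effective_bs} is not divisible by {per_step}"
    )
    return effective_bs // per_step

# Same effective batch size (64) on 1 GPU vs 4 GPUs:
assert grad_accum_steps(64, train_bs=8, num_gpus=1) == 8
assert grad_accum_steps(64, train_bs=8, num_gpus=4) == 2
```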