Support W4AFP8 quant in v3.1 #337
Conversation
Signed-off-by: bruce.xu <[email protected]>
Force-pushed from b3f4b5e to 8824e8b
@cjluo-nv please help review it, thanks.
```diff
 assert weight_quantizer is None
 assert act_quantizer is None
-x, scale = act_quant(x, block_size)
+x, scale = act_quant(x, block_size, scale_fmt)
```
Could you walk through how updating the scale gives you W4A8?
In this case, what's the 4-bit weight? Is it NVFP4 or INT4?
Sorry, I didn't notice the comment before.
In my case, the 4-bit weight means INT4.
For the v3.1 case, when I quantize the model there is an interface mismatch with the DeepSeek-V3 repo, so I fixed it. @cjluo-nv
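For context, here is a minimal sketch of what the updated `act_quant` does conceptually. The function name, `block_size`, and the `scale_fmt` argument come from the diff above; the exact FP8 math and the UE8M0 rounding shown here are my assumptions, not the upstream implementation:

```python
import torch

def act_quant(x: torch.Tensor, block_size: int, scale_fmt: str | None = None):
    """Block-wise FP8 activation quantization (sketch).

    With scale_fmt == "ue8m0", each per-block scale is rounded up to a
    power of two so it fits an exponent-only 8-bit (UE8M0) encoding.
    """
    # split the last dimension into blocks of `block_size`
    xb = x.view(*x.shape[:-1], -1, block_size)
    # per-block absolute maximum, kept positive to avoid division by zero
    amax = xb.abs().amax(dim=-1, keepdim=True).clamp_min(1e-4)
    # 448.0 is the largest normal value representable in float8_e4m3fn
    scale = amax / 448.0
    if scale_fmt == "ue8m0":
        # force power-of-two scales by rounding the exponent up
        scale = torch.exp2(torch.ceil(torch.log2(scale)))
    x_fp8 = (xb / scale).to(torch.float8_e4m3fn)
    return x_fp8.view_as(x), scale.squeeze(-1)
```

The weights themselves stay INT4; only the activation path changes, which is why threading `scale_fmt` through this call is enough to get the v3.1 (UE8M0) behavior.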
```
nvcr.io/nvidia/tensorrt-llm/release
```
Then we can run ModelOpt in the Docker pod as in the [TRT-LLM example](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/deepseek_v3/README.md?plain=1).
Note that you should just use the latest DeepSeek-V3.git, because there is a dtype bug in the bias proto at commit 1398800.
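For reference, entering the container might look like the following; the image tag and mount path are placeholders, not taken from this PR:

```bash
# start the TensorRT-LLM release container with GPU access
# (tag and mount path are placeholders)
docker run --gpus all -it --rm \
  -v /path/to/models:/workspace/models \
  nvcr.io/nvidia/tensorrt-llm/release:latest
```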
Would you recommend using a more recent commit?
What does this PR do?
Support W4AFP8 quantization for DeepSeek-V3.1 (UE8M0 scale format).
Usage
Just set `gemm_impl` to `fp8` and use `config_v3.1.json`.
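A hedged example invocation — the script name and flags are illustrative, modeled on the ModelOpt DeepSeek PTQ example; only `gemm_impl=fp8` and `config_v3.1.json` come from this PR:

```bash
# hypothetical invocation; script name and flag names are illustrative
python ptq.py \
    --model_path /path/to/DeepSeek-V3.1 \
    --config config_v3.1.json \
    --gemm_impl fp8
```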
Testing
After applying this patch, our model (V3.1 using W4AFP8) reaches 50% on AIME25 and 60% on AIME24.
Before your PR is "Ready for review"
Additional Information