TensorRT-LLM v0.17 Release #2725
This release mentions |
Sorry for the misleading information. The June |
I see. Perhaps the |
Are you sure you included the enc dec FP8 in the readme? Because it's not in the 0.17 branch or the main branch |
Thanks for the suggestion. The reason the release/0.17 branch contains more new features than main this time is the Blackwell release, which makes the GitHub update process a little bit special :) Thanks |
Thanks for catching this, @MahmoudAshraf97. I just communicated with the team, and there was something wrong with the doc update process for the 0.17 release. We have now updated the 0.17 release info. Please help review it to see whether anything else is incorrect. Thanks again for reminding us about this doc issue. June |
TensorRT-LLM Release 0.17.0 [UPDATED 1/31]
Key Features and Enhancements
LLM API and trtllm-bench command.
tensorrt_llm._torch. The following is a list of supported infrastructure, models, and features that can be used with the PyTorch workflow.
LLM API.
examples/multimodal/README.md.
userbuffer based AllReduce-Norm fusion kernel.
executor API.
API Changes
paged_context_fmha is enabled.
--concurrency support for the throughput subcommand of trtllm-bench.
Fixed Issues
cluster_key for auto parallelism feature. ([feature request] Can we add H200 in infer_cluster_key() method? #2552)
__post_init__ function of LLmArgs Class. Thanks for the contribution from @topenkoff in Fix kwarg name #2691.
Infrastructure Changes
nvcr.io/nvidia/pytorch:25.01-py3.
nvcr.io/nvidia/tritonserver:25.01-py3.
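The API Changes above mention --concurrency support for the throughput subcommand of trtllm-bench. As a rough illustration of how such an invocation might be assembled from a script, here is a minimal sketch. Only trtllm-bench, throughput, and --concurrency come from the notes above; the --model, --dataset, and --engine_dir options, the helper name build_throughput_command, and all paths and model names are assumptions made for illustration and should be checked against trtllm-bench --help for the installed version.

```python
import shlex
from typing import List, Optional


def build_throughput_command(
    model: str,
    dataset: str,
    engine_dir: str,
    concurrency: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for a `trtllm-bench ... throughput` run.

    Hypothetical helper: --model, --dataset and --engine_dir are assumed
    option names; only --concurrency is taken from the release notes.
    """
    if concurrency is not None and concurrency < 1:
        raise ValueError("concurrency must be a positive integer")

    argv = [
        "trtllm-bench",
        "--model", model,           # assumed top-level option
        "throughput",               # subcommand named in the release notes
        "--dataset", dataset,       # assumed option
        "--engine_dir", engine_dir, # assumed option
    ]
    if concurrency is not None:
        # Option named in the 0.17 API Changes.
        argv += ["--concurrency", str(concurrency)]
    return argv


if __name__ == "__main__":
    # Placeholder model name and paths; nothing is executed here.
    cmd = build_throughput_command(
        "example-org/example-model",
        "/tmp/dataset.txt",
        "/tmp/engine",
        concurrency=8,
    )
    print(shlex.join(cmd))
```

The helper only builds and prints the command, so it can be inspected before being handed to something like subprocess.run. Keeping the arguments as a list avoids shell-quoting problems when paths contain spaces.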