# Torch-TensorRT v1.3.0
PyTorch 1.13, CUDA 11.7, TensorRT 8.5, Support for Dynamic Batch for Partially Compiled Modules, Engine Profiling, Experimental Unified Runtime for FX and TorchScript Frontends
Torch-TensorRT 1.3.0 targets PyTorch 1.13, CUDA 11.7, cuDNN 8.5 and TensorRT 8.5. This release focuses on adding support for dynamic batch sizes for partially compiled modules using the TorchScript frontend (dynamic batch is also supported with the FX frontend). It also introduces a new execution profiling utility for understanding the execution of specific engine sub-blocks, which can be used in conjunction with PyTorch profiling tools to understand the performance of your model post-compilation. Finally, this release introduces a new experimental unified runtime shared by both the TorchScript and FX frontends. This allows you to start using the FX frontend to generate `torch.jit.trace`-able compiled modules.

## Dynamic Batch Sizes for Partially Compiled Modules via the TorchScript Frontend
A long-standing limitation of the partitioning system in the TorchScript frontend is the lack of support for dynamic shapes. This release addresses a major subset of these use cases with support for dynamic batch sizes for modules that will be partially compiled. Usage is the same as in the fully compiled workflow: using the `torch_tensorrt.Input` class, you may define the range of shapes that an input may take during runtime. This is represented as a set of three shape sizes: `min`, `max` and `opt`. `min` and `max` define the dynamic range of the input tensor, while `opt` informs TensorRT what size to optimize for, provided there are multiple valid kernels available. TensorRT will select kernels that are valid for the full range of input shapes but most efficient at the `opt` size. In this release, the inputs of a partially compiled module can vary in shape only in the highest order (batch) dimension. For example (the shapes below are illustrative):
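```python
import torch_tensorrt

torch_tensorrt.Input(
    min_shape=(1, 3, 224, 224),   # only the batch dimension varies...
    opt_shape=(8, 3, 224, 224),   # ...with kernels tuned for batch size 8...
    max_shape=(32, 3, 224, 224),  # ...up to a maximum batch size of 32
)
```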
is a valid shape range; however, a range that also varies other dimensions, such as:
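```python
torch_tensorrt.Input(
    min_shape=(1, 3, 128, 128),
    opt_shape=(1, 3, 256, 256),   # the spatial dimensions vary as well
    max_shape=(1, 3, 512, 512),
)
```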
is still not supported.
## Engine Profiling [Experimental]
This release introduces a number of profiling tools to measure the performance of TensorRT sub-blocks in compiled modules. These can be used in conjunction with PyTorch profiling tools to get a full picture of the performance of your model. Profiling for any particular sub-block can be enabled by the `enable_profiling()` method of any `__torch__.classes.tensorrt.Engine` attribute, or of any `torch_tensorrt.TRTModuleNext`. The profiler will dump trace files in `/tmp` by default, though this path can be customized by either setting the `profile_path_prefix` of a `__torch__.classes.tensorrt.Engine` or passing it as an argument to `torch_tensorrt.TRTModuleNext.enable_profiling(profiling_results_dir="")`. Traces can be visualized using the Perfetto tool (https://perfetto.dev).
Engine layer information can also be accessed using `get_layer_info`, which returns a JSON string with the layers / fusions that the engine contains.
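As a minimal sketch, assuming `trt_module` is a `torch_tensorrt.TRTModuleNext` produced by one of the workflows below (the input shape and output directory are placeholders, and `disable_profiling()` is assumed as the counterpart to the enable call):

```python
import torch

# Assumed: `trt_module` is a torch_tensorrt.TRTModuleNext produced by compilation.
trt_module.enable_profiling(profiling_results_dir="/tmp/trt_traces")

with torch.no_grad():
    trt_module(torch.randn(1, 3, 224, 224).cuda())  # each execution emits traces

trt_module.disable_profiling()  # assumed counterpart to enable_profiling()

# Inspect the layers / fusions the engine contains, as a JSON string.
print(trt_module.get_layer_info())
```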
## Unified Runtime for FX and TorchScript Frontends [Experimental]

In previous versions of Torch-TensorRT, the FX and TorchScript frontends were mostly separate, and each had its distinct benefits and limitations. Torch-TensorRT 1.3.0 introduces a new unified runtime supporting both FX and TorchScript, meaning that you can choose the compilation workflow that makes the most sense for your particular use case, be it pure Python conversion via FX or C++ TorchScript compilation. Both frontends use the same primitives to construct their compiled graphs, whether fully or partially compiled.
### Basic Usage
The TorchScript frontend uses the new runtime by default. No additional workflow changes are necessary.
For the FX frontend, the new runtime can be chosen by setting `use_experimental_fx_rt=True` as part of your compile settings, with either `torch_tensorrt.compile(my_mod, ir="fx", use_experimental_fx_rt=True, explicit_batch_dimension=True)` or `torch_tensorrt.fx.compile(my_mod, use_experimental_fx_rt=True, explicit_batch_dimension=True)`.
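For instance, a minimal sketch of the FX path with the experimental runtime (the toy model and input shapes are placeholders):

```python
import torch
import torch_tensorrt

# Placeholder model; any FX-traceable module works here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
).eval().cuda()

inputs = [torch.randn(1, 3, 224, 224).cuda()]

# Opt in to the experimental unified runtime from the FX frontend.
trt_mod = torch_tensorrt.compile(
    model,
    ir="fx",
    inputs=inputs,
    use_experimental_fx_rt=True,
    explicit_batch_dimension=True,
)

print(trt_mod(inputs[0]).shape)
```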
### TRTModuleNext

The FX frontend will return a `torch.nn.Module` containing `torch_tensorrt.TRTModuleNext` submodules instead of `torch_tensorrt.fx.TRTModule`s. The features of these modules are nearly identical, with a few key improvements:

- `TRTModuleNext` profiling dumps a trace visualizable with Perfetto (see above for more details).
- `TRTModuleNext` modules are `torch.jit.trace`-able, meaning you can save FX-compiled modules as TorchScript for Python-less / C++ deployment scenarios (see the sketch below). Traced compiled modules have the same deployment instructions as compiled modules produced by the TorchScript frontend.
- `TRTModuleNext` supports the same serialization workflows `TRTModule` supports (state_dict / extra_state, torch.save / torch.load).
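For example, a sketch of tracing and saving an FX-compiled module (file names and shapes are illustrative):

```python
import torch

# `trt_mod` is the FX-compiled module from above, containing TRTModuleNext
# submodules; it can be traced like any other torch.nn.Module.
example_input = torch.randn(1, 3, 224, 224).cuda()
traced = torch.jit.trace(trt_mod, example_input)

# Save as TorchScript for Python-less / C++ deployment.
torch.jit.save(traced, "trt_model.ts")

# The saved module can be reloaded wherever the Torch-TensorRT runtime is available.
reloaded = torch.jit.load("trt_model.ts")
```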
### Examples

#### Using TRTModuleNext as an arbitrary TensorRT engine holder
Using TorchScript, you have long been able to embed an arbitrary TensorRT engine from any source in a TorchScript module using `torch_tensorrt.ts.embed_engine_in_new_module`. Now you can do this at the `torch.nn.Module` level by directly using `TRTModuleNext`, and access all the benefits enumerated above.
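A minimal sketch, assuming `serialized_engine` holds the bytes of a prebuilt TensorRT engine and that the binding names match the engine's actual inputs and outputs (the file path, binding names, and shapes here are all hypothetical):

```python
import torch
import torch_tensorrt

# Read a prebuilt, serialized TensorRT engine from disk (path is hypothetical).
with open("sample.engine", "rb") as f:
    serialized_engine = f.read()

# Wrap the engine in a TRTModuleNext; binding names must match the engine.
trt_module = torch_tensorrt.TRTModuleNext(
    serialized_engine,
    name="my_engine",
    input_binding_names=["input_0"],
    output_binding_names=["output_0"],
)

out = trt_module(torch.randn(1, 3, 224, 224).cuda())
```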
The intention is, in a future release, to have `torch_tensorrt.TRTModuleNext` replace `torch_tensorrt.fx.TRTModule` as the default TensorRT module implementation. Feedback on this class and how it is used, on the runtime in general, or on associated features (profiler, engine inspector) is welcome.

## What's Changed
- Fix bug: correct the output shape of `aten::index.Tensor` by @ruoqianguo in #1314
- fix: `torch.std` and `torch.var` support multi-dimensional reductions by @gs-olive in #1395
- fix: `aten::split` behavior with negative indexing by @gs-olive in #1403
- fix: Ensure proper type inheritance in `aten::masked_fill` by @gs-olive in #1430
- chore: Lint `noxfile.py` by @gs-olive in #1443
- fix: Device casting issues with certain `aten` operators by @gs-olive in #1416
- fix: Error with `aten::div` when using truncation with Int32 tensor inputs by @gs-olive in #1442

## New Contributors
Full Changelog: v1.1.0...v1.3.0
This discussion was created from the release Torch-TensorRT v1.3.0.