v1.0.0: ONNX Runtime optimization and quantization support

@echarlaix echarlaix released this 24 Feb 17:15
· 1105 commits to main since this release

ONNX Runtime support

  • An ORTConfig class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
  • The ORTOptimizer class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. To create an instance of ORTOptimizer, the user needs to provide an ORTConfig object defining the export and graph-level transformation information. Optimization can then be performed by calling the ORTOptimizer.fit method.
  • ONNX Runtime static and dynamic quantization can also be applied to a model using the newly added ORTQuantizer class. To create an instance of ORTQuantizer, the user needs to provide an ORTConfig object defining the export and quantization information, such as the quantization approach to use or the activations and weights data types. Quantization can then be applied by calling the ORTQuantizer.fit method.
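The workflow described above might look roughly as follows. This is a hedged sketch, not a definitive reference: only the class names and the `fit` method come from this release; the import path, the constructor call, every keyword argument (`opset`, `quantization_approach`, `output_dir`), and the model name are assumptions for illustration.

```python
# Hedged sketch of the v1.0.0 workflow; argument names below are assumptions.
from optimum.onnxruntime import ORTConfig, ORTOptimizer, ORTQuantizer

# A single ORTConfig gathers the export, optimization and quantization
# strategies that the optimizer and quantizer will consume.
ort_config = ORTConfig(
    opset=12,                         # assumed: ONNX opset used for the export
    quantization_approach="dynamic",  # assumed: "static" or "dynamic"
)

# Export the model to ONNX and apply ONNX Runtime graph optimizations.
optimizer = ORTOptimizer(ort_config)
optimizer.fit("distilbert-base-uncased", output_dir="onnx_optimized")

# Apply (here dynamic) quantization via the same configuration object.
quantizer = ORTQuantizer(ort_config)
quantizer.fit("distilbert-base-uncased", output_dir="onnx_quantized")
```

The design keeps a single configuration object as the source of truth, so the same ORTConfig can drive both the graph optimization and the quantization passes.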

Additional features for Intel Neural Compressor

We have also added a new class called IncOptimizer, which takes care of combining the pruning and quantization processes.
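A rough usage sketch of the combined pipeline follows. Only the IncOptimizer name comes from these notes; the import path, the helper objects, and the constructor and method signatures are all assumptions made for illustration.

```python
# Hedged sketch; every name except IncOptimizer is an assumption.
from optimum.intel.neural_compressor import IncOptimizer  # assumed import path

# IncOptimizer is assumed to accept a model plus previously configured
# quantization and pruning components, and to run both in one pass.
optimizer = IncOptimizer(model, quantizer=quantizer, pruner=pruner)
optimized_model = optimizer.fit()  # assumed: applies pruning, then quantization
```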