v1.0.0: ONNX Runtime optimization and quantization support
ONNX Runtime support
- An `ORTConfig` class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
- The `ORTOptimizer` class takes care of the model's ONNX export as well as the graph optimizations provided by ONNX Runtime. To create an instance of `ORTOptimizer`, the user needs to provide an `ORTConfig` object defining the export and graph-level transformation information. Optimization can then be performed by calling the `ORTOptimizer.fit` method.
- ONNX Runtime static and dynamic quantization can also be applied to a model by using the newly added `ORTQuantizer` class. To create an instance of `ORTQuantizer`, the user needs to provide an `ORTConfig` object defining the export and quantization information, such as the quantization approach to use or the activation and weight data types. Quantization can then be applied by calling the `ORTQuantizer.fit` method.
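The configure-then-fit workflow described above can be sketched as follows. The class names (`ORTConfig`, `ORTOptimizer`, `ORTQuantizer`) and the `fit` methods come from this release; the constructor and `fit` arguments shown (`opset`, `quantization_approach`, the model name, `output_dir`) are assumptions for illustration and may differ from the released API:

```python
# Hypothetical sketch of the configure-then-fit workflow, assuming this
# release of optimum is installed. Constructor/fit arguments are assumed,
# not confirmed signatures.
from optimum.onnxruntime import ORTConfig, ORTOptimizer, ORTQuantizer

# Define the export, optimization and quantization strategies in one place.
ort_config = ORTConfig(
    opset=12,                         # assumed: ONNX opset used for the export
    quantization_approach="dynamic",  # assumed: "dynamic" or "static"
)

# Export the model to ONNX and apply ONNX Runtime graph optimizations.
optimizer = ORTOptimizer(ort_config)
optimizer.fit(
    "distilbert-base-uncased-finetuned-sst-2-english",  # assumed argument
    output_dir="onnx_out",                              # assumed argument
)

# Apply dynamic quantization driven by the same configuration object.
quantizer = ORTQuantizer(ort_config)
quantizer.fit(
    "distilbert-base-uncased-finetuned-sst-2-english",  # assumed argument
    output_dir="onnx_out",                              # assumed argument
)
```

Keeping every strategy in a single `ORTConfig` object means the same configuration can be shared between the optimizer and the quantizer.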
Additional features for Intel Neural Compressor
We have also added a new class called `IncOptimizer` which takes care of combining the pruning and the quantization processes.
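A hypothetical sketch of how the combined process might be invoked. Only the `IncOptimizer` name and its purpose (combining pruning and quantization) are stated in this release; the module path, constructor arguments, and `fit` call below are all assumptions:

```python
# Hypothetical sketch: IncOptimizer combining pruning and quantization.
# Everything below except the IncOptimizer name is an assumption, not the
# documented signature.
from optimum.intel.neural_compressor import IncOptimizer  # assumed module path

# Placeholders: in real use these would be a model plus pruning and
# quantization components from the Intel Neural Compressor integration.
model, pruner, quantizer = ..., ..., ...

inc_optimizer = IncOptimizer(model, pruner=pruner, quantizer=quantizer)  # assumed signature
compressed_model = inc_optimizer.fit()  # assumed: runs pruning, then quantization
```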