v1.0.0: ONNX Runtime optimization and quantization support

@echarlaix echarlaix released this 24 Feb 17:15
· 1105 commits to main since this release

ONNX Runtime support

  • An ORTConfig class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
  • The ORTOptimizer class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. To create an instance of ORTOptimizer, the user needs to provide an ORTConfig object defining the export and graph-level transformation information. Optimization can then be performed by calling the ORTOptimizer.fit method.
  • ONNX Runtime static and dynamic quantization can also be applied to a model using the newly added ORTQuantizer class. To create an instance of ORTQuantizer, the user needs to provide an ORTConfig object defining the export and quantization information, such as the quantization approach to use or the activations and weights data types. Quantization can then be applied by calling the ORTQuantizer.fit method.
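The workflow described above might look roughly as follows. This is a hedged sketch, not a definitive reference: only the class names and the `fit` method come from this release; the import path, the constructor call, every keyword argument (`opset`, `quantization_approach`, `output_dir`), and the model name are assumptions for illustration.

```python
# Hedged sketch of the v1.0.0 workflow; argument names below are assumptions.
from optimum.onnxruntime import ORTConfig, ORTOptimizer, ORTQuantizer

# A single ORTConfig gathers the export, optimization and quantization
# strategies that the optimizer and quantizer will consume.
ort_config = ORTConfig(
    opset=12,                         # assumed: ONNX opset used for the export
    quantization_approach="dynamic",  # assumed: "static" or "dynamic"
)

# Export the model to ONNX and apply ONNX Runtime graph optimizations.
optimizer = ORTOptimizer(ort_config)
optimizer.fit("distilbert-base-uncased", output_dir="onnx_optimized")

# Apply (here dynamic) quantization via the same configuration object.
quantizer = ORTQuantizer(ort_config)
quantizer.fit("distilbert-base-uncased", output_dir="onnx_quantized")
```

The design keeps a single configuration object as the source of truth, so the same ORTConfig can drive both the graph optimization and the quantization passes.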

Additional features for Intel Neural Compressor

We have also added a new class called IncOptimizer, which takes care of combining the pruning and quantization processes.
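A rough usage sketch of the combined pipeline follows. Only the IncOptimizer name comes from these notes; the import path, the helper objects, and the constructor and method signatures are all assumptions made for illustration.

```python
# Hedged sketch; every name except IncOptimizer is an assumption.
from optimum.intel.neural_compressor import IncOptimizer  # assumed import path

# IncOptimizer is assumed to accept a model plus previously configured
# quantization and pruning components, and to run both in one pass.
optimizer = IncOptimizer(model, quantizer=quantizer, pruner=pruner)
optimized_model = optimizer.fit()  # assumed: applies pruning, then quantization
```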