Describe the bug
OpenVINO suggests caching the compiled GPU model for a faster load time. Is there a way to do this in the C# API? If there is, I can't find it :)
Urgency
We're seeing ~8000-12000 ms load times for our models, which is very inconvenient for users. It seems the majority of that time could be saved if we could load a cached model, as OpenVINO recommends in its model caching documentation (linked below).
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 21H2
ONNX Runtime installed from (source or binary): OpenVINO version from this repo
ONNX Runtime version: ONNX Runtime 1.11.0
Python version: 3.9.13
Visual Studio version (if applicable): VS2022 (not installed on edge devices)
GCC/Compiler version (if compiling from source): N/A
CUDA/cuDNN version: 2022.1.0.3787 (?)
GPU model and memory: Intel Iris Xe (i5-1145G7) 8GB 27.20.100.8935
To Reproduce
It would be nice to enable caching by specifying the ov::cache_dir property either while creating the session OR when appending the execution provider (probably on the EP). The OptimizedModelFilePath in the constructor is for ONNX graph optimization, which doesn't work here (the OpenVINO EP uses compiled nodes) and isn't recommended anyway.
Here's the OpenVINO documentation for enabling caching from C++:
https://docs.openvino.ai/latest/openvino_docs_OV_UG_Model_caching_overview.html
core.set_property(ov::cache_dir("/path/to/cache/dir"));
Ideally it would cache by default and let us specify the cache directory if desired. For example:
var sessionOptions = new SessionOptions
{
    LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE,
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_DISABLE_ALL,
    //OptimizedModelFilePath = "na.onnx", // this is for ONNX graph optimization, which doesn't apply here
    //CacheDirectory = "/this/could/be/an/option", // although since SessionOptions is shared across all EPs, this might be the wrong place
};
sessionOptions.AppendExecutionProvider_OpenVINO("GPU_FP16");
//sessionOptions.AppendExecutionProvider_OpenVINO(deviceId: "GPU_FP16", cacheDir: "/this/could/be/another/option");
//^ this seems like the more appropriate place, as it's an OpenVINO-specific option
_inferenceSession = new InferenceSession(modelPath: "model-path.onnx", options: sessionOptions);
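For reference, here's a minimal sketch of what enabling the cache looks like through OpenVINO's native C++ API, per the docs linked above (the model path, cache directory, and device name are placeholders):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Enable model caching: the first compile_model() call writes a
    // compiled blob into this directory; subsequent runs load the blob
    // instead of recompiling, which is where the startup time goes.
    core.set_property(ov::cache_dir("/path/to/cache/dir"));

    // Placeholder model path and device name.
    ov::CompiledModel compiled = core.compile_model("model-path.onnx", "GPU");
    return 0;
}

Exposing the equivalent of that single set_property call through the C# bindings is essentially all this request amounts to.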
Expected behavior
The model is cached when it is first built and read from that cache on subsequent runs.
Screenshots
Additional context
Thanks!