Enabling OpenCL GPU Model Caching in OpenVINO #205

Closed
@natelowry

Description

Describe the bug
OpenVINO suggests caching the GPU model for a faster load time. Is there a way to do this in the C# API? If so, I can't find it :)

Urgency
We're seeing ~8000-12000ms load time for our models which is super inconvenient for users. Seems like the majority of that time could be saved if we could load a cached model as OpenVINO recommends here.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 21H2
  • ONNX Runtime installed from (source or binary): OpenVINO version from this repo
  • ONNX Runtime version: ONNX Runtime 1.11.0
  • Python version: 3.9.13
  • Visual Studio version (if applicable): VS2022 (not installed on edge devices)
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 2022.1.0.3787 (?)
  • GPU model and memory: Intel Iris Xe (i5-1145G7) 8GB 27.20.100.8935

To Reproduce
It would be nice to enable caching by specifying the ov::cache_dir property either while creating the session OR when appending the execution provider (probably on the EP).

The OptimizedModelFilePath option on SessionOptions is for ONNX graph optimization, which doesn't work here (the saved graph contains nodes already compiled for the EP) and is not recommended anyway.
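
For reference, here's roughly what that attempt looks like; a minimal sketch using the existing SessionOptions properties. As far as we can tell, the dumped graph contains nodes already fused/compiled for the OpenVINO EP, so it can't stand in for OpenVINO's own model cache:

using Microsoft.ML.OnnxRuntime;

// Sketch: dumping the ORT-optimized graph as a side effect of session creation.
// The output is an ORT artifact, not an OpenVINO compiled-model cache.
using var opts = new SessionOptions
{
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_BASIC,
    OptimizedModelFilePath = "model-optimized.onnx",
};
opts.AppendExecutionProvider_OpenVINO("GPU_FP16");
using var session = new InferenceSession("model-path.onnx", opts);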

Here's the documentation for enabling it in C++:
https://docs.openvino.ai/latest/openvino_docs_OV_UG_Model_caching_overview.html

ov::Core core;
core.set_property(ov::cache_dir("/path/to/cache/dir"));

Ideally it would cache it by default and allow us to specify the cache directory if desired.

using Microsoft.ML.OnnxRuntime;

var sessionOptions = new SessionOptions
{
    LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE,
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_DISABLE_ALL,
    //OptimizedModelFilePath = "na.onnx", // this is for ONNX graph optimization, which doesn't apply here
    //CacheDirectory = "/this/could/be/an/option", // though since SessionOptions is shared across all EPs, it may be the wrong place
};

sessionOptions.AppendExecutionProvider_OpenVINO("GPU_FP16");
//sessionOptions.AppendExecutionProvider_OpenVINO(deviceId: "GPU_FP16", cacheDir: "/this/could/be/another/option");
//^ this seems like the more appropriate place, as it's an OpenVINO-specific option

_inferenceSession = new InferenceSession(modelPath: "model-path.onnx", options: sessionOptions);

Expected behavior
Model is cached when first built and read from said cache for subsequent usage.
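
To make that concrete, here's a rough sketch of how we'd verify it. The cacheDir parameter is the hypothetical option proposed above and does not exist in the current C# API; the timing scaffolding is just the standard Stopwatch:

using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;

using var sessionOptions = new SessionOptions();
// Hypothetical overload from the proposal above; not in the current API:
//sessionOptions.AppendExecutionProvider_OpenVINO(deviceId: "GPU_FP16", cacheDir: "./ov-cache");
sessionOptions.AppendExecutionProvider_OpenVINO("GPU_FP16");

var sw = Stopwatch.StartNew();
using (new InferenceSession("model-path.onnx", sessionOptions)) { } // cold: compiles the model (and would write the cache)
var coldMs = sw.ElapsedMilliseconds;

sw.Restart();
using (new InferenceSession("model-path.onnx", sessionOptions)) { } // warm: should load from the cache instead
Console.WriteLine($"cold: {coldMs} ms, warm: {sw.ElapsedMilliseconds} ms");
// Expected: the warm load is a small fraction of the cold load,
// and the cache directory contains the compiled blob(s).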

Screenshots

Additional context
Thanks!
