Enabling OpenCL GPU Model Caching in OpenVINO #205

Closed
@natelowry

Description

Describe the bug
OpenVINO suggests caching the GPU model for a faster load time. Is there a way to do this in the C# API? If so, I can't find it :)

Urgency
We're seeing ~8000-12000ms load time for our models which is super inconvenient for users. Seems like the majority of that time could be saved if we could load a cached model as OpenVINO recommends here.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 21H2
  • ONNX Runtime installed from (source or binary): OpenVINO version from this repo
  • ONNX Runtime version: ONNX Runtime 1.11.0
  • Python version: 3.9.13
  • Visual Studio version (if applicable): VS2022 (not installed on edge devices)
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 2022.1.0.3787 (?)
  • GPU model and memory: Intel Iris Xe (i5-1145G7) 8GB 27.20.100.8935

To Reproduce
It would be nice to enable caching by specifying the ov::cache_dir property either while creating the session OR when appending the execution provider (probably on the EP).

The OptimizedModelFilePath option on SessionOptions is for ONNX graph optimization, which doesn't work here (the saved graph contains nodes already compiled for the EP) and is not recommended anyway.
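
For reference, here's roughly what that attempt looks like; a minimal sketch using the existing SessionOptions properties. As far as we can tell, the dumped graph contains nodes already fused/compiled for the OpenVINO EP, so it can't stand in for OpenVINO's own model cache:

using Microsoft.ML.OnnxRuntime;

// Sketch: dumping the ORT-optimized graph as a side effect of session creation.
// The output is an ORT artifact, not an OpenVINO compiled-model cache.
using var opts = new SessionOptions
{
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_BASIC,
    OptimizedModelFilePath = "model-optimized.onnx",
};
opts.AppendExecutionProvider_OpenVINO("GPU_FP16");
using var session = new InferenceSession("model-path.onnx", opts);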

Here's the documentation for enabling it in C++:
https://docs.openvino.ai/latest/openvino_docs_OV_UG_Model_caching_overview.html

ov::Core core;
core.set_property(ov::cache_dir("/path/to/cache/dir"));

Ideally it would cache it by default and allow us to specify the cache directory if desired.

using Microsoft.ML.OnnxRuntime;

var sessionOptions = new SessionOptions
{
    LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE,
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_DISABLE_ALL,
    //OptimizedModelFilePath = "na.onnx", // this is for ONNX graph optimization, which doesn't apply here
    //CacheDirectory = "/this/could/be/an/option", // though since SessionOptions is shared across all EPs, it may be the wrong place
};

sessionOptions.AppendExecutionProvider_OpenVINO("GPU_FP16");
//sessionOptions.AppendExecutionProvider_OpenVINO(deviceId: "GPU_FP16", cacheDir: "/this/could/be/another/option");
//^ this seems like the more appropriate place, as it's an OpenVINO-specific option

_inferenceSession = new InferenceSession(modelPath: "model-path.onnx", options: sessionOptions);

Expected behavior
Model is cached when first built and read from said cache for subsequent usage.
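
To make that concrete, here's a rough sketch of how we'd verify it. The cacheDir parameter is the hypothetical option proposed above and does not exist in the current C# API; the timing scaffolding is just the standard Stopwatch:

using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;

using var sessionOptions = new SessionOptions();
// Hypothetical overload from the proposal above; not in the current API:
//sessionOptions.AppendExecutionProvider_OpenVINO(deviceId: "GPU_FP16", cacheDir: "./ov-cache");
sessionOptions.AppendExecutionProvider_OpenVINO("GPU_FP16");

var sw = Stopwatch.StartNew();
using (new InferenceSession("model-path.onnx", sessionOptions)) { } // cold: compiles the model (and would write the cache)
var coldMs = sw.ElapsedMilliseconds;

sw.Restart();
using (new InferenceSession("model-path.onnx", sessionOptions)) { } // warm: should load from the cache instead
Console.WriteLine($"cold: {coldMs} ms, warm: {sw.ElapsedMilliseconds} ms");
// Expected: the warm load is a small fraction of the cold load,
// and the cache directory contains the compiled blob(s).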

Screenshots

Additional context
Thanks!
