[OVEP] OpenVINO EP Features and bug-fixes for ORT-1.24 #26672

preetha-intel · 2025-11-27T09:35:02Z

Description

This update delivers a streamlined set of enhancements to the OpenVINO Execution Provider (OVEP), improving configuration flexibility, inference stability, model handling, and platform reliability within ONNX Runtime.

Configuration & Properties

Broadened OVEP configuration support with improved mapping of layout, precision, and device settings.
More reliable propagation of ORT session options into OpenVINO, enabling predictable backend behavior.
Modernized provider registration and updated build integration to support newer OpenVINO targets.

Inference & Tensor Handling

Added a more robust execution-context lifecycle to ensure safe reuse of inference requests without stale state.
Improved dynamic-shape and tensor-binding behavior for more consistent multi-run or multi-session inference.
Enhanced automatic precision handling (including bfloat16 → float16) to improve portability across OpenVINO devices.
Strengthened tensor I/O handling when multiple models or contexts operate concurrently.

Model Handling & Operator Support

Improved OpenVINO subgraph extraction for more stable partitions and fewer unnecessary fallbacks.
Updated preprocessing and model-transformation paths to better support QDQ and layout-sensitive models.
Aligned operator behavior and test coverage with the latest ONNX/OpenVINO expectations.

Platform & Integration Fixes

Strengthened provider initialization and teardown for more reliable behavior across platforms.
Improved shared-library loading and backend construction in multi-session or multi-device scenarios.
Enhanced integration with ORT’s provider bridge for cleaner management of backend resources.

Quality & Maintenance

Expanded OVEP test coverage, including precision, initialization, and end-to-end graph tests.
Updated CPU and OVEP tests for better spec alignment and correctness verification.
General cleanup and modernization of OVEP code for improved clarity and maintainability.

Sync With latest msft commits

Changes to make sure to honor SessionOptions API Contract

…_PATH' (#602)

Co-authored-by: sfatimar <[email protected]>

@xenova

* Fix flash attention for GQA (Phi4) (microsoft#23850) ### Description This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause appears to be `k_start + capped_sg_id < seq_causal_length` check. This is either because, a. seq_causal_length varies per lane, so the check becomes non uniform control flow, which is having interactions with subgroupShuffle. or b. The check itself is incorrect and is wiping out values of v based on the source lane's seq_causal_length. While in actualness values of v need to be causal as per the lane that is going to multiply it with qkt. qkt is already causal because earlier values of qk for out of bounds k are set to min_value, and exp(<-4) are 0. This fix works by removing that causal check and relying on the qk being wiped out earlier. The documentation for causality behavior for GQA is missing to determine which of this reason is the true reason. Prior to this prompts with sequence length > 16 < 32 or 1k would break with Phi 4 but smaller prompts would work. Tested on Intel Alderlake, Nvidia 4070. * Model Builder API (microsoft#23223) ### Description  Supports creating a model programmatically using the ORT C or C++ API. Supports augmenting an existing model to add nodes. ### Motivation and Context  * Fix typo: change `Upample` to `Upsample`. (microsoft#23838) ### Description  Fixed a typo in function names related to the Upsample CUDA kernel. Changed incorrect spelling Upample to Upsample across relevant functions. ### Motivation and Context  This change is necessary to maintain consistency and prevent potential confusion caused by incorrect function names. * [doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (microsoft#23848) ### Description  Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ ### Motivation and Context  * Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (microsoft#23856) * Change the logic to generate the default ep context file name (microsoft#23788) Change the logic to generate the default ep context file name ### Description Applies to all EPs: replace the .onnx to _ctx.onnx, instead of directly append extra string _ctx.onnx to existing model path. In QNN EP, also make the context binary .bin file shorter by removing QNNExecutionProvider_ from the file name. * Make Nuget QNN package pipeline 1ES compliant (microsoft#23805) ### Description Make [QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234)1ES compliant ### Motivation and Context  * [js/common] allows using Uint16Array as data for float16 tensor (microsoft#23827) ### Description Resolve microsoft#23817 ### Motivation and Context  * [js/webgpu] Reland the optimization of ConvTranspose (microsoft#23858) This PR fixes the errors in the ConvTranspose optimization and adds tests to ensure the correctness of the implementation. * [OpenVINO] Fix a build warning (microsoft#23877) ### Description Fix a warning with std::move usage ### Motivation and Context Possibly allow building without --compile_no_warning_as_error flag * Change gsl::byte to std::byte (microsoft#23872) To be compatible with the latest GSL library. Without this fix we will get: ``` onnxruntime\core\providers\cpu\controlflow\loop.cc(247): error C4996: 'gsl::byte': Use std::byte instead. ``` * Allow using extended minimal build for several EPs (microsoft#23834) ### Description #### Background From code search, the following EPs use `onnxruntime::GetCpuPreferredNodes()` in their `GetCapabilities()` methods: - CANN - CUDA - DML - JS - ROCM - WebGPU However, the source file that implements `onnxruntime::GetCpuPreferredNodes()` is excluded when minimal build is ON: https://github.com/microsoft/onnxruntime/blob/6df0973e58ba5399fcaa98686f70ed9a9e59aaef/cmake/onnxruntime_framework.cmake#L38-L42 This means that all EPs mentioned above is not able to compile with minimal build. #### Solution The excluded file `core/framework/fallback_cpu_capability.cc` cannot build in minimal build because some of its dependencies are not included in the minimal build. However, in extended minimal build mode, all dependencies are available. This PR looses the restrict and allows to compile this file when it is extended minimal build. After this change, those EPs are able to compile in extended minimal build. * Add dawn to ThirdPartyNotices (microsoft#23876) ### Description Add `dawn` to ThirdPartyNotices. * Enable QNN EP weight sharing generation using public API (microsoft#23702) ### Description Enable QNN EP weight sharing generation using public API instead of internal interfaces, so that user can integrate into their own toolchain. The change is to share the QnnBackendManager across ORT sessions if ep.share_ep_contexts is enabled. And there is extra option to end the share so that we know when to remove the shared QnnBackendManager from the singleton. Change the tool name from onnxruntime_qnn_ctx_gen to ep_weight_sharing_ctx_gen, so that it can be shared for other EPs. * [QNN-EP]: Fix inference failures while running with htp_shared_memory (microsoft#23892) ### Description When using the enable_htp_shared_memory feature, we see that the address of the buffer passed to rpcmem_free is incorrect. So the rpc buffers are not freed leading to memory exhaustion. ### Motivation and Context When using the enable_htp_shared_memory_allocator feature for QNN in GenAI extensions, it leads to inference failures during the second prompt. As GenAI memory asks are higher, it surfaces sooner in gen AI use cases. Co-authored-by: Ashish Garg <[email protected]> * Fix enable_pix_capture build for WebGPU (microsoft#23857) The build option --enable_pix_capture is broken. This fixes the problem. --------- Co-authored-by: wp <[email protected]> * [WebGPU-EP Native] Add ReduceMean (microsoft#23860) ### Description  ### Motivation and Context  * [WebGPU EP] introduce BiasAdd contrib op (microsoft#23861) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Dynamo export and improve benchmark script for SAM2 encoder (microsoft#23887) ### Description * Add dynamo export for Sam2 image encoder * Verify fp32 onnx model with CPU EP (to avoid error message from TRT EP). * Update benchmark script: - output ORT profiling - output torch compiled code and unique kernel name for compiled kernel - add an option for nightly package installation - uninstall existing ort packages before installing The node metadata of dynamo exported model can help mapping node in onnx model back to pytorch modeling script. Currently, the graph optimization is not done on dynamo exported model, so it is experimental right now. ### Motivation and Context To support profiling of torch compiled CUDA kernel. * [js/web] improve workaround for bundlers (microsoft#23902) ### Description This PR improves the workaround for bundlers in onnxruntime-web. Specifically, the following changes have been made: - Use [this workaround](xenova@9c50aa2) as suggested by @xenova in huggingface/transformers.js#1161 (comment) - Use `url > "file:" && url < "file;"` instead of `url.startsWith("file:")` to allow minifiers to remove dead code correctly. This change allows to remove unnecessary dependencies of file parsed from `new URL("ort.bundle.min.js", import.meta.url)` in Vite, and optimize code like `if("file://filepath.js".startsWith("file:")) {do_sth1(); } else {do_sth2();}` into `do_sth1()` for webpack/terser usages. Resolves huggingface/transformers.js#1161 * [webgpu] Restore MatMulNBits workgroup size for Phi-3.5 (microsoft#23349) ### Description This change restores the MatMulNBits workgroup size from (8, 8, 1) back to (16, 8, 1) to resolve a performance regression observed on Intel iGPUs during token generation (M=1). ### Motivation and Context As above. Signed-off-by: Jianhui Dai <[email protected]> * [webgpu] support Pad operator (microsoft#23141) ### Description  ### Motivation and Context  * [WebNN] Accept Float16Array for float16 data type if it is available (microsoft#23894) Float16Array is now shipping and WebNN Chromium implementation has accepted it. We should allow it in WebNN EP as well. * Ensure that the 'cmake_minimum_required' is version 3.5 or greater (microsoft#23888) ### Description CMake 4.0 release candidate 2.0 is available, and it cannot compile all of OnnxRuntime out-of-the-box. There's portions of the OnnxRuntime codebase that specify a `cmake_minimum_required` version of 3.0, and CMake 4.0 has removed support for compatibility with CMake < 3.5 - the following error is reported: ``` CMake Error at winml_sdk_helpers.cmake:4 (cmake_minimum_required): Compatibility with CMake < 3.5 has been removed from CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway. ``` Since CMake 3.5 appears to have shipped in 2016, it seems reasonable to set that as a minimum version to fix the error. The root CMakeLists.txt does ask for a minimum version of 3.28, so we could snap to that, but I'm still ramping up on the build, so wanted to propose a minimally sufficient fix. ### Motivation and Context Being able to build with the latest CMake - when it ships - reduces the barrier to entry to building OnnxRuntime, and allows the OnnxRuntime to leverage the latest and greatest tooling. * WebGPU: Remove deprecated subgroups-f16 from WebGPU native and JS EP (microsoft#23898) This PR removes the deprecated subgroups-f16 from WebGPU native and JS EP, and also remove the unused deviceInfo in WebGPU JS EP. * [JSEP/WebGPU] Fixed error in softmax dispatch. (microsoft#23906) ### Description Fixed an error softmax dispatch ### Motivation and Context Produce expected results for LlaMA model * enable WebGPU EP in WebAssembly build (microsoft#23913) ### Description This PR is the first step for migrating the webgpu backend of onnxruntime-web from JSEP based to WebGPU EP based. In this change, we enable building WebGPU EP in a wasm build (ie. `--build_wasm` `--use_webgpu` `--use_jsep`). However, the old build flags should still keep previous behavior. * Adding OpenVINO Windows CI Pipeline (microsoft#23919) ### Description  Enable an OpenVINO Windows CI pipeline. This includes: - Downloading the OpenVINO toolkit for Windows from an external source. - Setting up OpenVINO environment variables. - Building the ONNX Runtime OpenVINO Execution Provider. - Running unit tests. ### Motivation and Context  This change is required to run checks on precommit and commit in the ONNX Runtime project. It ensures that the code is tested with the OpenVINO toolkit on Windows, improving the reliability and compatibility of the project. * [WebGPU EP] SoftMax Implementation (microsoft#23538) Increase coverage for WebGPU Op * Exclude MAUI projects from GPU C# packaging builds (microsoft#23923) ### Description  Use 'desktop only' solution in GPU C# packaging builds. We don't need to include any MAUI support for those builds. ### Motivation and Context  * Support all block sizes that are multiples of 32 for DP4A (microsoft#23907) ### Description Simple change 1. The DP4A shader actually supports all block sizes that are multiples of 32, relaxing the restriction and making a small tweak to support sizes other than 32. 2. Moved the shader to a separate file for maintainability. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Example custom op with output type inferencing (microsoft#23916) ### Description  Add example of a custom op that is required to do type inference for the output type for the model load to work. Also acts as an example of how to override an ONNX op with a custom implementation. ### Motivation and Context  microsoft#23891 * Enabling L2+ Optimizations for EPs (microsoft#23517) There are some requirements to modify the graph which are specific to the EP/hardware. ORT has the hardcoded EP list for optimizations but that can't scale and it's hard be extended to enable EP custom optimizations. Here is the prototype to enable L2+ optimizations for EPs (The original overview is provided by @skottmckay) as well as the TRT EP implementation for the ConstantFoldingDQ optimization. Signatures for selection and optimization functions: ```` - Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)> - Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)> ```` GetCapability - call (new) provider bridge API to lookup pre-defined optimizer by name and get selection function - ComputeCapability.optimize_func, i.e. optimization function, would be set by the optimizer to the function that does the optimization - EP has to update the returning ComputeCapability to include the optimization ComputeCapability in nodes_to_optimize. So that later ORT can perform optimization/transformation accordingly. GraphPartitioner - After assigning the ComputeCapability to the EP and prior to Compile, if the ComputeCapability has nodes_to_optimize, iterate that list - optimization function needs to be called with - a mutable Graph instance - the ComputeCapability for the individual optimization - the overall ComputeCapability so it can be updated * fix binplace file in web pipeline (microsoft#23930) * Updated run_CIs_for_external_pr.py to support the Windows OpenVINO CI pipeline (microsoft#23931) * Fix ConvInteger handling of optional inputs. (microsoft#23935) ### Description  Fix ConvInteger handling of optional inputs. Need to check Exists() and not just the number of inputs. ### Motivation and Context  microsoft#23927 * Updated ov version in pipeline (#595) (microsoft#23882) ### Description This PR updates the OpenVINO version used in the pipeline from 2024.5.0 to 2025.0.0 Co-authored-by: jatinwadhwa921 <[email protected]> * [AIX] External data handling (microsoft#23859) ### Description In BE system, model tensor data coming from external file is not handled properly. This was found during the debugging of (microsoft/onnxruntime-genai#1104) This PR changes do the endianness conversion of data loaded from external file in BE system. * Create a packaging pipeline for a custom nuget package (microsoft#23918) * Fix license in example test code. (microsoft#23936) * replace usage of gsl::narrow and gsl::narrow_cast in WebGPU EP (microsoft#23926) ### Description `gsl::narrow` does not work in no exception build. - use `onnxruntime::narrow` if necessary; - or change to `static_cast` if it's obviously safe. also apply the changes to usage of `gsl::narrow_cast`, which does not apply checks. * VCPKG improvement: set VCPKG_OSX_DEPLOYMENT_TARGET (microsoft#23933) ### Description 1. Set VCPKG_OSX_DEPLOYMENT_TARGET for macOS targets 2. Enable VCPKG in more pipelines. * Allow using a different version of flatbuffers when building with vcpkg (microsoft#23946) ### Description Allow using a different version of flatbuffers when building with vcpkg, so that users do not need to pin flatbuffer's version, which provides more flexibility in the build process. Delete utf8_range from the dependencies, because it is an indirect dependency of protobuf, which is already included in the build process. ### Motivation and Context * Make python package pipeline 1ES compliant (microsoft#23800) ### Description Make [Python packaging pipeline](https://aiinfra.visualstudio.com/530acbc4-21bc-487d-8cd8-348ff451d2ff/_build?definitionId=841) 1ES compliant ### Motivation and Context  ### Checklist - [x] Make Onnxruntime-QNNEP-Windows-2022-CPU stateless * Delete ROCM Nuget Publishing Pipeline (microsoft#23948) * Bump SixLabors.ImageSharp from 2.1.9 to 2.1.10 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (microsoft#23924) Bumps [SixLabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.9 to 2.1.10. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/SixLabors/ImageSharp/releases">SixLabors.ImageSharp's releases</a>.</em></p> <blockquote> <h2>v2.1.10</h2> <h2>What's Changed</h2> <ul> <li>Backport <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2859">#2859</a> to release/2.1.x by <a href="https://github.com/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2890">SixLabors/ImageSharp#2890</a></li> <li>Backport <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2701">#2701</a> to 2.1.x [copy] by <a href="https://github.com/antonfirsov"><code>@antonfirsov</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2891">SixLabors/ImageSharp#2891</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10">https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/SixLabors/ImageSharp/commit/d133ef99e8becfc3b924b0bb4315e63b8681d307"><code>d133ef9</code></a> Set lang version</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/5dfe5a800367581239de442cc18de659da6e9b1d"><code>5dfe5a8</code></a> Missed cache action update</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/4d3a85112b03c89d2cb8616a5b747684b6e73730"><code>4d3a851</code></a> Use latest cache action</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/4cb9f40a722ab2b837157862f0320c6a652da4d0"><code>4cb9f40</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2891">#2891</a> from SixLabors/af/backport-2701</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/bb82f79db0197166271d4355b5fb5ceda370a906"><code>bb82f79</code></a> <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2701">#2701</a> to 2.1.x [copy]</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/627b5f721f30f6d529acb50bd81f92bd3db754eb"><code>627b5f7</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2890">#2890</a> from SixLabors/af/backport-2859</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/67f7848d6e975e7956c8056823555de49a5fdf6d"><code>67f7848</code></a> try to fix LFS for *.BMP</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/44d294e06606111195152ead3006452357ef1bb9"><code>44d294e</code></a> 8.0.x is not needed</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/adb85d9e66aa3a588a86f4a4ef9a0539a8502117"><code>adb85d9</code></a> Another attempt for a Linux-specific skip</li> <li><a href="https://github.com/SixLabors/ImageSharp/commit/efc3fc4ee15eec4e523c26f7130e786541b00df2"><code>efc3fc4</code></a> Disable BmpDecoder_CanDecode_Os2BitmapArray on Linux</li> <li>Additional commits viewable in <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=SixLabors.ImageSharp&package-manager=nuget&previous-version=2.1.9&new-version=2.1.10)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: Jianhui Dai <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Sushanth Rajasankar <[email protected]> Co-authored-by: Scott McKay <[email protected]> Co-authored-by: Seungtaek Kim <[email protected]> Co-authored-by: co63oc <[email protected]> Co-authored-by: Jambay Kinley <[email protected]> Co-authored-by: Hector Li <[email protected]> Co-authored-by: Jian Chen <[email protected]> Co-authored-by: Yulong Wang <[email protected]> Co-authored-by: Jiajia Qin <[email protected]> Co-authored-by: Alessio Soldano <[email protected]> Co-authored-by: Changming Sun <[email protected]> Co-authored-by: Ashish Garg <[email protected]> Co-authored-by: Ashish Garg <[email protected]> Co-authored-by: Jie Chen <[email protected]> Co-authored-by: wp <[email protected]> Co-authored-by: Satya Kumar Jandhyala <[email protected]> Co-authored-by: Prathik Rao <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Tianlei Wu <[email protected]> Co-authored-by: Jianhui Dai <[email protected]> Co-authored-by: xhcao <[email protected]> Co-authored-by: Wanming Lin <[email protected]> Co-authored-by: Mark Schofield <[email protected]> Co-authored-by: jiangzhaoming <[email protected]> Co-authored-by: Yi-Hong Lyu <[email protected]> Co-authored-by: vraspar <[email protected]> Co-authored-by: Chi Lo <[email protected]> Co-authored-by: saurabh <[email protected]> Co-authored-by: Ranjit Ranjan <[email protected]> Co-authored-by: Baiju Meswani <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

This reverts commit a6cdf62.

Revert "Rebasing with msft commits"

This reverts commit 920ed58, reversing changes made to a6cdf62.

[OVEP] Fix for precision accuracy

This change allows for allocations made by the ov allocator to be imported to other APIs that require base addresses to the original device allocation.

Backmerging with Msft commits

…ensions, preventing unnecessary fallback (#619)

Backmerging with Msft commits

Backmerging with msft commits

* fix: fix mem leaks * fix linux builds

…E_OUT is disabled (#850) * ovep stateful: Enable explicit slice of prefill logits when NPUW_SLICE_OUT is disabled * Update onnxruntime/core/providers/openvino/ov_interface.cc Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: MayureshV1 <[email protected]>

…ubgraph partitioning (#838) * added a line to add initializers to be a part of meta_def -> inputs * fixed possible array index out of bound problem which caused some models to fail rather than getting sg partitioned * changed loop logic * reverting to the previous logic to ensure j value is retained and not incremented if append_node == true * updated loop logic --------- Co-authored-by: Preetha Veeramalai <[email protected]>

Sync with Microsoft ONNX Runtime - 19/11/2025

* skipped testcase

Sync with Microsoft ONNX Runtime - 25/11/2025

preetha-intel · 2025-11-28T06:56:25Z

Hi @adrianlizarraga can you help review and merge the changes for OVEP

Copilot

Pull request overview

This PR delivers a comprehensive update to the OpenVINO Execution Provider (OVEP), focusing on improved configuration management, enhanced inference stability, better model handling, and platform reliability improvements for ONNX Runtime 1.24.

Key changes:

Introduced new shared context and binary management infrastructure for better resource handling across sessions
Added Windows ETW tracing support for telemetry and debugging
Enhanced EP context serialization/deserialization with binary format support
Improved dynamic model compilation and external initializer handling
Updated test infrastructure to handle precision differences and provider-specific behaviors

Reviewed changes

Copilot reviewed 49 out of 49 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
ov_shared_context.h/cc	New shared context manager for weights and compiled model blobs across sessions
ov_bin_manager.h/cc	Binary serialization/deserialization with BSON metadata for EP context
weak_singleton.h	Thread-safe weak singleton pattern for shared resources
ov_tracing.h/cc	Windows ETW tracing implementation for runtime telemetry
exceptions.h	Typed exception handling with NPU error code extraction
openvino_execution_provider.cc	Refactored to use new shared context manager and session tracking
backend_manager.cc	Updated EP context export logic and external initializer handling
onnx_ctx_model_helper.cc	Enhanced EP context node handling with partition names and main context flags
ov_interface.cc	Added NPU logits slicing and improved stateful inference request handling
qdq_stripping.cc	Refactored to use new shared context API
backend_utils.cc	Removed legacy tensor creation code, moved to shared context
checkers.cc	Added int4 and uint16 tensor validation with absolute error tolerance
openvino_ep_ext_init.cc	New test for external initializer handling (currently disabled)
slice_op.test.cc, resize_op_test.cc, etc.	Added OpenVINO-specific test skips and error tolerances
onnxruntime_pybind_state.cc	Added `set_ep_dynamic_options` Python API
provider_bridge_ort.cc	Increased max provider option value length to 2048
dllmain.cc, openvino_provider_dllmain.cc	Added protobuf cleanup for leak checking

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-02T19:55:16Z

onnxruntime/test/providers/openvino/openvino_ep_ext_init.cc

+namespace onnxruntime {
+namespace test {
+
+// this test requiresOV 2025.4+ to run, currently CI uses OV 2025.2, so the test will be disabled until OV is updated


The comment mentions "requiresOV 2025.4+" but the test is disabled. The comment should clarify why the test is disabled (CI uses OV 2025.2) and when it will be enabled.

Copilot · 2025-12-02T19:55:16Z

onnxruntime/test/providers/openvino/openvino_ep_ext_init.cc

+namespace onnxruntime {
+namespace test {
+
+// this test requiresOV 2025.4+ to run, currently CI uses OV 2025.2, so the test will be disabled until OV is updated


Typo in comment: "requiresOV" should be "requires OV" (missing space).

Copilot · 2025-12-02T19:55:16Z

onnxruntime/core/providers/openvino/ov_interface.cc

 #if (OPENVINO_VERSION_MAJOR > 2025 || (OPENVINO_VERSION_MAJOR == 2025 && OPENVINO_VERSION_MINOR >= 3))
-    if (!model_blob.maybe_native_blob_path_.empty()) {
-      obj = core.import_model(ov::read_tensor_data(model_blob.maybe_native_blob_path_), hw_target, device_config);
+    if (model_blob.tensor_) {


Missing input validation: The function should check if model_blob.tensor_ is valid before attempting to use it. If tensor_ is empty but the condition expects it to be valid, this could lead to undefined behavior.

Suggested change

if (model_blob.tensor_) {

if (model_blob.tensor_ && !model_blob.tensor_.empty()) {

Copilot · 2025-12-02T19:55:17Z

onnxruntime/core/providers/openvino/weak_singleton.h

+      // ensure another thread didn't create an instance while this thread was waiting
+      ptr = instance.lock();


Potential race condition: The WeakSingleton::Get() method has a double-checked locking pattern, but the second lock is obtained after checking !ptr. If another thread creates an instance between the first check and acquiring the lock, the pattern should work correctly. However, the comment on line 23 is misleading - it should say "ensure another thread didn't create an instance while this thread was waiting for the lock".

Copilot · 2025-12-02T19:55:17Z

onnxruntime/core/session/provider_bridge_ort.cc

+    if (strlen(provider_options_keys[i]) > 1024 || strlen(provider_options_values[i]) > 2048) {
      return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT,
-                                   "Maximum string length for a provider options key/value is 1024.");
+                                   "Maximum string length for a provider options key is 1024 and value is 2048.");


The error message states "key is 1024 and value is 2048" but this should be "key has maximum length 1024 and value has maximum length 2048" for clarity.

yuslepukhin · 2025-12-02T19:58:59Z

onnxruntime/core/dll/dllmain.cc

      // Windows API doc says: "When handling DLL_PROCESS_DETACH, a DLL should free resources such as heap memory only if the DLL is being unloaded dynamically"
      if (lpvReserved != nullptr) {
        g_is_shutting_down = true;
        // do not do cleanup if process termination scenario


// do not do cleanup if process termination scenario

This comment seems to be outdated now?

yuslepukhin · 2025-12-02T20:00:25Z

onnxruntime/python/onnxruntime_inference_collection.py

        """
        self._sess.run_with_iobinding(iobinding._iobinding, run_options)

+    def set_ep_dynamic_options(self, options: dict[str, str]):


def set_ep_dy

Is there test for this?

yuslepukhin · 2025-12-02T20:02:08Z

onnxruntime/core/providers/openvino/backend_manager.cc


+// this is a helper function to set the data fields, it duplicates ExternalDataInfo::SetExternalLocationToProto
+// but we cannot use that function as it is not part of public provider api.
+static void SetExternalDataFields(ONNX_NAMESPACE::TensorProto* proto_init, const void* data_ptr, int64_t data_size) {


ONNX_NAMESPACE::TensorProto* proto_i

Is this an optional argument?
If it is then it should be checked for nullptr,
if it is not optional it should be reference.

yuslepukhin · 2025-12-02T20:05:28Z

onnxruntime/core/providers/openvino/backend_manager.cc

 }

+// this is a helper function to set the data fields, it duplicates ExternalDataInfo::SetExternalLocationToProto
+// but we cannot use that function as it is not part of public provider api.


You do not need to do this because graph_utils has functions that convert from/to in memory TensorProto.

tensorprotoutils has utils::HasExternalData() and utils::HasExetrnalDataInMemory() to test it/

All of the above are exposed via provider bridge.

yuslepukhin · 2025-12-02T20:07:49Z

onnxruntime/test/unittest_util/checkers.cc

  void operator()(const Tensor& expected, const Tensor& actual, const ValidateOutputParams& params,
                  const std::string& /*provider_type*/) const {
-    ORT_UNUSED_PARAMETER(params);
+    const bool has_abs_err = params.absolute_error.has_value();


params.absolute_error.has_value();

std::optional is convertible to bool, you can use it in if() clause, not need for this

yuslepukhin · 2025-12-02T20:08:20Z

onnxruntime/test/unittest_util/checkers.cc

  void operator()(const Tensor& expected, const Tensor& actual, const ValidateOutputParams& params,
                  const std::string& /*provider_type*/) const {
-    ORT_UNUSED_PARAMETER(params);
+    const bool has_abs_err = params.absolute_error.has_value();


params.absolute_error.has

Convertible to bool

yuslepukhin · 2025-12-02T20:08:45Z

onnxruntime/test/unittest_util/checkers.cc

    const auto size = narrow<size_t>(actual.Shape().Size());
    cur_expected = expected.Data<Int4x2>();
    cur_actual = actual.Data<Int4x2>();
+    double threshold = 0.0f;


double threshold = 0.0f;

Since there is an if below you can use it w/o additional var

yuslepukhin · 2025-12-02T20:09:08Z

onnxruntime/test/unittest_util/checkers.cc

    cur_actual = actual.Data<UInt4x2>();

-    for (size_t i = 0; i < size; ++i) {
+    double threshold = 0.0f;


double threshold = 0.0f;

Not needed

yuslepukhin · 2025-12-02T20:10:46Z

onnxruntime/test/providers/openvino/openvino_ep_ext_init.cc

+
+namespace {
+
+std::vector<uint8_t> LoadFileToMemory(const std::string& path) {


How about returning std::optinal<std::vector<uint8_t>> so you can return {} if there is nothing?

yuslepukhin

🕐

yuslepukhin · 2025-12-02T20:11:54Z

I reviewed some common parts, but an intel person must review it as well.

jatinwadhwa921 and others added 30 commits February 24, 2025 18:49

Updated Internal CI (#581)

e85411a

Updated Internal CI OV version (#594)

0d42af9

Updated ov version in pipeline (#595)

3dc24ef

[OVEP] Fix for deprecated OV element type (#597)

9c2fee5

Merge branch 'master' into sync_msft_29_2_25

17f4bc7

Merge pull request #600 from intel/sync_msft_29_2_25

4bb577a

Sync With latest msft commits

Sahar/session option develop (#601)

60ee27a

Changes to make sure to honor SessionOptions API Contract

Use absolute paths for libraries loaded with LOAD_WITH_ALTERED_SEARCH…

ec62bf3

…_PATH' (#602)

Remove unintended model copies during compilation (#584)

bd32f51

Co-authored-by: sfatimar <[email protected]>

Revert "Rebasing with msft commits (#607)"

cdc209c

This reverts commit a6cdf62.

Merge pull request #607

920ed58

Revert "Rebasing with msft commits"

Revert "Merge pull request #607"

788fc78

This reverts commit 920ed58, reversing changes made to a6cdf62.

Merge pull request #609 from intel/jatin_revert_msft_changes

a046532

Merge branch 'master' into syncing_msft_commits_3_10_25

ec98cce

Merge pull request #611 from intel/syncing_msft_commits_3_10_25

73e0fea

[OVEP] Fix for precision accuracy

61b36ef

Merge pull request #603 from intel/jatin_fix_precison_acc_issue

ea13a05

[OVEP] Fix for precision accuracy

Refactor OVRTAllocator to return base pointer of remote tensor (#613)

7683e37

This change allows for allocations made by the ov allocator to be imported to other APIs that require base addresses to the original device allocation.

Commit Lint Errors fix (#606)

1788576

fix quantizedLinear layer feeds into grapg output (#615)

23e17e2

Merge branch 'master' into sync_msft_18_3_25

91e64fe

Merge pull request #621 from intel/sync_msft_18_3_25

6083601

Backmerging with Msft commits

[OVEP] Fix for dumping the model in correct format (#616)

8b4a6d2

[OVEP] Added Cast and Resize to operators that handle zero-valued dim…

7269615

…ensions, preventing unnecessary fallback (#619)

Merge branch 'master' into sync_msft_20_3_25

9ee95d1

Merge pull request #624 from intel/sync_msft_20_3_25

2a24806

Backmerging with Msft commits

Merge branch 'master' into sync_msft_25_3_25

c7ac5c8

Merge pull request #627 from intel/sync_msft_25_3_25

2c61a3a

Backmerging with msft commits

[OVEP] Fix for Dynamic backend creation for NPU. (#622)

e240695

ankitm3k and others added 13 commits November 18, 2025 15:13

CVS-176574 : Fix memory leaks for protobuf & DataOps (#852)

0d68ee5

* fix: fix mem leaks * fix linux builds

CVS-175504 Fix mixing weight shared and non-shared models (#854)

dbd1ce0

Updating OVEP to support 2025.4.0 (#853)

24c833c

Merge branch 'master' into sync_msft_19112025

4b1976a

Merge branch 'ovep-develop' into sync_msft_19112025

cb8c270

Merge pull request #856 from intel/sync_msft_19112025

ade6a2e

Sync with Microsoft ONNX Runtime - 19/11/2025

skipped failing testcase (MathOpTest.Clip_Default_int64) (#860)

2f7212c

* skipped testcase

reset ort.eprp file changes (#862)

5a06f68

Merge branch 'master' into sync_msft_25112025

6274e3c

Merge pull request #863 from intel/sync_msft_25112025

bb738ba

Sync with Microsoft ONNX Runtime - 25/11/2025

Revert OV CI pipeline scripts

3df4951

preetha-intel force-pushed the ovep-1.24 branch from 21711d8 to 3df4951 Compare November 27, 2025 09:41

yuslepukhin requested a review from Copilot December 2, 2025 19:50

Copilot started reviewing on behalf of yuslepukhin December 2, 2025 19:51 View session

Copilot finished reviewing on behalf of yuslepukhin December 2, 2025 19:53

Copilot AI reviewed Dec 2, 2025

View reviewed changes

yuslepukhin reviewed Dec 2, 2025

View reviewed changes

yuslepukhin requested changes Dec 2, 2025

View reviewed changes

	if (model_blob.tensor_) {
	if (model_blob.tensor_ && !model_blob.tensor_.empty()) {

		// ensure another thread didn't create an instance while this thread was waiting
		ptr = instance.lock();


		namespace {

		std::vector<uint8_t> LoadFileToMemory(const std::string& path) {

[OVEP] OpenVINO EP Features and bug-fixes for ORT-1.24 #26672

Are you sure you want to change the base?

[OVEP] OpenVINO EP Features and bug-fixes for ORT-1.24 #26672

Conversation

preetha-intel commented Nov 27, 2025

Description

Configuration & Properties

Inference & Tensor Handling

Model Handling & Operator Support

Platform & Integration Fixes

Quality & Maintenance

Uh oh!

preetha-intel commented Nov 28, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

yuslepukhin left a comment

Choose a reason for hiding this comment

Uh oh!

yuslepukhin commented Dec 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

26 participants

yuslepukhin Dec 2, 2025 •

edited

Loading

yuslepukhin Dec 2, 2025 •

edited

Loading