Skip to content

nvme_driver: don't flr nvme devices #1714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mattkur
Copy link
Contributor

@mattkur mattkur commented Jul 17, 2025

The default vfio device behavior is to issue a function level reset when attaching or detaching devices. It does so because the device is in an unknown or untrusted state. However, within the context of a trusted virtualization stack, OpenHCL can reasonably trust the state and behavior of the device. So, optimize performance by removing these function level resets for nvme devices. This follows the same model as already exists for MANA devices.

The nvme_driver already shuts down the device (see NvmeDriver::reset()) and waits for the device to become disabled. A well behaved nvme device will not issue DMA after this point. That same device should tolerate a graceful start without an FLR.

Pending work before this PR is ready to commit:

@mattkur mattkur requested a review from Copilot July 17, 2025 22:23
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR modifies the NVMe device initialization logic to disable Function Level Reset (FLR) during device attach/detach operations to improve system startup and shutdown performance. The change implements a fallback mechanism that attempts device initialization without FLR first, and only enables FLR if the initial attempt fails.

  • Refactors NVMe device creation into separate functions with retry logic
  • Adds reset method configuration to disable FLR by default
  • Implements fallback to FLR if device initialization fails without reset
Comments suppressed due to low confidence (1)

openhcl/underhill_core/src/nvme_manager.rs:244

  • [nitpick] The closure name 'update_reset' could be more descriptive. Consider renaming to 'set_device_reset_method' to better reflect its purpose.
    let update_reset = |method: PciDeviceResetMethod| {

@mattkur mattkur marked this pull request as ready for review July 17, 2025 22:30
@mattkur mattkur requested a review from a team as a code owner July 17, 2025 22:30
Copy link

@mattkur
Copy link
Contributor Author

mattkur commented Jul 17, 2025

Bummer, this doesn't work yet:

paravisor_log: inner_target="underhill_core::nvme_manager" "Failed to update reset_method" fields={"err": ["failed to create file `/sys/bus/pci/devices/182f:00:00.0/reset_method`: Permission denied (os error 13)"], "method": "NoReset

@jstarks
Copy link
Member

jstarks commented Jul 18, 2025

This is expected with our NVMe emulator, which does not (currently) support any of Linux's reset methods.

@mattkur
Copy link
Contributor Author

mattkur commented Jul 18, 2025

This is expected with our NVMe emulator, which does not (currently) support any of Linux's reset methods.

Got it. I see confirmation that this works when testing with a physical device. In addition, I have a draft PR to add FLR support (so we can test this in CI).

@mattkur mattkur changed the title wip: don't flr nvme devices nvme_driver: don't flr nvme devices Jul 18, 2025
@mattkur mattkur requested a review from a team July 18, 2025 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants