Hardening: split libkrun/VMM processes for finer-grained seccomp after xattr allowlist #848

Description

@G4614

Context

PR #846 fixes writable virtiofs volumes by allowing the Linux xattr syscall family in the VMM seccomp profile. That is a practical compatibility fix: host-side virtiofs handling can call fgetxattr/related syscalls while preserving file metadata such as security.capability, and blocking those syscalls SIGSYS-kills the VMM during otherwise normal writes.

However, the current seccomp granularity is VMM-process-wide. The vmm filter is applied with TSYNC to all threads in the shim/VMM process, so the xattr allowlist is available not only to the virtiofs path that needs it, but also to other threads sharing the same process boundary, such as libkrun runtime threads and embedded networking/helper threads.

Problem

This is acceptable as a short-term unblocker, but it is not the ideal long-term security shape. The current architecture forces one broad VMM seccomp profile to cover multiple responsibilities:

  • libkrun / core VMM execution
  • virtiofs host-side file serving
  • networking/helper runtime threads
  • other shim-managed runtime work

Because those components share one process-wide filter, any syscall required by one component expands the allowed syscall surface for all of them. PR #846 makes this visible with the xattr family (getxattr, setxattr, listxattr, removexattr, including l* and f* variants).
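To make the scope concrete, the sketch below enumerates the twelve syscall names in that xattr family and models how one shared allowlist becomes the union of every component's needs. The component names and their non-xattr syscalls are hypothetical placeholders, not the contents of the real vmm profile.

```python
# Illustrative model only: the per-component syscall sets are hypothetical
# and do not reflect the real vmm seccomp profile.
XATTR_BASES = ("getxattr", "setxattr", "listxattr", "removexattr")


def xattr_family():
    """Return the base, l*, and f* variant of each xattr syscall."""
    return [prefix + base for base in XATTR_BASES for prefix in ("", "l", "f")]


# Hypothetical needs of components that share the single VMM process.
COMPONENT_NEEDS = {
    "core_vmm": {"ioctl", "read", "write"},
    "virtiofs": {"openat", "read", "write", *xattr_family()},
    "networking": {"sendmsg", "recvmsg"},
}


def shared_allowlist(needs):
    """A process-wide filter has to allow the union of all components' syscalls."""
    allowed = set()
    for syscalls in needs.values():
        allowed |= syscalls
    return allowed


def excess_surface(needs):
    """Syscalls each component can reach under the shared filter but does not need."""
    shared = shared_allowlist(needs)
    return {name: shared - syscalls for name, syscalls in needs.items()}
```

In this model, `excess_surface(COMPONENT_NEEDS)["networking"]` contains every xattr syscall even though only the virtiofs entry asked for them, which is the effect described above.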

Desired direction

Refactor the libkrun/VMM runtime boundary so security policy can be applied at a finer granularity, ideally per process or per narrowly-scoped component. For example:

  • Keep the core VMM/libkrun process on the smallest syscall profile it actually needs.
  • Run virtiofs/file-serving work behind a separate process boundary with an explicit xattr-capable seccomp profile.
  • Keep networking/helper paths on their own profile where possible.
  • Avoid one shared process-wide allowlist accumulating every syscall needed by any subsystem.
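One way to keep such a split honest is a small check over the per-component profiles that fails when an xattr syscall appears outside the file-serving profile. The following is a minimal sketch under assumed names: `PROFILES`, the component keys, and the non-xattr syscall lists are all hypothetical, and a real check would read the project's actual profile definitions.

```python
# Hypothetical per-component profiles; names and syscall lists are placeholders.
XATTR_SYSCALLS = frozenset(
    prefix + base
    for base in ("getxattr", "setxattr", "listxattr", "removexattr")
    for prefix in ("", "l", "f")
)

PROFILES = {
    "core_vmm": frozenset({"ioctl", "read", "write"}),
    "virtiofs": frozenset({"openat", "read", "write"}) | XATTR_SYSCALLS,
    "networking": frozenset({"sendmsg", "recvmsg"}),
}


def components_allowing(profiles, syscall):
    """List the components whose profile allows the given syscall."""
    return sorted(name for name, allowed in profiles.items() if syscall in allowed)


def xattr_leaks(profiles, xattr_owner="virtiofs"):
    """List components other than xattr_owner that allow any xattr syscall."""
    return sorted(
        name
        for name, allowed in profiles.items()
        if name != xattr_owner and allowed & XATTR_SYSCALLS
    )
```

With the placeholder data, `xattr_leaks(PROFILES)` returns an empty list. Adding `"fgetxattr"` to the `core_vmm` entry would make it return `["core_vmm"]`, which a CI check could treat as a failure.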

Acceptance criteria

  • Document the current VMM-process-wide seccomp limitation and why PR #846 had to allow xattr syscalls there.
  • Design a process/component split that supports narrower seccomp profiles.
  • Introduce separate seccomp profiles for at least the core VMM and virtiofs/file-serving component.
  • Ensure writable virtiofs volumes still work without granting xattr syscalls to unrelated runtime components.
  • Add regression coverage showing writable virtiofs metadata operations continue to pass under the narrowed profile.
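For the regression coverage item, a guest-side probe could be as small as the sketch below. It is an assumption about what such a test might look like, not the project's test harness: the function name, the `user.regression_probe` attribute, and the idea of pointing `directory` at the writable virtiofs mount inside the guest are all hypothetical. It uses only Python's standard `os.setxattr` and `os.getxattr`, which are available on Linux.

```python
import errno
import os
import tempfile


def xattr_roundtrip(directory, name="user.regression_probe", value=b"ok"):
    """Write a file in `directory`, set an xattr on it, and read it back.

    Returns "pass", "mismatch", or "error:<ERRNO_NAME>". The attribute name
    and the choice of directory are hypothetical; a real test would target
    the writable virtiofs mount inside the guest.
    """
    if not hasattr(os, "setxattr"):
        # Platform without the Linux xattr API in the os module.
        return "error:ENOSYS"
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"payload")  # a normal write on the volume
        try:
            os.setxattr(path, name, value)
            read_back = os.getxattr(path, name)
        except OSError as exc:
            return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
        return "pass" if read_back == value else "mismatch"
    finally:
        os.close(fd)
        os.unlink(path)
```

A regression test would assert that the result is `"pass"` on the virtiofs mount. If the host-side process were killed by SIGSYS, the guest would more likely see the call hang or the VM disappear than receive an errno, so the harness should also enforce a timeout and check that the VMM is still alive.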
