Skip to content

seccomp: honor sandbox.seccomp.syscalls.on_block (errno | kill | log | log_and_kill)#233

Merged
erans merged 20 commits intomainfrom
feat/seccomp-on-block
Apr 16, 2026
Merged

seccomp: honor sandbox.seccomp.syscalls.on_block (errno | kill | log | log_and_kill)#233
erans merged 20 commits intomainfrom
feat/seccomp-on-block

Conversation

@erans
Copy link
Copy Markdown
Collaborator

@erans erans commented Apr 16, 2026

Summary

  • The sandbox.seccomp.syscalls.on_block schema was advertised with four values but only one path was wired up: the filter always resolved to SCMP_ACT_ERRNO, regardless of config. on_block: kill, log, and log_and_kill silently degraded to errno.
  • This branch fully implements the four modes end-to-end: errno and kill resolve kernel-side (SCMP_ACT_ERRNO(EPERM) / SCMP_ACT_KILL_PROCESS); log and log_and_kill route through SCMP_ACT_NOTIFY to a handler that emits a typed seccomp_blocked event and (for log_and_kill) sends SIGKILL via pidfd_send_signal.
  • Default is unchanged (errno). Deployments that explicitly set on_block: kill expecting the advertised schema now get a real kernel-side kill rather than a silent EPERM fallback — call-out in the changelog.

Behavioral change

on_block Before this PR After this PR
unset / errno EPERM EPERM (unchanged)
kill EPERM (silent fallback) SIGSYS via SCMP_ACT_KILL_PROCESS
log EPERM (silent fallback) EPERM + seccomp_blocked audit event
log_and_kill EPERM (silent fallback) SIGKILL via pidfd_send_signal + seccomp_blocked audit event

Startup warning is emitted when log / log_and_kill is selected but no audit emitter is registered, so silent event drops don't surprise operators.

Design / spec

Design doc: docs/superpowers/specs/2026-04-15-seccomp-on-block-design.md.

Notable implementation choices

  • TID→TGID resolution. seccomp_notif.pid is a TID, not a TGID (see man 2 seccomp). On kernel 6.8 (Ubuntu 24.04 target), pidfd_open(non_TGL_tid, 0) returns ENOENT — the trapped non-leader thread's task never becomes a valid pidfd target. Without TID→TGID resolution, log_and_kill would silently fail for multi-threaded agents whose first blocked syscall came from a non-leader thread. Fix: parse /proc/<tid>/status Tgid: before opening the pidfd. PIDFD_THREAD would solve this directly but requires kernel 6.9+, below our floor.
  • TOCTOU recheck. Between NotifIDValid and pidfd_send_signal, the target could exit and its PID could be recycled. The handler revalidates the notif id after opening the pidfd but before signalling. An ENOENT on recheck is the canonical "notif id is gone" signal — we skip the signal (pidfd may reference a reused PID) and tag outcome killed. Any other revalidation error is NOT evidence the target exited; we refuse to signal and tag denied.
  • Per-thread emitter lifecycle. Early review flagged a regression where a nil emitter short-circuited the entire notify loop — silently disabling enforcement. Fixed by splitting the entry guard: a nil fd returns, but a nil emit only warns. Regression test captures slog output to confirm the loop is actually entered.

Test plan

  • Unit tests: OnBlockAction enum + validation, filter construction per mode, pidfd seams (pidfd_open ESRCH/EPERM, pidfd_send_signal ESRCH/EINVAL), TOCTOU recheck (ENOENT vs non-ENOENT), TID→TGID resolution (main thread, non-leader, non-existent), event builder schema, block-list dispatch.
  • Integration tests (//go:build integration && linux): one per mode — TestSeccompOnBlock_{Errno,Kill,Log,LogAndKill} — plus a 100-goroutine storm test (TestSeccompOnBlock_LogAndKill_ConcurrentCalls) that fires ptrace exclusively from non-TGL threads, proving the TID→TGID path works under load. A pre-fix handler would see ENOENT on every pidfd_open; the storm test would fail st.Signaled().
  • Full go test ./... green.
  • Cross-compile: GOOS=windows go build ./... and GOOS=darwin go build ./... green (the feature is Linux-only; no-op stubs on the other platforms).
  • roborev review --reasoning maximum --wait run after every commit; all above-LOW findings addressed before proceeding.

Known pre-existing failures (not introduced by this branch)

  • internal/ptrace/TestIntegration_{NetworkDenyConnect,FileRedirect,ConnectRedirect,ScratchPage,TracerPidNotMasked,ConnectExitRetainedTLS,WriteEscalationOnTLSConnect} — fail identically on main without this branch's changes. Unrelated to seccomp on_block.
  • internal/platform/darwin/* build failure — macOS-only packages; expected on a Linux build host.

Docs

  • docs/seccomp.md: new "Syscall Block Actions" table, corrected YAML example default, updated Audit Events to match the real emitted wire format (source=seccomp, typed Fields map, TID semantics on pid).
  • docs/spec.md §13.3: expanded single-line "immediately kills" claim into the four-mode breakdown with the real event JSON.
  • internal/events/types.go: docblock above EventSeccompBlocked listing field keys and meanings so consumers don't have to grep the handler.

🤖 Generated with Claude Code

erans and others added 20 commits April 15, 2026 13:35
sandbox.seccomp.syscalls.on_block is parsed and defaulted ("kill") but
never wired into filter construction — seccomp_linux.go hardcodes
ActErrno(EPERM). Spec wires it through: errno | kill | log | log_and_kill,
with kernel fast-path for silent modes and ActNotify dispatch for audit
modes. Default shifts to "errno" to preserve current runtime behavior
for existing deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rejects unknown on_block values at config load with a clear error.
Default shifts from "kill" to "errno" so upgrades preserve the EPERM
runtime behavior introduced in b670835. Adds OnBlockAction typed
enum + ParseOnBlock helper for use by downstream filter construction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Roborev flagged that the inline comment on Sandbox.Seccomp.Syscalls.OnBlock
still listed only kill|log_and_kill. Updates it to reflect the full set
validated in the previous commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagates sandbox.seccomp.syscalls.on_block from server Config into
the agentsh-unixwrap JSON payload so the wrapper can honor it when
building the seccomp filter. No behavior change yet — wrapper still
hardcodes EPERM until Task 3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the hardcoded ActErrno(EPERM) for blocked syscalls with a
switch on OnBlockAction: errno -> ActErrno, kill -> ActKillProcess,
log/log_and_kill -> ActNotify plus populated Filter.blockList map.
The notify dispatch for log modes lands in Task 5.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Roborev flagged that on_block=log and on_block=log_and_kill install
SCMP_ACT_NOTIFY rules on block-listed syscalls, which turns the main
seccomp filter into a USER_NOTIF filter — but neither
mainFilterUsesUserNotify (wrap.go) nor hasNotifyFeatures (core.go)
recognized this, so:

  - the signal filter could stack a second USER_NOTIF filter on top,
    triggering the Alpine/musl notification-delivery failure documented
    in TestAlpineEnvInject_BashBuiltinDisabled, and
  - ptrace sync was incorrectly skipped for sessions that only used
    block-list logging.

Adds a blockListUsesNotify helper (plain-string, cross-platform safe
because internal/seccomp is linux+cgo only) and wires it into both
gates. Adds four targeted subtests to TestWrap_SignalFilterUsesSessionPolicy
covering log / log_and_kill / errno-with-block / log-with-empty-block.
Second-pass roborev on f6cf673 caught that blockListUsesNotify was
comparing the raw configured names — so a block-list containing only
unknown-on-this-arch syscalls would still flip the gate, causing
ptrace-sync to wait for a READY/GO handshake the wrapper would never
send (seccomp.ResolveSyscalls silently skips unknown names).

Add resolvableBlockListCount, split by build tag: on linux+cgo it
routes through internal/seccomp.ResolveSyscalls; on other targets it
returns 0 (seccomp isn't loaded, so ActNotify rules are never
installed). blockListUsesNotify now requires at least one resolved
name before counting the block-list as a USER_NOTIF feature. Adds a
subtest covering the unresolved-name case.
Second-pass roborev on a4eb566 noted that disabled_by_onblock_log and
disabled_by_onblock_log_and_kill would fail on non-linux or linux
without cgo: resolvableBlockListCount returns 0 there, so the gate
won't close — but the tests unconditionally expect it to.

Gate both subtests with a runtime check on the helper itself
(resolvableBlockListCount(["ptrace"]) == 0 → skip). The !linux/!cgo
behavior is already covered by
enabled_when_onblock_log_with_only_unknown_names, which passes on
every build.
Small wrappers around pidfd_open and pidfd_send_signal used by the
seccomp notify handler to deliver SIGKILL under log_and_kill mode.
*Fn indirections give tests a seam for injecting ESRCH/EPERM/EINVAL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Roborev Low finding on ea1c529: pidfd_open returns the lowest free
descriptor, which can be 0 if stdin is closed. Use GreaterOrEqual
so a successful syscall that happens to land at fd 0 doesn't fail
the test.
handleBlockListNotify runs when a log / log_and_kill blocked syscall
traps: validates the notification is still live, resolves the syscall
name, kills via pidfd for log_and_kill, emits a seccomp_blocked event,
and responds EPERM. Pidfd is wrapped in *Fn seams for test injection
covering ESRCH / EPERM / EINVAL branches. Not yet wired into
ServeNotifyWithExecve — that happens in Task 6.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… fix)

Closes a TOCTOU race roborev flagged on d4e34e9. Between
handleBlockListNotify's initial NotifIDValid check and attemptKill's
pidfd_open, the target can exit and its PID can be recycled by an
unrelated process. SIGKILL would then land on the wrong target.

attemptKill now revalidates the seccomp notif id *after* pidfd_open.
While the id is valid, the kernel guarantees the trapped task has not
been released, so the just-opened pidfd must reference the original
caller. If the recheck fails, we skip SIGKILL and report "killed"
(the original target is gone regardless). The check goes through a
new notifIDValidFn test seam so unit tests can drive the race path
without standing up a real seccomp fd. Added
TestAttemptKill_NotifIDInvalidAfterOpen to cover the branch.

Also fixes the MEDIUM in blocklist_linux_test.go: TestAttemptKill_Success
had its pidfd stub return a fabricated fd (42) which attemptKill would
then unix.Close — potentially closing an unrelated descriptor in the
test process. All attemptKill tests now open /dev/null via
openDevNullFD() so deferred closes hit a real throwaway fd.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tests

Roborev MED on 7eac1e1: attemptKill treated ANY notifIDValidFn error as
"target is gone" and silently downgraded log_and_kill to "killed",
producing inaccurate audit outcomes on unrelated failures (bad listener
fd, transient ioctl error, EINVAL, etc.). Narrowed to isENOENT-only for
the "killed" branch — the canonical "notif id is no longer valid" signal
from the kernel. Any other error now warns and returns "denied" so the
audit record reflects that we could not deliver the signal. Split the
existing TestAttemptKill_NotifIDInvalidAfterOpen into two cases:
_ENOENT (killed, no signal) and _UnexpectedError (denied, no signal).

Also fixes roborev LOW: the test helper opened "/dev/null" literally;
switch to os.DevNull so the constant is sourced from the stdlib.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tify

ServeNotifyWithExecve now accepts a *BlockListConfig and routes
notifications for block-listed syscall numbers to the dedicated
handler. The config is built by the server from Sandbox.Seccomp.Syscalls
— empty for errno/kill modes (kernel fast path), populated for
log/log_and_kill modes. Warns at startup if log mode is selected but
no emitter is wired.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Roborev HIGH on 8f64189: ServeNotifyWithExecve returned immediately
when emit==nil, so a session configured with on_block=log_and_kill but
no emitter wired would not service any seccomp notifications — block-list
enforcement (SIGKILL) silently stops working.

Split the entry guard: return only on nil fd (cannot operate), warn on
nil emit and proceed (enforcement must still run). Downstream emit
consumers (buildUnixSocketEvent, handleBlockListNotify) already nil-check
internally; execve/file handlers manage their own emitters.

Add TestServeNotifyWithExecve_NilEmitterAllowed as a regression guard
against reintroducing the short-circuit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Roborev MEDIUM on a4ce79b: the previous timing-only assertion would
pass even if the emit==nil short-circuit were reintroduced, because the
broken and fixed paths both exit within the 1s deadline.

Install a buffer-backed slog handler for the test duration and assert
that "NotifReceive error" appears in the captured logs. That log only
fires if the loop actually called seccomp.NotifReceive on the pipe fd;
the broken path returns before the loop and never emits it. Verified
by temporarily reintroducing the short-circuit — test fails as expected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spawns a wrapped child with each on_block value and asserts:
- errno: EPERM, child exits 0, no event
- kill: SIGSYS termination, no event
- log: EPERM, exactly one seccomp_blocked event with outcome=denied
- log_and_kill: SIGKILL termination, exactly one event with outcome=killed
- concurrent ptrace storm resolves to exactly one event (no deadlock)
- default block-list resolves on the current arch (regression gate)
Plus non-interference skips for file_monitor/unix_socket/signal where
shared fixtures don't yet exist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…aded callers

seccomp_notif.pid is the TID of the thread that made the blocked syscall,
not the TGID (man 2 seccomp, "struct seccomp_notif"). On kernel 6.8
(Ubuntu 24.04 target) pidfd_open(non_TGL_tid, 0) returns ENOENT because
PIDTYPE_TGID is not set on the struct pid for non-leader threads.
PIDFD_THREAD would solve this directly but requires 6.9+.

This caused log_and_kill to fail silently whenever the first notification
came from a non-leader thread: pidfd_open errored, outcome was logged as
"denied", and the target kept running past the supposedly-blocked call.

Fix: parse /proc/<tid>/status Tgid: before opening the pidfd. Resolution
is isolated behind a resolveTGIDFn test seam so unit tests can drive the
non-leader path without spawning threads. Errors on the /proc read: ENOENT
surfaces as unix.ESRCH ("target already gone", skip the kill attempt, tag
outcome killed), non-ESRCH errors fall back to TID so attemptKill still
observes whatever errno the kernel returns.

The audit event's PID field still carries the TID (which thread trapped
is forensically useful on multi-threaded agents); only the pidfd target
changes to TGID.

Regression coverage:

* blocklist_linux_test.go — three unit tests exercise main-thread,
  non-leader-thread, and non-existent TID paths against the real
  /proc filesystem.

* seccomp_onblock_test.go — the ptrace-storm helper now fires from 100
  goroutines each pinned to a distinct OS thread via LockOSThread; no
  ptrace fires from the TGL thread. A pre-fix handler would see ENOENT
  on every pidfd_open and never deliver SIGKILL, so st.Signaled() would
  be false and the test would fail loudly. With the fix, SIGKILL lands
  and at least one event records outcome=killed.

The storm test's assertion stays at >=1 event because the kernel may
queue several notifications from sibling threads before SIGKILL
propagates; exact count is inherently racy. The load-bearing invariant
is sawKilled — that proves TGID resolution worked for a non-leader TID.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…on masking

The ptrace-storm helper's goal is to guarantee every ptrace call comes
from a non-leader thread — so a regression to pidfd_open(TID) would
cause the test to fail loudly.

Per-worker runtime.LockOSThread() on its own is not sufficient. When the
main goroutine parks on wg.Wait, the Go scheduler can reuse the M that
was running main (the TGL thread) for a freshly spawned worker. That
worker then locks to the TGL thread and fires ptrace from it, producing
a notification the broken handler CAN deliver SIGKILL to — the test
would pass even against a pre-fix binary.

Fix: lock the main goroutine to the TGL before spawning any workers.
This reserves the TGL so no other goroutine (including worker
goroutines) can execute on it (per go doc runtime.LockOSThread).

Add a belt-and-braces check in each worker: after LockOSThread, if
Gettid() still equals the TGL TID, abort with exit code 3 rather than
silently mask a regression. This protects against future runtime
changes (or unforeseen scheduling paths) that might subvert the
reservation.

Flagged by roborev medium on d8d7f0a.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- docs/seccomp.md: replace "immediately terminates" overview, correct the
  YAML example default (errno, not kill), add the four-value action table
  with kernel mechanism / caller effect / event outcome columns, rewrite
  the Audit Events stanza against the actual emitted wire format (source,
  typed Fields map, TID semantics for pid).

- docs/spec.md §13.3: expand the one-line "immediately kills" claim into
  the four-mode breakdown, include the real event JSON, call out that
  pid is the trapping TID rather than the TGID.

- internal/events/types.go: add a docblock above EventSeccompBlocked
  describing the Fields payload so downstream consumers know what keys
  and types to expect without grepping the handler implementation.

No behavior change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@erans erans merged commit 2b36edd into main Apr 16, 2026
8 of 9 checks passed
@erans erans deleted the feat/seccomp-on-block branch April 16, 2026 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant