Commit 40e9bf6

feat(policy): add incremental sandbox policy updates (#860)
* feat(policy): add incremental sandbox policy updates

  Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* docs(policies): expand incremental update guidance

  Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* feat(policy): audit incremental updates in gateway logs

* docs(policy): quote glob specs in shell examples

---------

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
1 parent e39bb38 commit 40e9bf6

25 files changed

Lines changed: 2923 additions & 205 deletions

.agents/skills/debug-openshell-cluster/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -182,7 +182,7 @@ Component images (server, sandbox) can reach kubelet via two paths:

 **Local/external pull mode** (default local via `mise run cluster`): Local images are tagged to the configured local registry base (default `127.0.0.1:5000/openshell/*`), pushed to that registry, and pulled by k3s via `registries.yaml` mirror endpoint (typically `host.docker.internal:5000`). The `cluster` task pushes prebuilt local tags (`openshell/*:dev`, falling back to `localhost:5000/openshell/*:dev` or `127.0.0.1:5000/openshell/*:dev`).

-Gateway image builds now stage a partial Rust workspace from `deploy/docker/Dockerfile.images`. If cargo fails with a missing manifest under `/build/crates/...`, verify that every current gateway dependency crate (including `openshell-driver-kubernetes`) is copied into the staged workspace there.
+Gateway image builds now stage a partial Rust workspace from `deploy/docker/Dockerfile.images`. If cargo fails with a missing manifest under `/build/crates/...`, or an imported symbol exists locally but is missing in the image build, verify that every current gateway dependency crate (including `openshell-driver-kubernetes` and `openshell-ocsf`) is copied into the staged workspace there.

 ```bash
 # Verify image refs currently used by openshell deployment

.agents/skills/openshell-cli/SKILL.md

Lines changed: 9 additions & 4 deletions
@@ -421,10 +421,14 @@ Watch for `deny` actions that indicate the user's work is being blocked by polic

 When denied actions are observed:

-1. Pull current policy: `openshell policy get work-session --full > policy.yaml`
-2. Modify the policy to allow the blocked actions (use `generate-sandbox-policy` skill for content)
-3. Push the update: `openshell policy set work-session --policy policy.yaml --wait`
-4. Verify: `openshell policy list work-session`
+1. Prefer incremental updates for additive network changes:
+   `openshell policy update work-session --add-endpoint api.github.com:443:read-only:rest:enforce --binary /usr/bin/gh --wait`
+   `openshell policy update work-session --add-allow 'api.github.com:443:POST:/repos/*/issues' --wait`
+2. Use full YAML replacement when the change is broad or touches non-network fields:
+   `openshell policy get work-session --full > policy.yaml`
+   Modify the policy to allow the blocked actions (use `generate-sandbox-policy` skill for content)
+   `openshell policy set work-session --policy policy.yaml --wait`
+3. Verify: `openshell policy list work-session`

 The user does not need to disconnect -- policy updates are hot-reloaded within ~30 seconds (or immediately when using `--wait`, which polls for confirmation).

@@ -543,6 +547,7 @@ $ openshell sandbox upload --help
 | Create with custom policy | `openshell sandbox create --policy ./p.yaml` |
 | Connect to sandbox | `openshell sandbox connect <name>` |
 | Stream live logs | `openshell logs <name> --tail` |
+| Incremental policy update | `openshell policy update <name> --add-endpoint host:443:read-only:rest:enforce --binary /usr/bin/curl --wait` |
 | Pull current policy | `openshell policy get <name> --full > p.yaml` |
 | Push updated policy | `openshell policy set <name> --policy p.yaml --wait` |
 | Policy revision history | `openshell policy list <name>` |
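
Reviewer sketch, not part of the commit: the `--add-allow` and `--add-deny` examples in this file use specs of the form `host:port:METHOD:path_glob`. The Rust below shows one way such a spec could be split, assuming the host never contains `:`. `RestRuleSpec` and `parse_rest_rule_spec` are hypothetical names; the parser added by this commit is not shown on this page and may behave differently.

```rust
/// Hypothetical parsed form of an `--add-allow` / `--add-deny` spec.
#[derive(Debug, PartialEq)]
struct RestRuleSpec {
    host: String,
    port: u16,
    method: String,
    path_glob: String,
}

/// Parse `host:port:METHOD:path_glob`.
/// Assumption: the host contains no ':' (for example, no bare IPv6 literal).
fn parse_rest_rule_spec(spec: &str) -> Result<RestRuleSpec, String> {
    // splitn(4, ..) stops after three separators, so any ':' inside the
    // path glob stays part of the final field.
    let mut parts = spec.splitn(4, ':');
    let host = parts
        .next()
        .filter(|h| !h.is_empty())
        .ok_or("missing host")?;
    let port = parts
        .next()
        .ok_or("missing port")?
        .parse::<u16>()
        .map_err(|e| format!("invalid port: {e}"))?;
    let method = parts
        .next()
        .filter(|m| !m.is_empty())
        .ok_or("missing method")?;
    let path_glob = parts
        .next()
        .filter(|p| !p.is_empty())
        .ok_or("missing path glob")?;
    Ok(RestRuleSpec {
        host: host.to_string(),
        port,
        method: method.to_string(),
        path_glob: path_glob.to_string(),
    })
}
```

Because the glob characters (`*`, `**`) are meaningful to the shell, the spec should reach a parser like this only after being single-quoted on the command line, as the examples above do.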

.agents/skills/openshell-cli/cli-reference.md

Lines changed: 24 additions & 1 deletion
@@ -268,9 +268,32 @@ View sandbox logs. Supports one-shot and streaming.

 ## Policy Commands

+### `openshell policy update <name>`
+
+Incrementally merge live network policy changes into the current sandbox policy. Multiple flags in one invocation are applied as one atomic batch and create at most one new revision.
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--add-endpoint <SPEC>` | repeatable | `host:port[:access[:protocol[:enforcement]]]`. Adds or merges an endpoint. `access`: `read-only`, `read-write`, `full`. `protocol`: `rest`, `sql`. `enforcement`: `enforce`, `audit`. |
+| `--remove-endpoint <SPEC>` | repeatable | `host:port`. Removes the endpoint or just the requested port from a multi-port endpoint. |
+| `--add-allow <SPEC>` | repeatable | `host:port:METHOD:path_glob`. Adds REST allow rules to an existing `protocol: rest` endpoint. |
+| `--add-deny <SPEC>` | repeatable | `host:port:METHOD:path_glob`. Adds REST deny rules to an existing `protocol: rest` endpoint that already has an allow base. |
+| `--remove-rule <NAME>` | repeatable | Deletes a named network rule. |
+| `--binary <PATH>` | repeatable | Adds binaries to each `--add-endpoint` rule. Valid only with `--add-endpoint`. |
+| `--rule-name <NAME>` | none | Overrides the generated rule name. Valid only when exactly one `--add-endpoint` is provided. |
+| `--dry-run` | false | Preview the merged policy locally without sending an update to the gateway. |
+| `--wait` | false | Wait for the sandbox to confirm the new policy revision is loaded. |
+| `--timeout <SECS>` | 60 | Timeout for `--wait`. |
+
+Notes:
+
+- `--add-allow` and `--add-deny` currently operate only on `protocol: rest` endpoints.
+- `--wait` cannot be combined with `--dry-run`.
+- Use `policy set` when replacing the full policy or changing static sections.
+
 ### `openshell policy set <name> --policy <PATH>`

-Update the policy on a live sandbox. Only the dynamic `network_policies` field can be changed at runtime.
+Replace the full policy on a live sandbox. Only the dynamic `network_policies` field can be changed at runtime.

 | Flag | Default | Description |
 |------|---------|-------------|
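
Reviewer sketch, not part of the commit: the `--add-endpoint` grammar documented in this hunk, `host:port[:access[:protocol[:enforcement]]]`, can be illustrated with a small Rust parser. All type and function names here are hypothetical, the host is assumed to contain no `:`, and omitted trailing fields are left as `None` because this page does not state their defaults.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Access { ReadOnly, ReadWrite, Full }

#[derive(Debug, PartialEq, Clone, Copy)]
enum Protocol { Rest, Sql }

#[derive(Debug, PartialEq, Clone, Copy)]
enum Enforcement { Enforce, Audit }

/// Hypothetical parsed form of an `--add-endpoint` spec.
#[derive(Debug, PartialEq)]
struct EndpointSpec {
    host: String,
    port: u16,
    access: Option<Access>,
    protocol: Option<Protocol>,
    enforcement: Option<Enforcement>,
}

/// Parse `host:port[:access[:protocol[:enforcement]]]`.
/// Assumption: the host contains no ':'.
fn parse_endpoint_spec(spec: &str) -> Result<EndpointSpec, String> {
    let parts: Vec<&str> = spec.split(':').collect();
    if parts.len() < 2 || parts.len() > 5 {
        return Err(format!("expected 2 to 5 fields, got {}", parts.len()));
    }
    if parts[0].is_empty() {
        return Err("missing host".to_string());
    }
    let port = parts[1]
        .parse::<u16>()
        .map_err(|e| format!("invalid port: {e}"))?;
    // Each optional field is validated only when present.
    let access = parts.get(2).map(|s| match *s {
        "read-only" => Ok(Access::ReadOnly),
        "read-write" => Ok(Access::ReadWrite),
        "full" => Ok(Access::Full),
        other => Err(format!("unknown access {other:?}")),
    }).transpose()?;
    let protocol = parts.get(3).map(|s| match *s {
        "rest" => Ok(Protocol::Rest),
        "sql" => Ok(Protocol::Sql),
        other => Err(format!("unknown protocol {other:?}")),
    }).transpose()?;
    let enforcement = parts.get(4).map(|s| match *s {
        "enforce" => Ok(Enforcement::Enforce),
        "audit" => Ok(Enforcement::Audit),
        other => Err(format!("unknown enforcement {other:?}")),
    }).transpose()?;
    Ok(EndpointSpec { host: parts[0].to_string(), port, access, protocol, enforcement })
}
```

Under these assumptions, `api.github.com:443:read-only:rest:enforce` yields all five fields, while `example.com:8080` yields only host and port.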

Cargo.lock

Lines changed: 1 addition & 0 deletions
Generated file; diff not rendered by default.

architecture/build-containers.md

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ The incremental deploy (`cluster-deploy-fast.sh`) fingerprints local Git changes
 | Changed files | Rebuild triggered |
 |---|---|
 | Cargo manifests, proto definitions, cross-build script | Gateway + supervisor |
-| `crates/openshell-server/*`, `deploy/docker/Dockerfile.images` | Gateway |
+| `crates/openshell-server/*`, `crates/openshell-ocsf/*`, `deploy/docker/Dockerfile.images` | Gateway |
 | `crates/openshell-sandbox/*`, `crates/openshell-policy/*` | Supervisor |
 | `deploy/helm/openshell/*` | Helm upgrade |

architecture/security-policy.md

Lines changed: 32 additions & 2 deletions
@@ -162,6 +162,24 @@ This guarantees that the same logical policy always produces the same hash regar

 **Idempotent updates**: `UpdateSandboxPolicy` compares the deterministic hash of the submitted policy against the latest stored revision's hash. If they match, the handler returns the existing version and hash without creating a new revision. The CLI detects this (the returned version equals the pre-call version) and prints `Policy unchanged` instead of `Policy version N submitted`. This makes repeated `policy set` calls safe and idempotent.

+### Incremental Merge Updates
+
+`UpdateConfigRequest.merge_operations` supports batched incremental changes to the dynamic `network_policies` section. The CLI exposes this as `openshell policy update`.
+
+Supported first-pass operations:
+
+- `--add-endpoint host:port[:access[:protocol[:enforcement]]]`
+- `--remove-endpoint host:port`
+- `--remove-rule <name>`
+- `--add-allow host:port:METHOD:path_glob`
+- `--add-deny host:port:METHOD:path_glob`
+
+`--add-allow` and `--add-deny` target existing `protocol: rest` endpoints only. `--binary` may be repeated with `--add-endpoint`, and `--rule-name` is allowed only when exactly one `--add-endpoint` is present.
+
+Each `openshell policy update` invocation is atomic at the revision level: the CLI sends one `merge_operations` batch, the server merges the whole batch into the latest policy, validates the result, and persists at most one new revision. Concurrency is handled with optimistic retries on the `(sandbox_id, version)` uniqueness boundary. If another writer wins first, the server refetches the latest policy, reapplies the full batch, revalidates it, and retries. This preserves batch atomicity without serializing all sandbox policy writes behind a sandbox-global mutex.
+
+The gateway emits per-sandbox OCSF `CONFIG:*` audit lines when incremental merge operations are applied and when draft chunks are approved or removed. These audit lines are streamed through the existing gateway log path, so operators can inspect the exact logical mutation that produced a policy revision without waiting for the sandbox poll loop to reload that revision.
+
 ### Policy Revision Statuses

 | Status | Meaning |
@@ -206,9 +224,20 @@ Failure scenarios that trigger LKG behavior include:

 ### CLI Commands

-The `openshell policy` subcommand group manages live policy updates:
+The `openshell policy` subcommand group manages live policy updates through full replacement (`policy set`) and incremental merges (`policy update`):

 ```bash
+# Merge endpoint/rule changes into the current sandbox policy
+openshell policy update <sandbox-name> \
+  --add-endpoint api.github.com:443:read-only:rest:enforce \
+  --binary /usr/bin/gh \
+  --wait
+
+# Add a REST allow rule to an existing endpoint
+openshell policy update <sandbox-name> \
+  --add-allow api.github.com:443:POST:/repos/*/issues \
+  --wait
+
 # Push a new policy to a running sandbox
 openshell policy set <sandbox-name> --policy updated-policy.yaml

@@ -255,6 +284,7 @@ Both `set` and `delete` require interactive confirmation (or `--yes` to bypass).

 When a global policy is active, sandbox-scoped policy mutations are blocked:
 - `policy set <sandbox>` returns `FailedPrecondition: "policy is managed globally"`
+- `policy update <sandbox>` returns `FailedPrecondition: "policy is managed globally"`
 - `rule approve`, `rule approve-all` return `FailedPrecondition: "cannot approve rules while a global policy is active"`
 - Revoking a previously approved draft chunk is blocked (it would modify the sandbox policy)
 - Rejecting pending chunks is allowed (does not modify the sandbox policy)
@@ -270,7 +300,7 @@ See [Gateway Settings Channel](gateway-settings.md#global-policy-lifecycle) for

 When `--full` is specified, the server includes the deserialized `SandboxPolicy` protobuf in the `SandboxPolicyRevision.policy` field (see `crates/openshell-server/src/grpc.rs` -- `policy_record_to_revision()` with `include_policy: true`). The CLI converts this proto back to YAML via `policy_to_yaml()`, which uses a `BTreeMap` for `network_policies` to produce deterministic key ordering. See `crates/openshell-cli/src/run.rs` -- `policy_to_yaml()`, `policy_get()`.

-See `crates/openshell-cli/src/main.rs` -- `PolicyCommands` enum, `crates/openshell-cli/src/run.rs` -- `policy_set()`, `policy_get()`, `policy_list()`.
+See `crates/openshell-cli/src/main.rs` -- `PolicyCommands` enum, `crates/openshell-cli/src/run.rs` -- `policy_update()`, `policy_set()`, `policy_get()`, `policy_list()`.

 ---
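
Reviewer sketch, not part of the commit: the "Incremental Merge Updates" text in this file describes optimistic retries on the `(sandbox_id, version)` uniqueness boundary. The toy Rust model below illustrates that idea only. The in-memory `RevisionStore`, the set-of-strings `Policy`, the `MergeOp` variants, and the validation rule are all assumptions made for illustration; the server's real storage, types, and validation are not shown on this page.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy policy: a set of "host:port" endpoint strings (assumption).
type Policy = BTreeSet<String>;

/// Hypothetical merge operations, one batch per invocation.
enum MergeOp {
    AddEndpoint(String),
    RemoveEndpoint(String),
}

/// Hypothetical store whose only modelled property is that
/// `(sandbox_id, version)` is unique.
#[derive(Default)]
struct RevisionStore {
    revisions: HashMap<(String, u64), Policy>,
}

impl RevisionStore {
    /// Latest version and policy, or version 0 with an empty policy.
    fn latest(&self, sandbox_id: &str) -> (u64, Policy) {
        self.revisions
            .iter()
            .filter(|(key, _)| key.0 == sandbox_id)
            .max_by_key(|(key, _)| key.1)
            .map(|(key, policy)| (key.1, policy.clone()))
            .unwrap_or_default()
    }

    /// Returns false when another writer already persisted this version.
    fn try_insert(&mut self, sandbox_id: &str, version: u64, policy: Policy) -> bool {
        let key = (sandbox_id.to_string(), version);
        if self.revisions.contains_key(&key) {
            return false;
        }
        self.revisions.insert(key, policy);
        true
    }
}

/// Merge the whole batch into the latest policy, validate, and persist at
/// most one new revision; on a version conflict, refetch and reapply.
fn apply_batch(
    store: &Mutex<RevisionStore>,
    sandbox_id: &str,
    batch: &[MergeOp],
    max_attempts: u32,
) -> Result<u64, String> {
    for _ in 0..max_attempts {
        // The lock is held only for the read, not during the merge.
        let (version, current) = store.lock().unwrap().latest(sandbox_id);
        let mut merged = current.clone();
        for op in batch {
            match op {
                MergeOp::AddEndpoint(e) => { merged.insert(e.clone()); }
                MergeOp::RemoveEndpoint(e) => { merged.remove(e); }
            }
        }
        // Stand-in validation rule (assumption): endpoints look like host:port.
        if merged.iter().any(|e| !e.contains(':')) {
            return Err("invalid endpoint in merged policy".to_string());
        }
        if merged == current {
            return Ok(version); // nothing changed, so no new revision
        }
        if store.lock().unwrap().try_insert(sandbox_id, version + 1, merged) {
            return Ok(version + 1);
        }
        // Another writer won this version; loop to refetch and reapply.
    }
    Err(format!("gave up after {max_attempts} attempts"))
}

/// Run `n` writers concurrently, each adding one distinct endpoint.
fn run_concurrent_writers(n: usize) -> (u64, Policy) {
    let store = Arc::new(Mutex::new(RevisionStore::default()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                let batch = [MergeOp::AddEndpoint(format!("host{i}.example:443"))];
                apply_batch(&store, "sb-1", &batch, n as u32 + 1).unwrap();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = store.lock().unwrap().latest("sb-1");
    result
}
```

In this model every failed insert means some other writer succeeded at that version, so `n` writers need at most `n` attempts each, and no writer's batch is ever partially applied or lost.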

crates/openshell-cli/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ pub mod auth;
 pub mod bootstrap;
 pub mod completers;
 pub mod edge_tunnel;
+pub(crate) mod policy_update;
 pub mod run;
 pub mod ssh;
 pub mod tls;

crates/openshell-cli/src/main.rs

Lines changed: 81 additions & 0 deletions
@@ -254,6 +254,8 @@ const POLICY_EXAMPLES: &str = "\x1b[1mALIAS\x1b[0m
 \x1b[1mEXAMPLES\x1b[0m
 $ openshell policy get my-sandbox
 $ openshell policy set my-sandbox --policy policy.yaml
+$ openshell policy update my-sandbox --add-endpoint api.github.com:443:read-only:rest:enforce
+$ openshell policy update my-sandbox --add-allow 'api.github.com:443:GET:/repos/**'
 $ openshell policy set --global --policy policy.yaml
 $ openshell policy delete --global
 $ openshell policy list my-sandbox
@@ -1438,6 +1440,54 @@ enum PolicyCommands {
         timeout: u64,
     },

+    /// Incrementally update policy on a live sandbox.
+    #[command(help_template = LEAF_HELP_TEMPLATE, next_help_heading = "FLAGS")]
+    Update {
+        /// Sandbox name (defaults to last-used sandbox).
+        #[arg(add = ArgValueCompleter::new(completers::complete_sandbox_names))]
+        name: Option<String>,
+
+        /// Add or merge an endpoint: host:port[:access[:protocol[:enforcement]]].
+        #[arg(long = "add-endpoint")]
+        add_endpoints: Vec<String>,
+
+        /// Remove an endpoint: host:port.
+        #[arg(long = "remove-endpoint")]
+        remove_endpoints: Vec<String>,
+
+        /// Add a REST allow rule: host:port:METHOD:path_glob.
+        #[arg(long = "add-allow")]
+        add_allow: Vec<String>,
+
+        /// Add a REST deny rule: host:port:METHOD:path_glob.
+        #[arg(long = "add-deny")]
+        add_deny: Vec<String>,
+
+        /// Remove a network rule by name.
+        #[arg(long = "remove-rule")]
+        remove_rules: Vec<String>,
+
+        /// Add binaries to each --add-endpoint rule.
+        #[arg(long = "binary", value_hint = ValueHint::FilePath)]
+        binaries: Vec<String>,
+
+        /// Override the generated rule name when exactly one --add-endpoint is provided.
+        #[arg(long = "rule-name")]
+        rule_name: Option<String>,
+
+        /// Preview the merged policy without sending it to the gateway.
+        #[arg(long)]
+        dry_run: bool,
+
+        /// Wait for the sandbox to load the policy revision.
+        #[arg(long)]
+        wait: bool,
+
+        /// Timeout for --wait in seconds.
+        #[arg(long, default_value_t = 60)]
+        timeout: u64,
+    },
+
     /// Show current active policy for a sandbox or the global policy.
     #[command(help_template = LEAF_HELP_TEMPLATE, next_help_heading = "FLAGS")]
     Get {
@@ -1988,6 +2038,37 @@ async fn main() -> Result<()> {
                     .await?;
                 }
             }
+            PolicyCommands::Update {
+                name,
+                add_endpoints,
+                remove_endpoints,
+                add_allow,
+                add_deny,
+                remove_rules,
+                binaries,
+                rule_name,
+                dry_run,
+                wait,
+                timeout,
+            } => {
+                let name = resolve_sandbox_name(name, &ctx.name)?;
+                run::sandbox_policy_update(
+                    &ctx.endpoint,
+                    &name,
+                    &add_endpoints,
+                    &remove_endpoints,
+                    &add_deny,
+                    &add_allow,
+                    &remove_rules,
+                    &binaries,
+                    rule_name.as_deref(),
+                    dry_run,
+                    wait,
+                    timeout,
+                    &tls,
+                )
+                .await?;
+            }
             PolicyCommands::Get {
                 name,
                 rev,
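
Reviewer sketch, not part of the commit: the docs in this commit state three flag constraints for `policy update` (`--wait` cannot be combined with `--dry-run`, `--binary` is valid only with `--add-endpoint`, and `--rule-name` requires exactly one `--add-endpoint`). The Rust below shows one way such checks could be expressed. `UpdateFlags`, `validate_update_flags`, and the error strings are hypothetical; this page does not show where or how the commit enforces these rules.

```rust
/// Hypothetical subset of the `policy update` flags relevant to validation.
#[derive(Default)]
struct UpdateFlags {
    add_endpoints: Vec<String>,
    binaries: Vec<String>,
    rule_name: Option<String>,
    dry_run: bool,
    wait: bool,
}

/// Check the flag combinations described in the CLI reference.
/// The error messages are illustrative, not the CLI's real output.
fn validate_update_flags(flags: &UpdateFlags) -> Result<(), String> {
    if flags.wait && flags.dry_run {
        return Err("--wait cannot be combined with --dry-run".to_string());
    }
    if !flags.binaries.is_empty() && flags.add_endpoints.is_empty() {
        return Err("--binary is valid only with --add-endpoint".to_string());
    }
    if flags.rule_name.is_some() && flags.add_endpoints.len() != 1 {
        return Err("--rule-name requires exactly one --add-endpoint".to_string());
    }
    Ok(())
}
```

Running checks like these before any request is built would let an invalid invocation fail locally without contacting the gateway.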
