feat(net): automatic gateway configuration — port forwarding (PCP/NAT-PMP/UPnP) + private-domain DNS (RFC 2136)#3306
Conversation
429b793 to
1e8c0e2
Compare
waterplea
left a comment
There was a problem hiding this comment.
A few small comments on the UI, otherwise LGTM from the frontend side
Companion to the DNS-injection half of Start9Labs/start-os#3306: a new 'DNS Records' page documenting how trusted devices inject records over RFC 2136 and how to view/manage them, plus the per-device 'Allow DNS injection' toggle (default off) noted on the Devices page.
34ab6d8 to
57a0ac4
Compare
|
All four inline points from the changes-requested review are resolved (replied threaded on each):
Also pushed since, from the Matrix discussion:
|
c030e54 to
10f1988
Compare
Devices come in two kinds (Server = StartOS box with gateway autoconfig, Client = plain peer); document the kind selector, Server DNS-injection/auto-port-forward toggles, and promote/demote. DNS records and port forwards now split into Manual and Automatic tables. Add the device add --kind flag to the CLI reference. Companion to Start9Labs/start-os#3306.
Companion to Start9Labs/start-os#3306 net tunnel update + the gateways UI action.
…tartTunnel server) Open the port a public address needs automatically instead of leaving it as a manual step (router admin panel / StartTunnel UI). A StartOS server opens its public ports by speaking a port-control protocol to its gateway — a home router or a StartTunnel — so one code path covers both. Priority: PCP (RFC 6887) → NAT-PMP → UPnP IGD. Client: - net/port_map.rs: a PortMapController opens/withdraws mappings, driven by the gateway-aware, reference-counted forward reconcile in net/forward.rs. Tries PCP then NAT-PMP (crab_nat, pure Rust, no transitive deps), then UPnP IGD (net/upnp.rs). Candidate gateways are the interface's NM default gateway and each subnet's .1 (reaching a StartTunnel over WireGuard). Mappings are renewed with the forward and withdrawn when the address is disabled/deleted. - net/gateway.rs: WAN-IP detection tries UPnP GetExternalIPAddress before the echoip probe; a private/CGNAT result falls through to echoip. StartTunnel server: - tunnel/pcp.rs: PCP server (UDP 5351). - tunnel/igd.rs: UPnP IGD (SSDP + device description + SCPD + SOAP) fallback. - Both SO_BINDTODEVICE-bound to the WireGuard interface and only honor configured peers. PCP maps the requesting host, so a peer can only forward to itself; the UPnP server enforces the same by ignoring NewInternalClient. Mappings land in the existing port_forwards table. New deps: crab_nat (PCP/NAT-PMP), igd-next (UPnP; attohttpc pulled without TLS), xmltree (parse incoming SOAP, pinned 0.10 to dedupe). SO_BINDTODEVICE is Linux-only and cfg-gated so the core lib still builds for apple-darwin.
When a private domain is enabled on a gateway, push an A record (domain -> this host's IP on that gateway's subnet) to the gateway's DNS server via RFC 2136 DNS UPDATE, so LAN devices not using StartOS's resolver can resolve it; withdraw on disable/delete. Bound to our address on the gateway so the server can authorize by source IP. Best-effort, reconciled off the net_iface watch, mirroring the dns controller's add/gc API. Server-side acceptance (StartTunnel + StartWRT) lands next, sharing a handler in core.
DnsInjector: in-memory store of injected records + per-gateway plug-ins (a source-IP authorizer for the per-device 'allow DNS injection' toggle, and an on_change persistence hook). InjectingHandler wraps a forwarding Catalog: a query for an injected name is answered locally, an authorized UPDATE mutates the store, everything else forwards unchanged. Manual upsert/delete bypass auth (admin CRUD). Shared by StartTunnel (this crate) and StartWRT (imports core).
Wire the shared DnsInjector into the per-subnet DNS proxy: WgConfig gains allow_dns_injection (default off, per-device toggle); a device whose IP is allowed may inject records via DNS UPDATE, which the proxy answers authoritatively and persists to db.dns_records (DnsRecordEntry). The authorizer reads a live allowed-IP set so a toggle change applies without rebuilding. Adds InjectedRecord <-> text-form conversions in core (A/AAAA/CNAME/TXT).
…ngs) - device set-dns-injection: per-device allow toggle (updates the live allowed set) - dns list/add/remove: view and manually add/replace/delete records - TS bindings (DnsRecordEntry, *Params) regenerated + synced to sdk osBindings - start-tunnel manpages + i18n about-strings (all 5 locales)
- New DNS nav/route: table of injected+manual records (view) with add (name/ type/value/ttl dialog) and delete, watching db.dnsRecords via patch-db. - Devices page: per-device 'Allow/Disallow DNS injection' action (default off, confirm prompt noting trust). - ApiService trio (abstract/live/mock) + data-model gain dns.add/dns.remove and device.set-dns-injection; mock dnsRecords + allowDnsInjection. NOTE: web deps not installed in this slot; not yet built/typechecked locally — follows existing patterns against the regenerated T.Tunnel.* bindings. Needs a local npm ci + SDK bundle rebuild + check:tunnel to confirm.
Tunnel web now type-checks (check:tunnel) and builds (build:tunnel) clean.
A client must confirm a gateway speaks the HOSTNAME extension before sending OPTION_HOSTNAME (224, a PCP Private-Use option code that could collide with another vendor). Interim safety for the Private-Use period — once HOSTNAME has an IANA code this is moot, so it lives in the implementation, not the RFC. - pcp/capability.rs: shared Start9 capability option (code 225, magic b"ST9\x01") + codec, the single compile-time source of truth for every gateway and client. - server: answer ANNOUNCE (opcode 0) with the marker for any peer, before the MAP-only check. StartWRT inherits this via the generic handle<B>() — no new GatewayBackend method. - client: raw-UDP ANNOUNCE probe (crab_nat has none) + per-gateway support cache gating OPTION_HOSTNAME; the MAP success arm also requires the HOSTNAME echo. Verified by an adversarial review (wire format vs RFC 6887, regression, security); folded in its two findings (trace the gated skip; retransmit a garbled probe).
drbonez-chosen marker bytes. Same 4-byte length, so the wire format and the 8-byte option are unchanged; only the value the gateway emits and the client matches on.
- transparent.rs: socket2's set_ip_transparent_v4 (IP_TRANSPARENT) exists only on Linux/Android, so the apple-darwin build failed to compile. The transparent SNI egress only runs on the Linux gateway, so cfg-gate the real impl to Linux and add a non-Linux error stub (same signature; never executed off-Linux). - rfc2136 test: held_name_is_nodata_not_forwarded touches DnsInjector's SyncMutex, whose lock path now spawns a watchdog (tokio::spawn) after the #3085 refactor the branch rebased onto — so it needs a runtime. Make it a #[tokio::test].
probe_announce bound 0.0.0.0, so the ANNOUNCE could egress the wrong interface and be dropped (e.g. by WireGuard cryptokey routing) while the crab_nat MAP path — which binds the local IP — worked. Bind local_ip like MAP so the probe rides the same path to the gateway and the reply routes back. Also log the gateway's ANNOUNCE reply at debug to confirm receipt during testing.
check_dns only iterated gw_ip_info.dns_servers, populated solely from DHCPv4 option 6 — empty on a static WireGuard link, so the loop never ran and it fell through to Ok(false) despite the record resolving. The private domain is injected at / served by the subnet .1 (candidate_gateways), which it never queried. Union the DHCP resolvers with candidate_gateways(gw_info) so it asks where the record actually lives — the same server nslookup @<tunnel .1> hits.
…llback poll_ip_info filled dns_servers only from DHCP4 options — empty for a static WireGuard link — so check_dns (and any dns_servers consumer) never saw the resolver the imported config's `DNS =` declares. Read the applied nameservers from the NM Ip4Config/Ip6Config so dns_servers reflects the tunnel's resolver (for a StartTunnel link, the in-tunnel .1 that injects/serves private domains). This is the real fix for the check_dns false-negative; reverts the e6a1651 candidate_gateways/.1 fallback (guessing the .1 was the wrong solution). Also mark the imported resolver preferred-but-not-exclusive (positive dns-priority below the LAN default) so it wins yet others stay a fallback.
…(backend)
- WgConfig.kind: WgClientKind {Client,Server}, stored & sticky. generate() sets a
Server's autoconfig flags on, a Client's off. m_02 backfills existing clients
(server iff both flags already on, else client) without touching the flags.
- PortForward::Dnat / SniRoute gain an explicit `auto` flag (PCP/UPnP set true,
manual false) so the UI Manual/Automatic split isn't a label heuristic. m_03
backfills it from the PCP/UPnP label.
- set_device_kind RPC (device set-kind): promote/demote, resetting both flags to
the kind's default. AddDeviceParams gains kind. i18n + ts surface follow.
Bindings regen + web + docs land next.
make ts-bindings output: new WgClientKind + SetDeviceKindParams, and kind/auto fields on WgConfig/AddDeviceParams/PortForward/SniRoute. Adds SetDeviceKindParams to the tunnel export list.
…mote; WAN fix - Devices: split into Servers and Clients tables. Servers expose inline DNS-injection and auto-port-forward toggles; Clients have no autoconfig. Add form picks a kind (Servers default both flags on); overflow menu promotes/demotes (setDeviceKind). - Port-forwards: split Manual/Automatic by the explicit `auto` flag; the Automatic table drops the Label column + label-edit action (gateway-owned). - DNS: split Manual (source null) / Automatic (injected) tables. - WAN IP select: non-null sentinel object + identityMatcher so Default renders instead of blank. - Service layer: setDeviceKind across api/live/mock; mocks carry kind + auto. check:tunnel + check:i18n green; prettier-formatted.
net tunnel update <id> <config> replaces the NM connection behind an existing gateway interface without churning its identity: delete the old connection and re-import the new one onto the SAME interface. The device-watch loop only blanks ip_info for an absent device (never forgets it), so the gateway id and everything keyed to it (forwards, private/public domains) survive the swap. Primary use: re-issue a config that now carries a DNS = line. Extracts the shared import_wireguard helper (write + nmcli import + dns-priority) used by both add and update. UpdateTunnelParams binding regenerated.
Per-gateway 'Update config' in the gateways table (wireguard only) opens the add-gateway config paste/upload dialog and calls net.tunnel.update via a new updateTunnelConfig API method — re-issuing a config in place without churning the gateway. (updateTunnel stays the rename op -> net.gateway.set-name.) Adds the 'Update config' i18n key across all five locales.
RFC 2136 UPDATEs were authorized by source IP only, so any co-located service that could emit from the server's tunnel IP could forge DNS injections into a StartTunnel/StartWRT gateway's resolver. Add TSIG (RFC 8945, HMAC-SHA256) on every UPDATE, keyed off a per-device key derived (HKDF-SHA256) from that device's WireGuard PSK: - Verify side (shared rfc2136.rs): the injector takes a per-source key lookup; the OpCode::Update handler now requires a valid, in-window TSIG before touching the store (hard cutover, fail-closed on any error). - Sign side (StartOS): each UPDATE is signed with the key derived from the PSK NetworkManager holds for that gateway (root-only GetSecrets), so a sandboxed service can't read it and can't forge a signature. - Keys are derived identically on both sides (fixed key name, fudge 300s) and isolated per device, so one device can't sign as another. A transient PSK-lookup failure is not cached as keyless (it would have wedged an expected-signed gateway on unsigned/Refused until the next network change); only definitive results are cached. TSIG prevents forgery but not replay within the fudge window; that's bounded to the sender's own idempotent, re-asserted records and accepted as a documented limitation. Co-authored-by: drbonez
NetworkManager's IP6Config interface has no `NameserverData` property (that's IP4Config-only); it exposes `Nameservers` (aay, raw 16-byte addresses). The bogus property read failed with `InvalidArgs: No such property` and, via `?`, aborted the whole IP-info poll — taking IPv4 DNS down with it. Read `Nameservers` and parse each 16-byte entry as an Ipv6Addr; subscribe to the matching change signal.
…om Add button Mirror the StartOS UI interfaces structure: each table sits in a card whose header carries the table title and its action button (instead of a bare <h3> and an Add button crammed into a table header cell). The Clients table gains its own Add button, and the tables are no longer width-constrained. The add-device dialog drops its Kind selector; the kind is inferred from which Add button opened it (Servers -> server, Clients -> client) and passed through the dialog data.
…ic label The subnet WAN selector used a bare null item, which tuiSelect skips, so its default choice rendered blank. Wrap the default in an object (shared WanItem + toWanItems + matchWan helpers, lifted to wan.ts) like the device selector already does. Reword the default per context: "Use System Default" (subnet, inherits the system default WAN) and "Use Subnet Default" (device, inherits its subnet's WAN). Also card-wrap the subnets table for consistency with the other tables.
… the container port The auto-port-mapping reconcile sent the forward target's container port as the PCP MAP internal port (e.g. 8333 -> 58333). The StartTunnel gateway forwards to StartOS's LAN IP at the port StartOS listens on, which is the external port; StartOS's own nftables rule then DNATs that to the container target. So the PCP internal port must be `self.external`, matching what the SNI/hostname path already does.
259f3a9 to
0d01808
Compare
…urce shows device name Automatic port forwards are created on demand by a device's PCP request, so an enable/disable toggle on them is meaningless — remove that column from the Automatic table (the Manual table keeps its toggle). In the DNS Automatic table, resolve the injecting device's IP to its friendly name (falling back to the IP when the device is unknown).
…etup messaging The private-domain and clearnet setup dialogs only described the manual paths (point the gateway's DNS at this server; create a port-forward rule). Mention the automatic alternatives too: enabling DNS Injection for the device on a StartTunnel gateway, and automatic port forwarding (UPnP / NAT-PMP / PCP) — so users know they don't have to configure either by hand.
Updating a WireGuard gateway's config tears its tunnel down. When you update the gateway carrying the request, that drops the request's own transport, and the RPC handler runs inline on the per-connection task — so it's cancelled mid-sequence, after `nmcli connection delete` but before the re-import. Worse, `invoke` sets `kill_on_drop`, so the in-flight nmcli child is killed too. The connection ends up deleted and never re-created — recovered only by a reboot. Run the destructive delete + re-import in a detached `tokio::spawn` and await its handle. Dropping a bare JoinHandle on cancellation detaches (does not abort) the task, so the sequence completes server-side and the device-watch loop re-detects the re-imported interface — no reboot needed. The non-cancelled (different-gateway) case still returns the real result.
Deleting a gateway you're connected through severs its tunnel — the request's own transport — cancelling the handler after `settings.delete()` but before the `wait_for` + `forget()`. That leaves the NM connection deleted while the gateway entry is never forgotten: a half-deleted gateway recovered only by a reboot. Detach the delete + wait + forget into a spawned task (receiver is now `&Arc<Self>` so the task can call `forget`) so it runs to completion even when the handler future is dropped — same pattern as update_tunnel.
…parser Both add_tunnel and update_tunnel now parse the WireGuard config and build the NetworkManager connection settings through one shared path (WgConfig::parse + to_nm_settings), so an added gateway and an updated one are identical by construction — no parity drift between the two. This retires the nmcli `connection import` helper (import_wireguard / sanitize_config). - add: NetworkManager.AddAndActivateConnection from the built settings. - update: Update2(to-disk) + Device.Reapply on the existing connection, keeping its uuid. The wg device is never deleted, so updating the gateway that carries the request doesn't drop its own transport, and a cancelled update leaves the gateway on its old (or new) config — never half-deleted. Replaces the earlier detached-task workaround. Also reverts the delete_iface detach: the delete flow's cancel/crash safety is a separate effort (declarative intent + reconcile), to land in its own PR. Settings format validated against a real NM 1.52 connection (ipv4.dns is au, little-endian octets; reapply applies DNS/endpoint/key changes without dropping the interface). Parser + builder covered by unit tests.
waterplea
left a comment
There was a problem hiding this comment.
Pushed a few cleanup changes
|
Full VM test of the in-place WireGuard update + unified add/update parser — ran a startbox built from this branch on a StartOS VM (beta-9 → this build via A/B + reboot, NetworkManager 1.52), exercising both paths end-to-end against real NM.
One transient worth noting: an Separately (for the follow-up delete-flow PR, out of scope here): |
The HTTP->HTTPS redirect is the only forward where external != internal. Routing it through InterfacePortForwardController also created an nft LAN forward, which is wrong for the OS (StartOS serves 80/443 itself, there is no container to DNAT to) and forced the upstream-map internal port to do double duty -- the cause of the 80->443 regression. Move the redirect to a direct PortMapController mapping (external 80 -> internal 443), tracked per host so it's withdrawn when 443 stops being publicly exposed. Service forwards stay external==internal on the forward path.
What
Automatically configure the gateway for the things a StartOS service needs, instead of leaving them as manual steps. One code path works whether the gateway is a home router or a StartTunnel. Two areas:
1. Port forwarding — PCP (RFC 6887) → NAT-PMP → UPnP IGD
net/port_map.rs): when a public address needs a forward, open it on the gateway via PCP, then NAT-PMP (crab_nat), then UPnPAddPortMapping(igd-next); withdraw on disable/delete; renewed/reconciled with the forward. Candidate gateways = the interface's NM gateway + each subnet's.1. Handles port ranges (feat(sdk,core): MultiHost.bindPortRange for contiguous TCP+UDP port ranges #3270) up to a cap.net/gateway.rs): tries UPnPGetExternalIPAddressbefore the echoip probe; private/CGNAT result falls back to echoip.tunnel/pcp.rs, UDP 5351) + a UPnP IGD (tunnel/igd.rs, SSDP + SCPD + SOAP), bothSO_BINDTODEVICE-bound to the wg interface and peer-scoped (a peer can only forward to its own tunnel IP). Note:crab_natis client-only, so the PCP server is hand-rolled (no server crate exists).2. Private-domain DNS — RFC 2136 (DNS UPDATE) (in progress)
net/dns_update.rs, done): when a private domain is enabled on a gateway, push anArecord (domain → our IP on that gateway's subnet) to the gateway's DNS via DNS UPDATE, bound to our address so the gateway authorizes by source IP; withdraw on disable/delete. Best-effort, reconciled off thenet_ifacewatch.core(used by StartTunnel here, and by StartWRT which importscore). Accepts UPDATE from an authorized source IP (a per-device "allow DNS injection" toggle, default off — trust the device), answers injected names, forwards the rest. Plus a per-device toggle + a view/add/edit/delete UI.Notes
crab_nat(PCP/NAT-PMP, pure Rust, no transitive deps),igd-next(UPnP;attohttpcpulled without TLS),xmltree(SOAP parse, pinned0.10). DNS UPDATE uses the in-treehickory-proto.SO_BINDTODEVICEis Linux-only and cfg-gated for the apple-darwin build.Testing
cargo check --libclean;cargo test --lib net:: tunnel::passing (port-forward wire contracts, PCP MAP encoding, range allocation).