Problem
ADAM's WebRTC conferencing uses a full mesh topology — each participant maintains a direct connection to every other participant. Connection count grows quadratically:
| Participants | Connections per peer | Total directed streams |
|---|---|---|
| 2 | 1 | 2 |
| 4 | 3 | 12 |
| 6 | 5 | 30 |
| 8 | 7 | 56 |
At 6–8 participants, each peer must encode and upload its media stream N−1 times. Bandwidth and CPU requirements hit a hard ceiling. This was observed in practice during a recent 8-person call.
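The growth in the table can be sketched as a toy calculation (illustrative only, not part of the proposal):

```rust
/// Streams each peer must encode and upload in a full mesh of `n` participants.
fn mesh_uplinks_per_peer(n: usize) -> usize {
    n.saturating_sub(1)
}

/// Total directed media streams: every peer sends to every other peer.
fn mesh_total_streams(n: usize) -> usize {
    n * n.saturating_sub(1)
}

fn main() {
    for n in [2, 4, 6, 8] {
        println!(
            "{n} participants: {} uplinks/peer, {} total streams",
            mesh_uplinks_per_peer(n),
            mesh_total_streams(n)
        );
    }
}
```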
Proposed Solution: Embedded SFU (Selective Forwarding Unit)
An SFU receives each participant's media stream once, then selectively forwards it to all other participants. Each peer only uploads once regardless of group size. This is the standard architecture used by Jitsi, LiveKit, mediasoup, etc.
Key Design: SFU in the ADAM Executor
Rather than requiring external infrastructure, embed SFU capability directly in the ADAM executor:
- Every ADAM agent gains SFU capability by default
- No external service dependency
- Shares the executor's async runtime, identity system, and neighbourhood membership
- The executor already manages networking and auth — the SFU piggybacks on this
Recommended library: str0m — a Sans I/O Rust WebRTC implementation with an explicit SFU example. Sans I/O fits well with the executor's architecture — no internal threads or hidden async tasks, all operations driven by the caller. webrtc-rs is an alternative but is heavier.
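To make the Sans I/O point concrete: the executor owns the event loop, polls the library for what to do next, and feeds network input back in explicitly. The following is a toy illustration of that control flow with invented types; str0m's real API differs.

```rust
// Invented types illustrating the Sans I/O pattern; str0m's real API differs.
// The engine never touches sockets or spawns tasks - the caller drives everything.

enum Output {
    Transmit(Vec<u8>), // the caller must send this packet on its own socket
    Nothing,           // nothing to do until the caller feeds in more input
}

struct SansIoEngine {
    outbound: Vec<Vec<u8>>,
}

impl SansIoEngine {
    fn new() -> Self {
        SansIoEngine { outbound: Vec::new() }
    }

    /// Network input is handed in explicitly by the caller.
    fn handle_input(&mut self, packet: Vec<u8>) {
        // Toy behaviour: queue every packet to be echoed back out.
        self.outbound.push(packet);
    }

    /// The caller polls for work; the engine only describes what to do.
    fn poll_output(&mut self) -> Output {
        match self.outbound.pop() {
            Some(p) => Output::Transmit(p),
            None => Output::Nothing,
        }
    }
}

fn main() {
    let mut engine = SansIoEngine::new();
    engine.handle_input(vec![0x01, 0x02]);
    // The executor's event loop would match on poll_output() and do the I/O itself.
    while let Output::Transmit(pkt) = engine.poll_output() {
        println!("caller sends {} bytes", pkt.len());
    }
}
```

This is why the pattern suits the executor: no hidden threads or async tasks, so the SFU can run inside the existing runtime.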
Designated SFU Peer per Neighbourhood
Not every peer in a call acts as the SFU. The neighbourhood designates one peer:
- Cloud Gateway — for gateway-connected neighbourhoods, the gateway acts as default SFU (always-on, server-grade bandwidth)
- Designated peer — neighbourhood admin sets `sfu_peer` DID in Social DNA. Simplest self-sovereign model.
- Mesh fallback — if the SFU is unavailable or there are ≤4 participants, fall back to direct mesh (current behaviour, zero regression)
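The designation rules reduce to a small lookup. A sketch with hypothetical type and field names, not the proposed API:

```rust
// Illustrative resolution of the designation rules; names are hypothetical.

enum SfuMode {
    Gateway,
    Designated,
    Mesh,
}

struct SfuConfig {
    mode: SfuMode,
    designated_peer: Option<String>, // DID from Social DNA
}

/// Resolve which peer (if any) should act as SFU for a call.
fn resolve_sfu_peer(cfg: &SfuConfig, gateway_did: Option<&str>) -> Option<String> {
    match cfg.mode {
        SfuMode::Gateway => gateway_did.map(str::to_string),
        SfuMode::Designated => cfg.designated_peer.clone(),
        SfuMode::Mesh => None, // current behaviour: no SFU, direct mesh
    }
}

fn main() {
    let cfg = SfuConfig {
        mode: SfuMode::Designated,
        designated_peer: Some("did:key:z6Mk-example".into()),
    };
    println!("{:?}", resolve_sfu_peer(&cfg, None));
}
```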
Call Flow
Initiator creates call link in neighbourhood
→ Peers query neighbourhood for SFU peer
→ If SFU available: each peer connects once to SFU
→ SFU forwards streams to all peers
→ If SFU unavailable + ≤4 peers: mesh fallback
→ If SFU unavailable + >4 peers: warn, attempt mesh
Signalling (SDP offer/answer, ICE candidates) routes through existing neighbourhood communication — no new signalling infrastructure needed.
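The branching in the call flow can be captured in one decision function. A sketch with illustrative names:

```rust
// The call-flow branches above as a single decision function.

#[derive(Debug, PartialEq)]
enum Topology {
    Sfu,
    Mesh,
    MeshWithWarning, // over the mesh limit but no SFU available
}

/// `max_mesh` would come from Social DNA (`maxMeshParticipants`, default 4).
fn choose_topology(sfu_available: bool, participants: usize, max_mesh: usize) -> Topology {
    if sfu_available {
        Topology::Sfu
    } else if participants <= max_mesh {
        Topology::Mesh // current behaviour, zero regression
    } else {
        Topology::MeshWithWarning // warn, attempt mesh anyway
    }
}

fn main() {
    println!("{:?}", choose_topology(false, 6, 4)); // prints "MeshWithWarning"
}
```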
Executor Changes
New Module: sfu/
rust-executor/src/
├── sfu/
│ ├── mod.rs // Module entry, SFU lifecycle
│ ├── server.rs // str0m-based WebRTC SFU server
│ ├── room.rs // Room/session management per neighbourhood
│ └── relay.rs // Media relay & selective forwarding logic
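As a sketch of what `room.rs` might start with (one room per neighbourhood call; field and method names are hypothetical):

```rust
// Hypothetical starting point for room.rs; not the actual implementation.

struct Room {
    id: String,
    neighbourhood_url: String,
    participants: Vec<String>, // participant DIDs
}

impl Room {
    fn new(id: &str, neighbourhood_url: &str) -> Self {
        Room {
            id: id.to_string(),
            neighbourhood_url: neighbourhood_url.to_string(),
            participants: Vec::new(),
        }
    }

    /// Returns false if the DID is already in the room.
    fn join(&mut self, did: &str) -> bool {
        if self.participants.iter().any(|p| p == did) {
            return false;
        }
        self.participants.push(did.to_string());
        true
    }

    fn leave(&mut self, did: &str) {
        self.participants.retain(|p| p != did);
    }
}

fn main() {
    let mut room = Room::new("standup", "neighbourhood://example");
    room.join("did:key:alice");
    room.join("did:key:bob");
    room.leave("did:key:alice");
    println!("{} has {} participant(s)", room.id, room.participants.len());
}
```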
GraphQL API
type Mutation {
  sfuStartRoom(neighbourhoodUrl: String!, roomId: String!): SfuRoom!
  sfuStopRoom(roomId: String!): Boolean!
  callJoin(neighbourhoodUrl: String!, roomId: String!): CallSession!
  callLeave(roomId: String!): Boolean!
}
type Query {
  sfuRooms: [SfuRoom!]!
  sfuPeerForNeighbourhood(neighbourhoodUrl: String!): String
}
type Subscription {
  callParticipants(roomId: String!): CallParticipantEvent!
  callStreams(roomId: String!): CallStreamEvent!
}
Social DNA Extension
{
  "sfu": {
    "mode": "designated",
    "designatedPeer": "did:key:z6Mk...",
    "fallback": "mesh",
    "maxMeshParticipants": 4
  }
}
Modes: "gateway" | "designated" | "mesh" (default, current behaviour)
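As a usage sketch, a client joining a call through this API might issue an operation like the following. The neighbourhood URL is a placeholder, and since `CallSession`'s fields are not specified in this proposal, only `__typename` is selected:

```
mutation JoinCall {
  callJoin(neighbourhoodUrl: "<neighbourhood-url>", roomId: "standup") {
    __typename
  }
}
```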
Flux UI Changes
- SFU mode indicator during calls
- Participant grid scaling for >8 participants (speaker view, active speaker detection)
- Quality selector (auto/high/medium/low) — SFU enables server-side bandwidth adaptation
- Neighbourhood settings panel for SFU peer configuration
- Simulcast support: client sends 3 layers (720p/360p/180p), SFU selects per recipient
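The simulcast point implies a per-recipient selection step on the SFU side: pick the highest layer that fits the recipient's downlink. A sketch (the three layers come from the list above; the bitrate numbers are invented for the example):

```rust
// Illustrative SFU-side simulcast layer selection.

const LAYERS: [(&str, u32); 3] = [
    ("720p", 2_500), // kbps, highest first - bitrates are invented
    ("360p", 800),
    ("180p", 250),
];

/// Pick the highest layer whose bitrate fits the recipient's
/// estimated downlink bandwidth (kbps).
fn select_layer(downlink_kbps: u32) -> &'static str {
    for (name, kbps) in LAYERS {
        if kbps <= downlink_kbps {
            return name;
        }
    }
    "180p" // below the lowest layer: still send the smallest one
}

fn main() {
    for bw in [5_000, 1_000, 100] {
        println!("{bw} kbps downlink -> {}", select_layer(bw));
    }
}
```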
Implementation Phases
Phase 1: Core SFU in Executor
- Embed `str0m` SFU in the executor
- GraphQL API for room management
- Social DNA `sfu` configuration
- Designated peer mode only
Phase 2: Flux Integration & Cascaded SFU
- Call UI updates for SFU mode
- Settings panel for SFU configuration
- Simulcast support
- Fallback logic (SFU → mesh)
- Cascaded SFU mode (multi-node cluster, pipe transports)
- Capability-based SFU peer election (peers advertise bandwidth/uptime/CPU)
Phase 3: Advanced
- Cross-cluster SFU (SFU nodes spanning multiple neighbourhoods)
- Recording (SFU has all streams — trivial to record)
- Breakout rooms (multiple SFU rooms per neighbourhood)
- Screen sharing optimisation
- E2E encryption via Insertable Streams / SFrame
- WE module extraction
Security Considerations
- Trust model: SFU peer sees media in cleartext. For cloud gateway this is accepted. For designated peers, neighbourhood members implicitly trust that peer.
- E2E encryption: possible via Insertable Streams/SFrame (Phase 3) — SFU forwards encrypted frames it cannot decrypt.
- Authentication: peers authenticate to SFU via ADAM agent DID; SFU verifies neighbourhood membership before admitting.
- Abuse prevention: SFU operator can set limits (max participants, max bitrate) via Social DNA.
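The authentication and abuse-prevention points combine into an admission check the SFU peer would run before accepting a participant. A sketch with hypothetical names:

```rust
// Illustrative admission check; names are hypothetical.

struct RoomLimits {
    max_participants: usize, // set by the SFU operator via Social DNA
}

fn admit(
    is_neighbourhood_member: bool, // verified against the ADAM agent DID
    current_participants: usize,
    limits: &RoomLimits,
) -> Result<(), &'static str> {
    if !is_neighbourhood_member {
        return Err("not a neighbourhood member");
    }
    if current_participants >= limits.max_participants {
        return Err("room full");
    }
    Ok(())
}

fn main() {
    let limits = RoomLimits { max_participants: 8 };
    println!("{:?}", admit(true, 3, &limits)); // prints "Ok(())"
}
```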
Open Questions
- NAT traversal — ✅ Resolved: use ADAM/Flux's existing centralised TURN & STUN servers for now. The SFU peer will use the same ICE infrastructure that mesh calls already use. Decentralised TURN alternatives can be explored later but are not blocking.
- Resource compensation — ✅ Out of scope: HoloFuel compensation for SFU nodes is deferred. Relates to the broader x402/mutual credit work but is not required for initial implementation.
- Live migration — ✅ Out of scope: tracked separately in feat: Seamless SFU ↔ mesh live migration during calls #708. Initial implementation uses manual SFU peer designation (defaulting to the gateway peer). Participants re-join if the topology changes. Seamless migration is a future enhancement.
- `str0m` readiness — Sans I/O is elegant but less battle-tested than Go SFUs (Pion/LiveKit). Needs evaluation under real load. The existing chat example demonstrates the pattern works.
- SFU placement: executor vs link language — where should the SFU live architecturally?
Option A: In the executor (as currently proposed)
- ✅ Access to the full async runtime, identity system, neighbourhood membership — no bridging needed
- ✅ Simpler implementation — SFU is a Rust module alongside existing executor services
- ✅ GraphQL API integrates naturally with the existing schema
- ✅ Works for all link languages automatically — any neighbourhood gets SFU capability
- ❌ Couples media infrastructure to the executor — every executor ships SFU code whether it's needed or not
- ❌ Harder to swap SFU implementations per neighbourhood (e.g. one NH wants recording, another wants minimal)
Option B: In the link language, as a telepresence API extension
- ✅ Follows ADAM's abstraction model — media handling is already a language concern (the telepresence API is part of the language interface)
- ✅ Different neighbourhoods can use different SFU implementations via different link languages
- ✅ Keeps the executor lean — SFU capability is opt-in per language
- ✅ Language-level SFU could be implemented in WASM (ties into feat: WASM-based language execution runtime #692), enabling sandboxed media processing
- ❌ Link languages run in JS/WASM isolates — embedding `str0m` (Rust) requires either FFI bridging or a separate WASM-compiled SFU
- ❌ The language would need to open server sockets and manage WebRTC connections — currently languages don't have this level of network access
- ❌ More complex signalling path — SFU in the language needs to communicate back to the executor for ICE/TURN credentials and peer authentication
- ❌ Per-language implementation burden — every link language that wants SFU must implement it
Option C: Hybrid — executor provides SFU primitives, language controls policy
- The executor embeds `str0m` and exposes SFU room management as a runtime service (like Holochain or SurrealDB)
- The link language's telepresence API gains new methods: `requestSfu()`, `setSfuPeer()`, `getSfuConfig()`
- The language decides when and how to use the SFU (threshold, peer selection policy), but the executor does the heavy lifting
- This mirrors how languages use Holochain — they don't embed a conductor, they call the executor's Holochain service
Recommendation: Option C (hybrid). The SFU is infrastructure (like Holochain), not application logic. It belongs in the executor as a service. But the telepresence API in the language interface should be extended so languages can control SFU behaviour — when to activate, which peer, quality settings. This keeps the abstraction clean while avoiding the practical problems of running a Rust SFU inside a JS/WASM isolate.
Cascaded SFU (Multi-Node)
In addition to the single-SFU modes (gateway/designated/mesh), a cascaded mode enables multiple executor nodes to cooperate as an SFU cluster within a neighbourhood call.
Mode: "cascaded"
A new topology mode alongside "gateway", "designated", and "mesh". In cascaded mode:
- Multiple nodes advertise SFU capability via neighbourhood signalling (`sfu-announce` messages)
- Each SFU node accepts a subset of participants (up to `maxParticipantsPerNode`)
- Inter-SFU relay via pipe transports: SFU nodes establish str0m peer connections between each other, selectively forwarding tracks that remote participants have subscribed to
- Participants connect to their nearest/least-loaded SFU node; the SFU cluster handles cross-node media routing transparently
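The "least-loaded node" step can be sketched as a simple selection over the cluster's announced loads (illustrative; a real implementation might also weigh network distance for "nearest"):

```rust
/// Pick an SFU node for a joining participant: the least-loaded node with
/// spare capacity. Returns None if the cluster is full.
/// `loads` pairs each SFU node's DID with its current participant count.
fn assign_node(loads: &[(String, usize)], max_per_node: usize) -> Option<String> {
    loads
        .iter()
        .filter(|(_, load)| *load < max_per_node)
        .min_by_key(|(_, load)| *load)
        .map(|(did, _)| did.clone())
}

fn main() {
    let loads = vec![("did:key:a".to_string(), 8), ("did:key:b".to_string(), 3)];
    // Node a is at capacity, so the participant goes to node b.
    println!("{:?}", assign_node(&loads, 8));
}
```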
SFU Cluster Discovery
- Nodes that can serve as SFU broadcast `sfu-announce` messages through neighbourhood signalling
- Each announce includes: DID, current participant count, capacity hint
- `sfu-pipe-offer`/`sfu-pipe-answer` messages establish pipe transports between SFU nodes
- `sfu-leave` message when an SFU node departs the cluster
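Possible in-memory shapes for these messages, with one consumer of the announce's capacity hint. Field names are illustrative; the actual wire format is undecided:

```rust
// Hypothetical message shapes for the cluster signalling listed above.

enum SfuSignal {
    Announce {
        did: String,
        participants: usize,
        capacity_hint: usize,
    },
    PipeOffer { from: String, to: String, sdp: String },
    PipeAnswer { from: String, to: String, sdp: String },
    Leave { did: String },
}

/// A node only considers announces from peers with spare capacity.
fn has_capacity(msg: &SfuSignal) -> bool {
    match msg {
        SfuSignal::Announce { participants, capacity_hint, .. } => participants < capacity_hint,
        _ => false,
    }
}

fn main() {
    let m = SfuSignal::Announce {
        did: "did:key:z6Mk-example".to_string(),
        participants: 2,
        capacity_hint: 8,
    };
    println!("spare capacity: {}", has_capacity(&m));
}
```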
Social DNA Extensions
{
  "sfu": {
    "mode": "cascaded",
    "sfuPeers": ["did:key:z6Mk...", "did:key:z6Mn..."],
    "maxParticipantsPerNode": 8,
    "fallback": "mesh",
    "maxMeshParticipants": 4
  }
}
New fields:
- `sfuPeers: [DID]` — list of DIDs offering SFU capability (replaces the single `designatedPeer` when in cascaded mode)
- `maxParticipantsPerNode` — capacity limit per SFU node before overflow to another node
- `mode: "cascaded"` — activates multi-node SFU
Pipe Transports (Inter-SFU Relay)
Each pair of SFU nodes in the cluster establishes a str0m peer connection ("pipe transport"):
- Only tracks that a remote node's participants have subscribed to are forwarded over the pipe
- A track is never forwarded back to the SFU node it originated from
- Pipe transports are established on-demand when the first cross-node subscription occurs
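The first two forwarding rules combine into a single predicate per (track, pipe) pair. A sketch with illustrative names:

```rust
/// Forward a track over a pipe transport to `remote_node` only if someone
/// there has subscribed, and never back to the node it originated from.
fn forward_over_pipe(track_origin_node: &str, remote_node: &str, remote_subscribers: usize) -> bool {
    remote_node != track_origin_node && remote_subscribers > 0
}

fn main() {
    // A track from node a, with two subscribers on node b, crosses the pipe.
    println!("{}", forward_over_pipe("did:key:a", "did:key:b", 2)); // prints "true"
}
```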
Proposal by @HexaField. Issue created by @data-bot-coasys.