
feat: Embedded SFU for scalable WebRTC conferencing in neighbourhoods #700


Description


Problem

ADAM's WebRTC conferencing uses a full mesh topology — each participant maintains a direct connection to every other participant. Connection count grows quadratically:

| Participants | Connections per peer | Total directed streams |
|-------------:|---------------------:|-----------------------:|
| 2            | 1                    | 2                      |
| 4            | 3                    | 12                     |
| 6            | 5                    | 30                     |
| 8            | 7                    | 56                     |

At 6–8 participants, each peer must encode and upload its media stream N−1 times. Bandwidth and CPU requirements hit a hard ceiling; this was observed in practice during a recent 8-person call.
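The growth above can be reproduced with a toy calculation (not executor code): per-peer upload streams are N−1, and the table's totals count directed streams, i.e. N·(N−1).

```rust
/// Number of outbound media streams each peer must encode in a full mesh.
fn streams_per_peer(participants: u32) -> u32 {
    participants.saturating_sub(1)
}

/// Total directed media streams across the whole call: n * (n - 1).
/// (Unique peer-to-peer links would be half this.)
fn total_streams(participants: u32) -> u32 {
    participants * streams_per_peer(participants)
}
```

With an SFU, each peer's upload count drops to a constant 1 regardless of group size.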

Proposed Solution: Embedded SFU (Selective Forwarding Unit)

An SFU receives each participant's media stream once, then selectively forwards it to the other participants. Each peer uploads only once regardless of group size. This is the standard architecture used by Jitsi, LiveKit, mediasoup, and others.

Key Design: SFU in the ADAM Executor

Rather than requiring external infrastructure, embed SFU capability directly in the ADAM executor:

  • Every ADAM agent gains SFU capability by default
  • No external service dependency
  • Shares the executor's async runtime, identity system, and neighbourhood membership
  • The executor already manages networking and auth — the SFU piggybacks on this

Recommended library: str0m — a Sans I/O Rust WebRTC implementation with an explicit SFU example. Sans I/O fits well with the executor's architecture — no internal threads or hidden async tasks, all operations driven by the caller. webrtc-rs is an alternative but is heavier.
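To illustrate what "Sans I/O" means here, below is a minimal state machine in that style — the caller owns all sockets and timers and drives the machine by feeding inputs and draining outputs. This is a generic sketch of the pattern, not str0m's actual API (str0m's real types and signatures differ):

```rust
use std::time::{Duration, Instant};

/// What a Sans I/O machine asks its caller to do next.
enum Output {
    /// Caller should wake the machine again no later than this instant.
    Timeout(Instant),
    /// Caller should send these bytes on its own socket.
    Transmit(Vec<u8>),
}

/// A toy Sans I/O machine: no internal threads, no hidden async tasks,
/// no socket access — all I/O is performed by the caller.
struct Machine {
    pending: Vec<Vec<u8>>,
}

impl Machine {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Caller feeds in a packet it received on its socket.
    fn handle_input(&mut self, packet: Vec<u8>) {
        // Toy behaviour: queue every packet to be echoed back out.
        self.pending.push(packet);
    }

    /// Caller drains the next action it must perform.
    fn poll_output(&mut self) -> Output {
        match self.pending.pop() {
            Some(bytes) => Output::Transmit(bytes),
            None => Output::Timeout(Instant::now() + Duration::from_millis(100)),
        }
    }
}
```

Because the executor already owns its async runtime and sockets, this inversion of control is what lets the SFU share them instead of spawning its own.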

Designated SFU Peer per Neighbourhood

Not every peer in a call acts as the SFU. The neighbourhood designates one peer:

  1. Cloud Gateway — for gateway-connected neighbourhoods, the gateway acts as default SFU (always-on, server-grade bandwidth)
  2. Designated peer — neighbourhood admin sets sfu_peer DID in Social DNA. Simplest self-sovereign model.
  3. Mesh fallback — if SFU unavailable or ≤4 participants, fall back to direct mesh (current behaviour, zero regression)

Call Flow

```text
Initiator creates call link in neighbourhood
  → Peers query neighbourhood for SFU peer
    → If SFU available: each peer connects once to SFU
      SFU forwards streams to all peers
    → If SFU unavailable + ≤4 peers: mesh fallback
    → If SFU unavailable + >4 peers: warn, attempt mesh
```
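The topology decision above can be sketched as a small function (illustrative; the threshold name mirrors the proposed `maxMeshParticipants` Social DNA field):

```rust
/// Topology chosen for a call.
#[derive(Debug, PartialEq)]
enum Topology {
    Sfu,
    Mesh,
    /// Mesh attempted above the comfortable limit: the UI should warn.
    MeshWithWarning,
}

/// Mirrors the call flow: small calls stay on direct mesh (current
/// behaviour, zero regression); larger calls use the SFU when one is
/// available, otherwise attempt mesh with a warning.
fn choose_topology(
    sfu_available: bool,
    participants: usize,
    max_mesh_participants: usize,
) -> Topology {
    if participants <= max_mesh_participants {
        Topology::Mesh
    } else if sfu_available {
        Topology::Sfu
    } else {
        Topology::MeshWithWarning
    }
}
```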

Signalling (SDP offer/answer, ICE candidates) routes through existing neighbourhood communication — no new signalling infrastructure needed.

Executor Changes

New Module: sfu/

```text
rust-executor/src/
├── sfu/
│   ├── mod.rs          // Module entry, SFU lifecycle
│   ├── server.rs       // str0m-based WebRTC SFU server
│   ├── room.rs         // Room/session management per neighbourhood
│   └── relay.rs        // Media relay & selective forwarding logic
```

GraphQL API

```graphql
type Mutation {
  sfuStartRoom(neighbourhoodUrl: String!, roomId: String!): SfuRoom!
  sfuStopRoom(roomId: String!): Boolean!
  callJoin(neighbourhoodUrl: String!, roomId: String!): CallSession!
  callLeave(roomId: String!): Boolean!
}

type Query {
  sfuRooms: [SfuRoom!]!
  sfuPeerForNeighbourhood(neighbourhoodUrl: String!): String
}

type Subscription {
  callParticipants(roomId: String!): CallParticipantEvent!
  callStreams(roomId: String!): CallStreamEvent!
}
```
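For illustration, joining a call through the proposed API might look like the following (the room id, URL, and `id` selection are placeholders — `CallSession`'s fields are not yet specified in this proposal):

```graphql
mutation JoinCall {
  callJoin(neighbourhoodUrl: "neighbourhood://example", roomId: "room-1") {
    # Hypothetical field; the CallSession type is not defined above.
    id
  }
}
```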

Social DNA Extension

```json
{
  "sfu": {
    "mode": "designated",
    "designatedPeer": "did:key:z6Mk...",
    "fallback": "mesh",
    "maxMeshParticipants": 4
  }
}
```

Modes: "gateway" | "designated" | "mesh" (default, current behaviour)
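A minimal Rust representation of the proposed `sfu` block might look like this (hand-rolled and purely illustrative — real code would deserialize from the Social DNA JSON):

```rust
#[derive(Debug, Clone, PartialEq)]
enum SfuMode {
    Gateway,
    Designated,
    Mesh,
}

#[derive(Debug, Clone)]
struct SfuConfig {
    mode: SfuMode,
    /// DID of the designated SFU peer, relevant when mode is Designated.
    designated_peer: Option<String>,
    /// Topology to fall back to when the SFU is unavailable.
    fallback: SfuMode,
    max_mesh_participants: usize,
}

impl Default for SfuConfig {
    /// The default preserves current behaviour: plain mesh, threshold 4.
    fn default() -> Self {
        Self {
            mode: SfuMode::Mesh,
            designated_peer: None,
            fallback: SfuMode::Mesh,
            max_mesh_participants: 4,
        }
    }
}
```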

Flux UI Changes

  • SFU mode indicator during calls
  • Participant grid scaling for >8 participants (speaker view, active speaker detection)
  • Quality selector (auto/high/medium/low) — SFU enables server-side bandwidth adaptation
  • Neighbourhood settings panel for SFU peer configuration
  • Simulcast support: client sends 3 layers (720p/360p/180p), SFU selects per recipient
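The simulcast selection the SFU would perform per recipient can be sketched as follows. The bitrate figures are assumptions for illustration, not part of the proposal:

```rust
/// Simulcast layers the client sends (720p/360p/180p per the proposal),
/// paired with rough illustrative bitrates in kbps (assumed values).
const LAYERS: [(&str, u32); 3] = [("720p", 2_500), ("360p", 800), ("180p", 250)];

/// Pick the highest layer that fits the recipient's available downlink.
fn select_layer(available_kbps: u32) -> &'static str {
    for (name, kbps) in LAYERS {
        if available_kbps >= kbps {
            return name;
        }
    }
    // Below even the lowest layer: send the smallest anyway.
    "180p"
}
```

Because the SFU makes this choice per recipient, a weak downlink on one participant no longer degrades the sender's upload or anyone else's quality.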

Implementation Phases

Phase 1: Core SFU in Executor

  • Embed str0m SFU in the executor
  • GraphQL API for room management
  • Social DNA sfu configuration
  • Designated peer mode only

Phase 2: Flux Integration & Cascaded SFU

  • Call UI updates for SFU mode
  • Settings panel for SFU configuration
  • Simulcast support
  • Fallback logic (SFU → mesh)
  • Cascaded SFU mode (multi-node cluster, pipe transports)
  • Capability-based SFU peer election (peers advertise bandwidth/uptime/CPU)

Phase 3: Advanced

  • Cross-cluster SFU (SFU nodes spanning multiple neighbourhoods)
  • Recording (SFU has all streams — trivial to record)
  • Breakout rooms (multiple SFU rooms per neighbourhood)
  • Screen sharing optimisation
  • E2E encryption via Insertable Streams / SFrame
  • WE module extraction

Security Considerations

  • Trust model: SFU peer sees media in cleartext. For cloud gateway this is accepted. For designated peers, neighbourhood members implicitly trust that peer.
  • E2E encryption: possible via Insertable Streams/SFrame (Phase 3) — SFU forwards encrypted frames it cannot decrypt.
  • Authentication: peers authenticate to SFU via ADAM agent DID; SFU verifies neighbourhood membership before admitting.
  • Abuse prevention: SFU operator can set limits (max participants, max bitrate) via Social DNA.
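The membership and limit checks above combine into a simple admission gate. This is an illustrative sketch only — real admission would verify a signed DID challenge, not just set membership:

```rust
use std::collections::HashSet;

/// Illustrative admission check: the SFU admits a peer only if its DID
/// is a neighbourhood member and the room is under the configured cap.
fn admit(
    did: &str,
    members: &HashSet<String>,
    current_participants: usize,
    max_participants: usize,
) -> bool {
    members.contains(did) && current_participants < max_participants
}
```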

Open Questions

  1. NAT traversal — ✅ Resolved: Use ADAM/Flux's existing centralised TURN & STUN servers for now. The SFU peer will use the same ICE infrastructure that mesh calls already use. Decentralised TURN alternatives can be explored later but are not blocking.

  2. Resource compensation — ✅ Out of scope: HoloFuel compensation for SFU nodes is deferred. Relates to the broader x402/mutual credit work but is not required for initial implementation.

  3. Live migration — ✅ Out of scope: Tracked separately in feat: Seamless SFU ↔ mesh live migration during calls #708. Initial implementation uses manual SFU peer designation (defaulting to gateway peer). Participants re-join if topology changes. Seamless migration is a future enhancement.

  4. str0m readiness — Sans I/O is elegant but less battle-tested than Go SFUs (Pion/LiveKit). Needs evaluation under real load. The existing chat example demonstrates the pattern works.

  5. SFU placement: executor vs link language — Where should the SFU live architecturally?

    Option A: In the executor (as currently proposed)

    • ✅ Access to the full async runtime, identity system, neighbourhood membership — no bridging needed
    • ✅ Simpler implementation — SFU is a Rust module alongside existing executor services
    • ✅ GraphQL API integrates naturally with the existing schema
    • ✅ Works for all link languages automatically — any neighbourhood gets SFU capability
    • ❌ Couples media infrastructure to the executor — every executor ships SFU code whether it's needed or not
    • ❌ Harder to swap SFU implementations per neighbourhood (e.g. one NH wants recording, another wants minimal)

    Option B: In the link language, as a telepresence API extension

    • ✅ Follows ADAM's abstraction model — media handling is already a language concern (the telepresence API is part of the language interface)
    • ✅ Different neighbourhoods can use different SFU implementations via different link languages
    • ✅ Keeps the executor lean — SFU capability is opt-in per language
    • ✅ Language-level SFU could be implemented in WASM (ties into feat: WASM-based language execution runtime #692), enabling sandboxed media processing
    • ❌ Link languages run in JS/WASM isolates — embedding str0m (Rust) requires either FFI bridging or a separate WASM-compiled SFU
    • ❌ The language would need to open server sockets and manage WebRTC connections — currently languages don't have this level of network access
    • ❌ More complex signalling path — SFU in the language needs to communicate back to the executor for ICE/TURN credentials and peer authentication
    • ❌ Per-language implementation burden — every link language that wants SFU must implement it

    Option C: Hybrid — executor provides SFU primitives, language controls policy

    • The executor embeds str0m and exposes SFU room management as a runtime service (like Holochain or SurrealDB)
    • The link language's telepresence API gains new methods: requestSfu(), setSfuPeer(), getSfuConfig()
    • The language decides when and how to use the SFU (threshold, peer selection policy), but the executor does the heavy lifting
    • This mirrors how languages use Holochain — they don't embed a conductor, they call the executor's Holochain service

    Recommendation: Option C (hybrid). The SFU is infrastructure (like Holochain), not application logic. It belongs in the executor as a service. But the telepresence API in the language interface should be extended so languages can control SFU behaviour — when to activate, which peer, quality settings. This keeps the abstraction clean while avoiding the practical problems of running a Rust SFU inside a JS/WASM isolate.
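Under Option C, the executor-side service boundary could be sketched as a trait the language's telepresence API calls into. All names here are hypothetical (mirroring the proposed `requestSfu()`/`setSfuPeer()`/`getSfuConfig()` methods), and the in-memory impl stands in for the real str0m-backed service:

```rust
use std::collections::HashMap;

/// Hypothetical executor-side SFU service: the language sets policy,
/// the executor does the heavy lifting (str0m, sockets, ICE) — mirroring
/// how languages call the executor's Holochain service rather than
/// embedding a conductor.
trait SfuService {
    /// Ask the executor to provide an SFU room; returns a room handle.
    fn request_sfu(&mut self, neighbourhood_url: &str, room_id: &str) -> Result<String, String>;
    /// Language-level policy: which peer should act as SFU.
    fn set_sfu_peer(&mut self, neighbourhood_url: &str, did: &str);
    /// Current SFU peer as seen by the language, if any.
    fn get_sfu_config(&self, neighbourhood_url: &str) -> Option<String>;
}

/// Toy in-memory stand-in for the real service.
#[derive(Default)]
struct InMemorySfu {
    peers: HashMap<String, String>,
}

impl SfuService for InMemorySfu {
    fn request_sfu(&mut self, neighbourhood_url: &str, room_id: &str) -> Result<String, String> {
        // Toy: derive a handle from the inputs; a real impl starts a str0m room.
        Ok(format!("{}@{}", room_id, neighbourhood_url))
    }

    fn set_sfu_peer(&mut self, neighbourhood_url: &str, did: &str) {
        self.peers.insert(neighbourhood_url.to_string(), did.to_string());
    }

    fn get_sfu_config(&self, neighbourhood_url: &str) -> Option<String> {
        self.peers.get(neighbourhood_url).cloned()
    }
}
```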

Cascaded SFU (Multi-Node)

In addition to the single-SFU modes (gateway/designated/mesh), a cascaded mode enables multiple executor nodes to cooperate as an SFU cluster within a neighbourhood call.

Mode: "cascaded"

A new topology mode alongside "gateway", "designated", and "mesh". In cascaded mode:

  • Multiple nodes advertise SFU capability via neighbourhood signalling (sfu-announce messages)
  • Each SFU node accepts a subset of participants (up to maxParticipantsPerNode)
  • Inter-SFU relay via pipe transports: SFU nodes establish str0m peer connections between each other, selectively forwarding tracks that remote participants have subscribed to
  • Participants connect to their nearest/least-loaded SFU node; the SFU cluster handles cross-node media routing transparently

SFU Cluster Discovery

  • Nodes that can serve as SFU broadcast sfu-announce messages through neighbourhood signalling
  • Each announce includes: DID, current participant count, capacity hint
  • sfu-pipe-offer / sfu-pipe-answer messages establish pipe transports between SFU nodes
  • sfu-leave message when an SFU node departs the cluster
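A participant choosing among announced nodes could use a least-loaded placement policy like the sketch below ("nearest" is ignored here; struct fields mirror the announce contents listed above):

```rust
/// Contents of an `sfu-announce` as described above:
/// DID, current participant count, capacity hint.
#[derive(Debug, Clone)]
struct SfuAnnounce {
    did: String,
    participants: usize,
    capacity: usize,
}

/// Pick the least-loaded announced node that still has room.
fn pick_sfu_node(announces: &[SfuAnnounce]) -> Option<&SfuAnnounce> {
    announces
        .iter()
        .filter(|a| a.participants < a.capacity)
        .min_by_key(|a| a.participants)
}
```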

Social DNA Extensions

```json
{
  "sfu": {
    "mode": "cascaded",
    "sfuPeers": ["did:key:z6Mk...", "did:key:z6Mn..."],
    "maxParticipantsPerNode": 8,
    "fallback": "mesh",
    "maxMeshParticipants": 4
  }
}
```

New fields:

  • sfuPeers: [DID] — list of DIDs offering SFU capability (replaces single designatedPeer when in cascaded mode)
  • maxParticipantsPerNode — capacity limit per SFU node before overflow to another node
  • mode: "cascaded" — activates multi-node SFU

Pipe Transports (Inter-SFU Relay)

Each pair of SFU nodes in the cluster establishes a str0m peer connection ("pipe transport"):

  • Only tracks that a remote node's participants have subscribed to are forwarded over the pipe
  • A track is never forwarded back to the SFU node it originated from
  • Pipe transports are established on-demand when the first cross-node subscription occurs
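The per-track forwarding rules above reduce to a small predicate (illustrative):

```rust
/// Decide whether a track should be forwarded over a pipe transport
/// to a remote SFU node, per the rules above.
fn forward_over_pipe(
    track_origin_node: &str,
    remote_node: &str,
    remote_has_subscribers: bool,
) -> bool {
    // Never send a track back to the node it originated from, and only
    // send tracks the remote node's participants have subscribed to.
    track_origin_node != remote_node && remote_has_subscribers
}
```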

Proposal by @HexaField. Issue created by @data-bot-coasys.
