
Multiplex Wire Protocol

The fabric's transport contract: one long-lived WSS connection per paired device into the cloud at /ws/device, carrying NDJSON envelopes in both directions, routed by sessionId + corrId (Charter §7 / §9). The implementation lives in src/fabric/envelope.ts (encoder/decoder + KNOWN_TYPES set), src/server/ws-multiplex.ts (cloud-side router), src/server/device-router.ts (capability scheduler + tmux.list bridge), src/daemon/transport.ts (device-side WS client + heartbeat + replay buffer), and src/daemon/envelope.ts (daemon-side decoder + NdjsonBuffer).

Everything routable on the fabric — session lifecycle, tmux multiplex, MOTHER queries, vault unwraps, telemetry — rides this wire.

Envelope structure

The wire frame is one JSON object per WS message, newline-terminated (encodeEnvelope appends \n; NdjsonBuffer splits incoming chunks on \n for line-buffered tolerance):

```ts
interface Envelope<P = unknown> {
  type: EnvelopeType;     // discriminator; rejected as `unknown-type` if not in KNOWN_TYPES
  sessionId: SessionId;   // non-empty string; uuid for harness sessions, "device-control" for telemetry
  corrId: CorrId;         // non-empty string; cloud matches responses by this
  ts: number;             // unix ms, device clock (cloud reconciles)
  payload: P;             // type-specific shape; `null` allowed
}
```

decodeEnvelope in src/fabric/envelope.ts rejects any frame missing type/sessionId/corrId/ts, or with a type not in the locked KNOWN_TYPES set. The cloud bounces a structured error envelope back rather than dropping the connection — see WSMultiplex.routeFromDevice (src/server/ws-multiplex.ts:411).
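The newline framing can be illustrated with a minimal sketch. encodeFrame and LineBuffer below are hypothetical stand-ins for encodeEnvelope and NdjsonBuffer, assuming the daemon receives string chunks that may split or batch lines arbitrarily; they show the line-buffering idea, not the real signatures.

```typescript
// Hypothetical NDJSON framing sketch; not the real src/daemon/envelope.ts API.
function encodeFrame(obj: unknown): string {
  // One JSON object per line; JSON.stringify escapes newlines inside strings.
  return JSON.stringify(obj) + "\n";
}

class LineBuffer {
  private pending = "";

  // Feed a raw chunk; returns every complete, non-blank line seen so far.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is an incomplete line ("" if the chunk ended on \n).
    this.pending = parts.pop() ?? "";
    return parts.filter((line) => line.trim().length > 0);
  }
}
```

Because JSON.stringify never emits a raw newline, a bare \n can only be a frame terminator, which is what makes splitting on \n safe even when one WS message carries a partial or doubled frame.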

Envelope catalog (KNOWN_TYPES, src/fabric/envelope.ts:16)

| Type | Direction | Purpose |
|---|---|---|
| session.start | cloud → device | Spawn a harness session with the resolved scope/cred env |
| session.stdout / session.stderr | device → cloud | One line per envelope, streamed live |
| session.exit | device → cloud | Terminal frame; clears the pending entry on the cloud side |
| session.cancel | cloud → device | Cancel an in-flight session |
| session.rejected | device → cloud | Daemon refused (capability mismatch, scope, etc.) |
| session.cancelled | device → cloud | ACK of session.cancel |
| session.status / session.status.reply | cloud → device → cloud | Probe |
| session.stdin | cloud → device | Forward keystrokes (rare; one-shot harnesses don't use it) |
| ping / pong | either | Reserved app-level keepalive (transport currently uses heartbeat) |
| vault.unwrap.req / vault.unwrap.res | cloud → device → cloud | Phase 11+ device-local vault decrypt |
| tmux.list / tmux.list.reply | cloud → device → cloud | Item D: list tmux sessions on a paired device |
| tmux.attach.start / .data / .end / .resize | both | Item D: live PTY relay over fabric multiplex (8da8e3a) |
| heartbeat | device → cloud | Capability + telemetry digest every 20s |
| ack | cloud → device | Per-envelope receipt (drops the replay-buffer entry on the device) |
| error | either | Structured {code, message} payload |

Any type outside this set fails decode and the peer gets an error envelope back with code: "malformed-envelope".
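A sketch of the validation decodeEnvelope performs, assuming string-typed fields and an illustrative message format. The KNOWN_TYPES literal is transcribed from the catalog table (with tmux.attach.* expanded); decode and errorEnvelopeFor are hypothetical names, and the error envelope's sessionId is a placeholder.

```typescript
// Hypothetical decoder sketch; KNOWN_TYPES is transcribed from the catalog table.
const KNOWN_TYPES = new Set<string>([
  "session.start", "session.stdout", "session.stderr", "session.exit",
  "session.cancel", "session.rejected", "session.cancelled",
  "session.status", "session.status.reply", "session.stdin",
  "ping", "pong", "vault.unwrap.req", "vault.unwrap.res",
  "tmux.list", "tmux.list.reply",
  "tmux.attach.start", "tmux.attach.data", "tmux.attach.end", "tmux.attach.resize",
  "heartbeat", "ack", "error",
]);

interface Envelope {
  type: string;
  sessionId: string;
  corrId: string;
  ts: number;
  payload: unknown;
}

type DecodeResult = { ok: true; env: Envelope } | { ok: false; message: string };

function decode(line: string): DecodeResult {
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    return { ok: false, message: "invalid JSON" };
  }
  if (typeof raw !== "object" || raw === null) return { ok: false, message: "not an object" };
  const o = raw as Record<string, unknown>;
  // type, sessionId and corrId must be non-empty strings; ts must be a number.
  for (const key of ["type", "sessionId", "corrId"]) {
    const v = o[key];
    if (typeof v !== "string" || v === "") return { ok: false, message: `missing ${key}` };
  }
  if (typeof o["ts"] !== "number") return { ok: false, message: "missing ts" };
  const type = o["type"] as string;
  if (!KNOWN_TYPES.has(type)) return { ok: false, message: `unknown-type: ${type}` };
  return {
    ok: true,
    env: {
      type,
      sessionId: o["sessionId"] as string,
      corrId: o["corrId"] as string,
      ts: o["ts"] as number,
      // A missing payload is normalized to null (assumption).
      payload: o["payload"] === undefined ? null : o["payload"],
    },
  };
}

// Bounce a structured error back to the peer instead of closing the socket.
function errorEnvelopeFor(message: string, corrId: string): Envelope {
  return {
    type: "error",
    sessionId: "device-control", // placeholder
    corrId,
    ts: Date.now(),
    payload: { code: "malformed-envelope", message },
  };
}
```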

Multiplex over WSS

Cloud-side WSMultiplex (src/server/ws-multiplex.ts) maintains two connection maps:

  • One device per deviceId. A second connection presenting a JWT for an already-mapped device evicts the old one: the old socket receives an error envelope (corrId: "device-replaced") and safeClose(4040), and its pending requests are cancelled (cancelPendingForPeer). See acceptDevice at ws-multiplex.ts:219.
  • Many concurrent browsers. Each browser tab gets a stable browserConnId (e.g. b-1-x7f8q2). Browsers carry no routing identity beyond that.
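The one-device-per-deviceId rule amounts to a map with replace-and-evict semantics. This is a hypothetical sketch: DeviceSocket and DeviceTable are illustrative names, the error payload's code and message are assumptions, and recording the peer in cancelledPeers stands in for cancelPendingForPeer.

```typescript
// Hypothetical sketch of replace-and-evict; only corrId "device-replaced" and 4040 come from the doc.
interface DeviceSocket {
  send(frame: string): void;
  close(code: number): void;
}

class DeviceTable {
  private devices = new Map<string, DeviceSocket>();
  readonly cancelledPeers: string[] = [];

  accept(deviceId: string, sock: DeviceSocket): void {
    const old = this.devices.get(deviceId);
    if (old !== undefined) {
      // Tell the old socket why it is going away, then close it with 4040.
      const notice = {
        type: "error",
        sessionId: "device-control",
        corrId: "device-replaced",
        ts: Date.now(),
        payload: { code: "device-replaced", message: "superseded by a newer connection" }, // assumed payload
      };
      old.send(JSON.stringify(notice) + "\n");
      old.close(4040);
      this.cancelledPeers.push(deviceId); // stands in for cancelPendingForPeer
    }
    this.devices.set(deviceId, sock);
  }

  get(deviceId: string): DeviceSocket | undefined {
    return this.devices.get(deviceId);
  }
}
```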

The JWT is presented via the WS subprotocol header (Sec-WebSocket-Protocol: fabric-jwt-<token>); this FABRIC_JWT_PREFIX convention survives Cloudflare Tunnel header rewriting unchanged (extractJwtFromProtocol). At handshake, acceptDevice also cross-checks vault_recipients.revoked_at via isDevicePubkeyActive (T4 closeout): if the recipient was soft-revoked while the device was disconnected, the new connection is refused with code 4004.
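A minimal sketch of subprotocol-based token extraction, assuming FABRIC_JWT_PREFIX is the literal "fabric-jwt-" shown above and that the header may carry a comma-separated list of subprotocols. extractJwt is a hypothetical stand-in for extractJwtFromProtocol; it only locates the token and performs no JWT verification.

```typescript
// Assumes FABRIC_JWT_PREFIX === "fabric-jwt-"; extractJwt is illustrative.
const FABRIC_JWT_PREFIX = "fabric-jwt-";

function extractJwt(protocolHeader: string | undefined): string | null {
  if (!protocolHeader) return null;
  // Multiple requested subprotocols arrive as a comma-separated list.
  for (const proto of protocolHeader.split(",")) {
    const p = proto.trim();
    if (p.startsWith(FABRIC_JWT_PREFIX) && p.length > FABRIC_JWT_PREFIX.length) {
      return p.slice(FABRIC_JWT_PREFIX.length);
    }
  }
  return null;
}
```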

Routing is corrId-keyed. When a browser asks the cloud to forward an envelope to a device, the cloud stores a PendingRequest ({corrId, fromKind, fromId, toKind, toId, expiresAt}) keyed by corrId. On a device → cloud reply, routeFromDevice looks up that corrId, finds the originating browser, and forwards. Terminal frames (session.exit, error) clear the entry. TTL is 30 minutes (PENDING_TTL_MS); stale entries get GC'd in tickHeartbeats.
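The corrId-keyed pending table can be sketched as follows. The PendingRequest fields and the 30-minute TTL come from the description above; PendingTable and its track/route/gc methods are hypothetical, and the sketch only models the browser → device → browser direction.

```typescript
// Hypothetical sketch of corrId-keyed routing with a TTL.
type PeerKind = "browser" | "device";

interface PendingRequest {
  corrId: string;
  fromKind: PeerKind;
  fromId: string;
  toKind: PeerKind;
  toId: string;
  expiresAt: number; // unix ms
}

const PENDING_TTL_MS = 30 * 60 * 1000;
const TERMINAL_TYPES = new Set(["session.exit", "error"]);

class PendingTable {
  private byCorr = new Map<string, PendingRequest>();

  track(corrId: string, fromId: string, toId: string, now: number): void {
    this.byCorr.set(corrId, {
      corrId, fromKind: "browser", fromId, toKind: "device", toId, expiresAt: now + PENDING_TTL_MS,
    });
  }

  // Returns the browser to forward a device reply to, or null if unknown or expired.
  route(reply: { type: string; corrId: string }, now: number): string | null {
    const entry = this.byCorr.get(reply.corrId);
    if (entry === undefined || entry.expiresAt <= now) return null;
    if (TERMINAL_TYPES.has(reply.type)) this.byCorr.delete(reply.corrId);
    return entry.fromId;
  }

  // Periodic sweep, analogous to the GC pass in tickHeartbeats.
  gc(now: number): void {
    this.byCorr.forEach((entry, corrId) => {
      if (entry.expiresAt <= now) this.byCorr.delete(corrId);
    });
  }

  get size(): number {
    return this.byCorr.size;
  }
}
```

Streaming frames such as session.stdout leave the entry in place, so a long session keeps routing to the same browser until a terminal frame or the TTL removes it.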

Programmatic dispatch from HTTP routes (e.g. POST /api/sessions) goes through dispatchToDevice (ws-multiplex.ts:511) with a responseSink: (env) => void callback. The sink bypasses the browser-connection table: dispatchToDevice synthesizes a sink-<corrId> browser entry instead, so the response-routing logic stays uniform.

Heartbeats and capabilities

The device's FabricTransport (src/daemon/transport.ts) sends a heartbeat envelope every 20s with payload: { capabilities, installedHarnesses, ramFreeMb, concurrentSessionCount }. The cloud's absorbHeartbeat (ws-multiplex.ts:584) updates the DeviceConnection row in place and ACKs back. The stale-after threshold is HEARTBEAT_STALE_MS = 60_000 (three missed heartbeats); the tickHeartbeats interval runs every 5s and evicts devices that have been silent longer than that.

The critical detail (commit 6fccb61, fix(fabric/multiplex): pair-time capabilities are durable, heartbeat MERGES not replaces):

```ts
// DeviceConnection in ws-multiplex.ts:56 (capability fields only)
interface DeviceConnection {
  pairCapabilities: string[];    // from devices.capabilities at pair time; authoritative
  runtimeCapabilities: string[]; // from heartbeat payload; additive
  capabilities: string[];        // UNION, recomputed every heartbeat
}
```

absorbHeartbeat recomputes capabilities as the union of pairCapabilities and runtimeCapabilities, deduplicated through a Set. Before 6fccb61 the heartbeat REPLACED the capability set outright, which wiped operator-supplied tags (e.g. mother-llm) ~20s after every connect: MOTHER routing broke because the pair-time tag from optimal pair --capabilities evaporated. Now the operator's intent is durable.
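The merge-not-replace rule and the staleness check reduce to a few lines. This sketch assumes a pared-down Conn shape; absorbHeartbeatSketch and isStale are hypothetical names, and counting a device as stale only once it is strictly older than HEARTBEAT_STALE_MS is an assumption about the boundary.

```typescript
// Hypothetical sketch; only the three capability fields and the threshold come from the doc.
interface Conn {
  pairCapabilities: string[];
  runtimeCapabilities: string[];
  capabilities: string[];
  lastSeenAt: number; // unix ms
}

const HEARTBEAT_STALE_MS = 60_000;

function absorbHeartbeatSketch(conn: Conn, heartbeatCaps: string[], now: number): void {
  conn.runtimeCapabilities = heartbeatCaps;
  // Union: pair-time tags survive even when the heartbeat omits them.
  conn.capabilities = Array.from(new Set(conn.pairCapabilities.concat(heartbeatCaps)));
  conn.lastSeenAt = now;
}

// Strictly-greater boundary is an assumption.
function isStale(conn: Conn, now: number): boolean {
  return now - conn.lastSeenAt > HEARTBEAT_STALE_MS;
}
```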

Connection table fields surfaced for the scheduler:

| Field | Source | Used by |
|---|---|---|
| pairCapabilities | devices.capabilities at pair time (auth-store.listDevices()) | merged into capabilities |
| runtimeCapabilities | heartbeat payload.capabilities | merged into capabilities |
| installedHarnesses | heartbeat payload.installedHarnesses | scheduler hard filter |
| ramFreeMb | heartbeat | scheduler soft score |
| concurrentSessionCount | heartbeat | scheduler soft score |
| lastSeenAt | bumped on any frame received | scheduler soft score, eviction |

tmux.list — Item D

tmux.list is request/response over the multiplex. When the cockpit's [+] tmux dropdown calls /api/system/tmux?device=<id>, the cloud's requestTmuxList helper (src/server/device-router.ts:278) wraps dispatchToDevice in a 5-second timeout promise:

  1. Build envelope {type:"tmux.list", sessionId:"device-control", corrId:randomUUID(), payload:{}, ts:Date.now()}.
  2. Call multiplex.dispatchToDevice(...) with a response sink that resolves on the first tmux.list.reply carrying a sessions array.
  3. On the device, handleTmuxList (src/daemon/handlers/tmux-list.ts) execFiles tmux list-sessions -F '#{session_name}\t#{session_attached}\t#{session_windows}\t#{session_panes}', parses rows, sends back tmux.list.reply.
  4. The cloud normalizes the payload and returns a TmuxListResult discriminated union to the route layer. "No tmux server", "no sessions", and ENOENT all become {ok: true, sessions: []}, since the cockpit has no need to tell them apart.
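Steps 2 to 4 can be sketched as three small helpers. The TmuxSession and TmuxListResult shapes, the stderr patterns treated as "nothing to list", and all three function names are assumptions for illustration; in particular, the real requestTmuxList resolves through a dispatchToDevice response sink rather than a bare promise.

```typescript
// Hypothetical sketch of steps 2-4; shapes and names are illustrative, not device-router.ts.
interface TmuxSession {
  name: string;
  attached: boolean; // session_attached is a client count; > 0 means attached
  windows: number;
  panes: number;
}

type TmuxListResult = { ok: true; sessions: TmuxSession[] } | { ok: false; error: string };

// Parse output of: tmux list-sessions -F '#{session_name}\t#{session_attached}\t#{session_windows}\t#{session_panes}'
function parseTmuxList(stdout: string): TmuxSession[] {
  return stdout
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => {
      const [name, attached, windows, panes] = line.split("\t");
      return { name, attached: Number(attached) > 0, windows: Number(windows), panes: Number(panes) };
    });
}

// Collapse "nothing to list" conditions into an empty success (stderr patterns are assumed).
function normalizeTmuxError(err: { code?: string; stderr?: string }): TmuxListResult {
  const msg = err.stderr ?? "";
  if (err.code === "ENOENT" || /no server running|no sessions/i.test(msg)) {
    return { ok: true, sessions: [] };
  }
  return { ok: false, error: msg || err.code || "unknown tmux error" };
}

// Race a reply promise against a timer (requestTmuxList uses 5 seconds).
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}
```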

tmux.attach.* — live PTY relay

Item D follow-up landed 2026-05-09 (8da8e3a, feat(fabric/tmux-attach): live PTY relay over fabric multiplex). The attach is a continuous stream keyed by corrId:

| Frame | Direction | Payload |
|---|---|---|
| tmux.attach.start | cloud → device | {session, cols, rows}; daemon spawns python3 pty-host.py tmux-attach <name> <cols> <rows> |
| tmux.attach.data | both | {data: base64}; daemon ↔ pty stdin/stdout |
| tmux.attach.resize | cloud → device | {attachId, cols, rows}; xterm.js onResize → pty-host control message \x00{"type":"resize",...}\n (Phase 4 cleanup, 2026-05-09) |
| tmux.attach.end | either | {reason?}; daemon SIGTERMs the pty-host client, NOT tmux kill-server. The session and server both survive (detach semantics). |
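The two payload encodings in the table can be sketched as follows, assuming Node's Buffer for base64 and assuming the resize control JSON carries cols and rows alongside type (the table elides those fields as ...). encodeData, decodeData, and resizeControl are hypothetical names.

```typescript
// Hypothetical sketch; resize JSON fields beyond "type" are assumptions.

// tmux.attach.data carries raw PTY bytes as base64.
function encodeData(bytes: Uint8Array): { data: string } {
  return { data: Buffer.from(bytes).toString("base64") };
}

function decodeData(payload: { data: string }): Buffer {
  return Buffer.from(payload.data, "base64");
}

// In-band resize message for pty-host: NUL byte, JSON object, newline.
function resizeControl(cols: number, rows: number): string {
  return "\x00" + JSON.stringify({ type: "resize", cols, rows }) + "\n";
}
```

Base64 keeps arbitrary PTY bytes (escape sequences, partial UTF-8) safe inside a JSON envelope. The in-band \x00 prefix only works if pty-host can tell a control message apart from a literal NUL keystroke, since many terminals send NUL for Ctrl-Space.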

LiveAttaches registry on the daemon side (src/daemon/handlers/tmux-attach.ts) is intentionally separate from ActiveSessions — tmux attaches are PTY relays with a different lifecycle (no exit code we surface, no vault binding, no per-harness adapters). Sharing the registry would conflate concerns and break harness capacity accounting.

MOTHER bridge over the multiplex

MOTHER queries are just session.start envelopes with harness: "mother-llm". The cockpit mother-chat.ts panel POSTs to /api/sessions, the cloud's scheduler picks a device (see below), dispatchToDevice ships session.start, the daemon's mother-llm adapter spawns ollama run [--format json] <model> <prompt>, and stdout streams back as session.stdout envelopes corrId-keyed to the originating browser. STOP fires DELETE /api/sessions/:id which calls cancelDispatch → device gets a session.cancel envelope → daemon SIGTERMs the ollama child.

Per the mother-llm adapter (src/daemon/adapters/mother-llm.ts), MOTHER's cred-allowlist is empty — no provider keys are forwarded. Only HOME / PATH / OLLAMA_HOST / OLLAMA_MODELS / CUDA_VISIBLE_DEVICES inherit. The model picker defaults to qwen2.5-coder:7b.
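A sketch of how the mother-llm spawn inputs could be assembled from the rules above: argv for ollama run [--format json] <model> <prompt>, the qwen2.5-coder:7b default, and an environment built only from the five inherited variables. buildOllamaArgs and buildMotherEnv are hypothetical names, not the adapter's API.

```typescript
// Hypothetical sketch of mother-llm spawn inputs; function names are illustrative.
const MOTHER_ENV_ALLOWLIST = ["HOME", "PATH", "OLLAMA_HOST", "OLLAMA_MODELS", "CUDA_VISIBLE_DEVICES"];
const DEFAULT_MODEL = "qwen2.5-coder:7b";

// Argument vector for: ollama run [--format json] <model> <prompt>
function buildOllamaArgs(prompt: string, opts: { model?: string; json?: boolean } = {}): string[] {
  const args = ["run"];
  if (opts.json) args.push("--format", "json");
  args.push(opts.model ?? DEFAULT_MODEL, prompt);
  return args;
}

// Empty cred-allowlist: only the listed variables reach the child process.
function buildMotherEnv(parent: Record<string, string | undefined>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of MOTHER_ENV_ALLOWLIST) {
    const value = parent[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}
```

A caller would hand these to Node's child_process.spawn, e.g. spawn("ollama", buildOllamaArgs(prompt), { env: buildMotherEnv(process.env) }). Passing the prompt as an argv element avoids shell quoting, and building env from an allowlist (rather than deleting known secrets) means a newly added provider key cannot leak by accident.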

Capability-aware routing (Phase 12-1, c125cfa)

pickDeviceForSpec in src/server/device-router.ts:149 is the scheduler. Pipeline per charter §9:

  1. Implied-capability injection. Some harnesses imply a required capability of the same name. HARNESS_REQUIRES_CAPABILITY in src/fabric/types.ts:42 maps mother-llm → mother-llm. The cockpit doesn't need to pass capabilities: ["mother-llm"] explicitly; the router injects it.
  2. Explicit target short-circuit. If spec.deviceTarget !== "auto", the router skips scoring and returns that device — even if it doesn't match the harness/capabilities (Phase 11-3 detect-then-prompt installer needs that match object to surface "install harness X on Y" CTAs).
  3. Hard filter (deviceMatchesSpec, device-router.ts:102): session.harness ∈ device.installedHarnesses AND requiredCapabilities ⊆ device.capabilities. RAM is intentionally NOT in the hard filter today (no credible per-harness RAM estimator yet); it lives in the soft scorer instead.
  4. Soft score (scoreCandidates, device-router.ts:116): weighted sum of three signals, normalized [0,1] across the candidate set.
```ts
const WEIGHTS = {
  ramHeadroom:        0.50,   // deviceRam / max(allRam)
  inverseLoad:        0.30,   // 1 - (deviceLoad / max(allLoad))
  heartbeatFreshness: 0.20,   // (lastSeen - minSeen) / (maxSeen - minSeen)
};
```
  5. Tie-break: strict lexicographic deviceId sort, for test determinism.
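Steps 1, 3, 4, and 5 can be sketched as a single ranking function. The Candidate and Spec shapes are pared down and hypothetical, the weights and per-signal formulas are taken from the WEIGHTS block, and the fallbacks when a denominator is zero (all RAM zero, all load zero, identical lastSeenAt) are assumptions rather than behavior read from device-router.ts. The explicit-target short-circuit (step 2) is omitted.

```typescript
// Hypothetical, pared-down shapes; the real DeviceConnection carries more fields.
interface Candidate {
  deviceId: string;
  installedHarnesses: string[];
  capabilities: string[];
  ramFreeMb: number;
  concurrentSessionCount: number;
  lastSeenAt: number; // unix ms
}

interface Spec {
  harness: string;
  requiredCapabilities: string[];
}

const WEIGHTS = { ramHeadroom: 0.5, inverseLoad: 0.3, heartbeatFreshness: 0.2 };

// Only the documented entry of HARNESS_REQUIRES_CAPABILITY.
const HARNESS_REQUIRES_CAPABILITY: Record<string, string> = { "mother-llm": "mother-llm" };

// Step 1: inject the implied capability, if any.
function requiredCaps(spec: Spec): string[] {
  const caps = new Set(spec.requiredCapabilities);
  const implied = HARNESS_REQUIRES_CAPABILITY[spec.harness];
  if (implied !== undefined) caps.add(implied);
  return Array.from(caps);
}

// Step 3: hard filter on harness and capabilities (RAM deliberately excluded).
function matches(c: Candidate, spec: Spec, caps: string[]): boolean {
  return c.installedHarnesses.includes(spec.harness) && caps.every((cap) => c.capabilities.includes(cap));
}

// Steps 4-5: normalized weighted score, descending, with deviceId tie-break.
function scoreCandidates(spec: Spec, online: Candidate[]): { deviceId: string; score: number }[] {
  const caps = requiredCaps(spec);
  const pool = online.filter((c) => matches(c, spec, caps));
  if (pool.length === 0) return [];
  const maxRam = Math.max(...pool.map((c) => c.ramFreeMb));
  const maxLoad = Math.max(...pool.map((c) => c.concurrentSessionCount));
  const minSeen = Math.min(...pool.map((c) => c.lastSeenAt));
  const maxSeen = Math.max(...pool.map((c) => c.lastSeenAt));
  return pool
    .map((c) => {
      // Zero-denominator fallbacks are assumptions, not taken from device-router.ts.
      const ram = maxRam > 0 ? c.ramFreeMb / maxRam : 0;
      const load = maxLoad > 0 ? 1 - c.concurrentSessionCount / maxLoad : 1;
      const fresh = maxSeen > minSeen ? (c.lastSeenAt - minSeen) / (maxSeen - minSeen) : 1;
      const score =
        WEIGHTS.ramHeadroom * ram + WEIGHTS.inverseLoad * load + WEIGHTS.heartbeatFreshness * fresh;
      return { deviceId: c.deviceId, score };
    })
    .sort((a, b) => b.score - a.score || (a.deviceId < b.deviceId ? -1 : a.deviceId > b.deviceId ? 1 : 0));
}
```

Normalizing each signal against the candidate set keeps the weights comparable regardless of absolute RAM sizes, at the cost of scores that are only meaningful relative to the other candidates in the same call.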

The success branch returns {ok: true, match: {deviceId, reason: "auto-scored", conn, score, scoredCandidates}}. The scoredCandidates field is a descending-score audit trail with per-signal breakdowns — surfaced in the planet detail / sessions UI for debugging "why did the scheduler pick X."

The failure branch returns {ok: false, failure: {reason, available}} with reason ∈ {"no-online-devices", "no-capable-device", "unknown-device"} and an available snapshot of all online devices' installedHarnesses + capabilities — gives the cockpit enough info to render "Install Kimi on pop-os" CTAs without a second round-trip.
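Assuming the result is a plain discriminated union, the shapes and a CTA helper could look like this. Only ok, match, deviceId, reason, score, failure, available, and the three failure reasons come from the text; conn and scoredCandidates are omitted, and DeviceSummary, installHints, and the CTA wording are hypothetical.

```typescript
// Hypothetical shapes; conn and scoredCandidates are omitted for brevity.
interface DeviceSummary {
  deviceId: string;
  installedHarnesses: string[];
  capabilities: string[];
}

type FailureReason = "no-online-devices" | "no-capable-device" | "unknown-device";

type PickResult =
  | { ok: true; match: { deviceId: string; reason: "auto-scored"; score: number } }
  | { ok: false; failure: { reason: FailureReason; available: DeviceSummary[] } };

// Build "install" CTAs from the available snapshot, with no second round-trip.
function installHints(result: PickResult, harness: string): string[] {
  if (result.ok) return [];
  return result.failure.available
    .filter((d) => !d.installedHarnesses.includes(harness))
    .map((d) => `Install ${harness} on ${d.deviceId}`);
}
```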

Source

  • Cloud envelope encoder/decoder + KNOWN_TYPES: src/fabric/envelope.ts
  • Shared types: src/fabric/types.ts
  • Cloud multiplex: src/server/ws-multiplex.ts
  • Scheduler + tmux.list bridge: src/server/device-router.ts
  • Daemon transport (reconnect, heartbeat, replay): src/daemon/transport.ts
  • Daemon envelope serde + NdjsonBuffer: src/daemon/envelope.ts
  • Daemon handlers: src/daemon/handlers/{tmux-list,tmux-attach,session-start,session-cancel,session-stdin,session-status,ping}.ts
  • Charter §7 (transport), §9 (capability routing): ~/.openclaw/workspace/optimalOS/docs/superpowers/specs/2026-05-03-fabric-charter.md

Built by Carlos Lenis in Miami