Tailscale Mesh
The Pi (oracle), the desktop (pop-os), and the Hetzner CCX13 cockpit are connected via Tailscale. Most cross-node Loom dispatch, SSH-driven deploys, and cross-instance health checks ride the mesh. The one big exception is Hetzner-bound traffic, which currently egresses through a dedicated Cloudflare Tunnel — see Hetzner caveat below.
Why Tailscale
- NAT, auth, and encryption solved out of the box.
- MagicDNS gives stable hostnames (oracle, pop-os) that don't change when ISPs hand out new IPs.
- Bring-your-own-CIDR: no port forwarding on the home router.
- Already installed on every node we care about.
Mesh nodes
| Tailscale name | IP | Role |
|---|---|---|
| oracle | 100.94.77.114 | The Pi. Legacy hub. Runs optimalos.service (legacy mode :3000), n8n.service, strapi.service, cloudflared.service, Phoenix Docker, and the docs site. |
| pop-os | (assigned by tailnet) | Desktop primary worker. Runs OptimalOS as a remote worker for Loom dispatch, hosts MOTHER LLM via Ollama, has GPU capability tags (RTX 3060). |
| hetzner-cockpit (fabric.optimal.miami) | 178.156.203.234 (public) | Fabric control plane on Hetzner CCX13. Joined the tailnet 2026-05-04 alongside Phase 10. Runs OptimalOS in Fabric mode (DEPLOYMENT=hetzner-cloud). |
The Tailscale hostnames are the values used for target_host in loom_job_queue. Workflow steps that pin to a node use the same names in defaultHost or hostHint.
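As a rough illustration of how a pin might resolve to a target_host, the sketch below assumes a workflow object with an optional defaultHost and steps with an optional hostHint. The WorkflowSketch and StepSketch shapes, the resolveTargetHost helper, the workflow name, and the precedence order (step hint first, then workflow default, then the local node) are all assumptions for illustration, not taken from the OptimalOS source.

```typescript
// Hypothetical shapes for illustration; the real workflow types may differ.
interface StepSketch {
  id: string;
  hostHint?: string; // Tailscale name to pin this step to
}

interface WorkflowSketch {
  name: string;
  defaultHost?: string; // Tailscale name used when a step has no hint
  steps: StepSketch[];
}

// Assumed precedence: step hostHint, then workflow defaultHost, then the local node.
function resolveTargetHost(wf: WorkflowSketch, step: StepSketch, localHost: string): string {
  return step.hostHint ?? wf.defaultHost ?? localHost;
}

const nightly: WorkflowSketch = {
  name: "nightly-summary", // made-up workflow name
  defaultHost: "oracle",
  steps: [{ id: "collect" }, { id: "summarize", hostHint: "pop-os" }],
};

// One target per step, in step order.
const targets = nightly.steps.map((s) => resolveTargetHost(nightly, s, "oracle"));
console.log(targets);
```

Whatever the real precedence is, the resulting string has to be the node's Tailscale name, since that is the value stored in loom_job_queue.target_host.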
Cross-node Loom dispatch
Cross-node dispatch shipped as Phase 9 on 2026-04-17 and was validated end-to-end via the cross-node-smoke strand:
- Trigger on any node. Any OptimalOS instance can enqueue a job with a target_host set to a sibling's Tailscale name.
- Loom routes via the mesh. The scheduler on the target node polls loom_job_queue for rows pinned to its hostname (with the resolver shim handling MagicDNS hiccups).
- Target executes, posts results back. The step result lands in loom_step_runs against the original run_id, so the originating node sees the outcome regardless of who actually executed.
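The enqueue, poll, and report flow can be sketched with in-memory arrays standing in for loom_job_queue and loom_step_runs. This is a simplified model, not the Loom scheduler: the row shapes, the status values other than pending, and the helper names (enqueue, claimFor, executeAndReport) are assumptions made for the example.

```typescript
// Hypothetical in-memory stand-ins for loom_job_queue and loom_step_runs.
type JobStatus = "pending" | "running" | "done";

interface QueuedJob {
  run_id: string;
  step: string;
  target_host: string; // Tailscale name of the node that should execute
  status: JobStatus;
}

interface StepRun {
  run_id: string;
  step: string;
  executed_on: string;
  output: string;
}

const jobQueue: QueuedJob[] = [];
const stepRuns: StepRun[] = [];

// 1. Any node enqueues a job pinned to a sibling.
function enqueue(run_id: string, step: string, target_host: string): void {
  jobQueue.push({ run_id, step, target_host, status: "pending" });
}

// 2. The target node claims pending rows whose target_host equals its own name exactly.
function claimFor(hostname: string): QueuedJob[] {
  const mine = jobQueue.filter((j) => j.status === "pending" && j.target_host === hostname);
  for (const job of mine) job.status = "running";
  return mine;
}

// 3. The target records the result against the original run_id.
function executeAndReport(job: QueuedJob, hostname: string): void {
  stepRuns.push({ run_id: job.run_id, step: job.step, executed_on: hostname, output: "ok" });
  job.status = "done";
}

enqueue("run-1", "summarize", "pop-os"); // enqueued on the Pi
for (const job of claimFor("pop-os")) executeAndReport(job, "pop-os");

// Back on the originating node: look up the outcome by run_id.
const outcome = stepRuns.find((r) => r.run_id === "run-1");
console.log(outcome?.executed_on);
```

Because the claim uses strict string equality, a row pinned to a misspelled or differently cased hostname would never be picked up and would stay in pending.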
Today the pi-to-desktop path is the workhorse leg: scheduled strands on the Pi pin GPU-heavy or LLM steps to pop-os and consume the results inline. The popos-deploy strand on the Pi also drives deploys on the desktop over SSH (same Tailscale interface).
Hetzner caveat: Cloudflare Tunnel, not pure Tailscale
fabric.optimal.miami joined the tailnet on 2026-05-04, but the public ingress for the cockpit is still a dedicated Cloudflare Tunnel (UUID 00633812-46c8-4e70-a001-0b151758c78b). That means:
- The blanket claim "every cross-node Loom job goes through the mesh" is no longer strictly true. Browser traffic to fabric.optimal.miami and any cockpit-bound API call hits Cloudflare's edge first and tunnels down to OptimalOS :3000 on the Hetzner box.
- Tailscale is still in the picture for internal device-fabric chatter (paired-device daemon polling, JWT exchange, vault access). Pairing with a cockpit-minted token happens over Tailscale once a device is on the tailnet.
- See Fabric Architecture for the full transport contract: when to expect Tailscale vs. when to expect CF Tunnel.
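A compact way to keep the split in mind is a lookup from traffic kind to expected transport. The sketch below only encodes the cases named in this section; the TrafficKind labels and the expectedTransport helper are made up for illustration, and Fabric Architecture remains the authoritative contract.

```typescript
type Transport = "cloudflare-tunnel" | "tailscale";

// Traffic kinds paraphrased from this section; the labels are hypothetical.
type TrafficKind =
  | "browser"
  | "cockpit-api"
  | "device-daemon-poll"
  | "jwt-exchange"
  | "vault-access";

// Which transport Hetzner-related traffic is expected to use, per the caveat above.
function expectedTransport(kind: TrafficKind): Transport {
  switch (kind) {
    case "browser":
    case "cockpit-api":
      return "cloudflare-tunnel";
    case "device-daemon-poll":
    case "jwt-exchange":
    case "vault-access":
      return "tailscale";
  }
}

console.log(expectedTransport("cockpit-api"));
console.log(expectedTransport("vault-access"));
```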
MagicDNS resolution shim
MagicDNS works fine for interactive use but occasionally fails inside Bun's native fetch when the system resolver is racing with the Tailscale daemon. The shim at optimalOS/src/routes/_tailscale.ts exposes a resolveNodeBaseUrl(name) helper that:
- Looks up the node in openclaw_instances.
- Pulls the tailscale_hostname and falls back to a direct 100.x.x.x IPv4 if MagicDNS is being slow.
- Returns a fully-qualified base URL the caller can pass to fetch.
This is what cross-node health checks and Loom dispatch use under the hood.
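The fallback logic can be sketched as follows. This is not the code in _tailscale.ts: the InstanceRow shape (only tailscale_hostname is named above), the tailscale_ipv4 and port fields, the injected DNS probe, the 500 ms timeout, and the http scheme are all assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical row shape; only tailscale_hostname is named in the text above.
interface InstanceRow {
  name: string;
  tailscale_hostname: string;
  tailscale_ipv4: string | null; // assumed column holding the 100.x.x.x address
  port: number; // assumed; 3000 is the port mentioned elsewhere in this doc
}

// Injected check that reports whether MagicDNS answered for a hostname.
type DnsProbe = (hostname: string) => Promise<boolean>;

// Treat a probe that errors or does not answer within timeoutMs as a failure.
function probeWithTimeout(probe: DnsProbe, hostname: string, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    probe(hostname).then(
      (ok) => { clearTimeout(timer); resolve(ok); },
      () => { clearTimeout(timer); resolve(false); },
    );
  });
}

// Prefer the MagicDNS name; use the raw IPv4 only when DNS is unhealthy and an IP is known.
function chooseHost(row: InstanceRow, magicDnsOk: boolean): string {
  if (magicDnsOk || row.tailscale_ipv4 === null) return row.tailscale_hostname;
  return row.tailscale_ipv4;
}

async function resolveNodeBaseUrlSketch(
  name: string,
  rows: InstanceRow[],
  probe: DnsProbe,
  timeoutMs = 500,
): Promise<string> {
  const row = rows.find((r) => r.name === name);
  if (!row) throw new Error(`unknown node: ${name}`);
  const ok = await probeWithTimeout(probe, row.tailscale_hostname, timeoutMs);
  return `http://${chooseHost(row, ok)}:${row.port}`;
}

const oracleRow: InstanceRow = {
  name: "oracle",
  tailscale_hostname: "oracle",
  tailscale_ipv4: "100.94.77.114",
  port: 3000,
};
```

Injecting the probe keeps the decision testable without touching the network; a caller would pass a real DNS lookup in its place.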
SSH-driven deploys
The popos-deploy workflow (optimalOS/workflows/popos-deploy.ts) runs on the Pi but executes deploy commands on pop-os over SSH:
- Private key: ~/.ssh/id_popos-deploy on the Pi.
- Public key: in ~/.ssh/authorized_keys on pop-os.
- SSH config entry: Host pop-os with User carlos, IdentityFile ~/.ssh/id_popos-deploy.
The same key is reused by cross-node-smoke (optimalOS/workflows/cross-node-smoke.ts) for periodic round-trip health checks.
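For reference, the SSH config entry described above can be rendered from its three values. The renderSshHostBlock helper and SshHostEntry type are hypothetical and exist only to show the shape of the stanza; the Host, User, and IdentityFile values are the ones listed in this section.

```typescript
interface SshHostEntry {
  host: string;
  user: string;
  identityFile: string;
}

// Render a minimal ~/.ssh/config stanza using standard ssh_config keywords.
function renderSshHostBlock(e: SshHostEntry): string {
  return [
    `Host ${e.host}`,
    `  User ${e.user}`,
    `  IdentityFile ${e.identityFile}`,
  ].join("\n");
}

const poposEntry = renderSshHostBlock({
  host: "pop-os",
  user: "carlos",
  identityFile: "~/.ssh/id_popos-deploy",
});
console.log(poposEntry);
```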
Hetzner deploys are separate: SSH to Hetzner uses ~/.ssh/id_ed25519 (key auth only, no passwords), and the canonical command is ssh -i ~/.ssh/id_ed25519 root@178.156.203.234. There's no Loom strand wrapping this yet — deploys are scripted by hand or by optimal deploy from a workstation.
SSH-to-Pi: broken, use the API
Direct SSH to the Pi from the desktop is currently broken (key mismatch + DNS hiccup, tracked in Carlos's memory under "Pi access"). Until that's fixed, the path is:
- Pi system endpoints exposed at https://optimal.miami/api/system/* (passphrase auth, agent launch, system reads).
- Tailscale IP 100.94.77.114 and MagicDNS hostname oracle are still documented above for the day SSH is repaired.
Adding a new compute node
- Install Tailscale on the new node and bring it up under the same tailnet (tailscale up).
- Verify reachability from the Pi: tailscale ping <new-node>.
- Install OptimalOS (bun add -g optimalos, then optimalos init and optimalos start).
- Run optimal infra heartbeat --name <new-node> from the new node so it registers in openclaw_instances with capability tags populated.
- (Optional) Pair the node as a Fabric device: mint a pairing token in the cockpit, then run optimal pair on the new node.
- Add a workflow defaultHost or per-step hostHint to pin work there.
Hub lease
The Loom scheduler currently assumes the local instance is the hub. Until the Phase 9 hub-lease lands, only one compute node should run scheduled workflows; the others should set enabled: false on cron strands or be configured as workers only. See Charter §7.
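Until a lease exists, the single-scheduler rule has to be enforced by configuration. A minimal sanity check, assuming each node's config can be reduced to a name and a flag saying whether cron strands are enabled, might look like this. The NodeScheduleConfig shape and both helpers are hypothetical.

```typescript
// Hypothetical summary of a node's scheduling config.
interface NodeScheduleConfig {
  name: string;
  cronEnabled: boolean; // stands in for `enabled` on the node's cron strands
}

// Names of nodes that would run scheduled workflows.
function schedulingNodes(nodes: NodeScheduleConfig[]): string[] {
  return nodes.filter((n) => n.cronEnabled).map((n) => n.name);
}

// True when at most one node schedules, i.e. no risk of double-firing cron strands.
function hasSingleScheduler(nodes: NodeScheduleConfig[]): boolean {
  return schedulingNodes(nodes).length <= 1;
}

const fleet: NodeScheduleConfig[] = [
  { name: "oracle", cronEnabled: true },
  { name: "pop-os", cronEnabled: false },
];
console.log(hasSingleScheduler(fleet), schedulingNodes(fleet));
```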
Troubleshooting
| Symptom | First check |
|---|---|
| Cross-node Loom step stuck in pending | Is optimalos.service running on the target node? Tailscale up? loom_job_queue.target_host exact match? |
| ECONNREFUSED to a node | The node's Bun port (3000) bound to localhost only. Set OPTIMALOS_HOST=0.0.0.0 or use the Tailscale interface IP. |
| Slow first request after sleep | MagicDNS warm-up. The resolver shim retries once with the IPv4 fallback. |
| SSH deploy fails with "host key verification" | Run ssh-keyscan pop-os >> ~/.ssh/known_hosts on the Pi (or use the Settings SSH scan card). |
| fabric.optimal.miami returns 530 / 502 | Cloudflare Tunnel for Hetzner is down. Check systemctl status cloudflared on Hetzner, not Tailscale. |
| Hetzner cockpit unreachable but tailnet looks fine | Same as above — Hetzner ingress is CF Tunnel, not Tailscale. |