
Fabric Runbook

Operational checks and recovery paths. Synthesizes ~/.optimalos/transfers/fabric-hetzner-handoff-2026-05-04.md §3 and §6.

Sanity checks on session start

Copy-paste each block. Every probe should return the code shown in its label, and every service should report active. The stack is mode-aware: it now runs as a single bundle deployed to two origins, gated by hostname (client) and the DEPLOYMENT env var (server), so both modes need probing.

bash
# 1. Both origins alive
curl -s -o /dev/null -w "optimal.miami/healthz HTTP %{http_code}\n" https://optimal.miami/healthz
curl -s -o /dev/null -w "fabric/healthz HTTP %{http_code}  total %{time_total}s\n" \
  https://fabric.optimal.miami/healthz

# 2. Mode gates are intact
curl -s -o /dev/null -w "legacy vault/recipients HTTP %{http_code}  (expect 410)\n" \
  https://optimal.miami/api/vault/recipients
curl -s -o /dev/null -w "fabric vault/recipients HTTP %{http_code}  (expect 401)\n" \
  https://fabric.optimal.miami/api/vault/recipients
curl -s -o /dev/null -w "fabric/vault/setup HTTP %{http_code}  (expect 200 HTML)\n" \
  https://fabric.optimal.miami/vault/setup
curl -s -o /dev/null -w "fabric api/system/health HTTP %{http_code}  (expect 401)\n" \
  https://fabric.optimal.miami/api/system/health

# 3. Hetzner box is reachable + services up
ssh -i ~/.ssh/id_ed25519 root@178.156.203.234 \
  "systemctl is-active optimalos cloudflared caddy && uptime"

# 4. Pi is unaffected
systemctl is-active optimalos n8n strapi cloudflared

# 5. Branch + commit state
cd ~/.openclaw/workspace/optimalOS && git log --oneline origin/clenis..HEAD | wc -l
cd ~/.openclaw/workspace/optimal-cli && git log --oneline origin/clenis..HEAD | wc -l

Expected as of 2026-05-05 evening: 0 unpushed commits on either repo — the 40 optimalOS commits and 7 optimal-cli commits from today are all on origin/clenis (range b08f948..bf69427). If git log origin/clenis..HEAD | wc -l returns nonzero, you've made local changes since the push.

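Each of the four mode-gate probes above has exactly one acceptable code: 410 for the legacy vault route, 401 for the two fabric API routes, and 200 for /vault/setup. Any other result is a misconfig: either the DEPLOYMENT env var is missing or wrong on Hetzner, or a route that should be auth-gated isn't. Investigate before doing anything else.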
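
The mode-gate probes can also be folded into a single pass/fail loop. The following is a minimal sketch, assuming curl is available; fetch_code and run_probes are hypothetical helper names and not part of the stack.

```shell
# Hypothetical helpers; the names are illustrative, not part of OptimalOS.

# Print the HTTP status code for a URL (curl prints 000 if the request fails).
fetch_code() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Read "<expected_code> <url>" lines on stdin and print one result per probe.
# Returns non-zero if any probe deviates from its expected code.
run_probes() {
  local expected url actual failed=0
  while read -r expected url; do
    [ -z "$expected" ] && continue
    actual=$(fetch_code "$url")
    if [ "$actual" = "$expected" ]; then
      echo "ok    $actual  $url"
    else
      echo "FAIL  $actual  $url (expected $expected)"
      failed=1
    fi
  done
  return "$failed"
}
```

Feed it the four probes through a heredoc, one per line, for example "410 https://optimal.miami/api/vault/recipients" and "401 https://fabric.optimal.miami/api/vault/recipients". A non-zero exit status means at least one gate is off.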

Deployment modes — which origin you're hitting

The same dist/ bundle now serves both origins. Mode is decided two ways:

  • Client: client/fabric-mode.ts:isFabricMode() checks window.location.hostname against FABRIC_HOSTNAMES. Pi (optimal.miami) → false; Hetzner (fabric.optimal.miami) → true. Fabric-only chrome (vault links, fuel meter, sessions tab) hides on Pi.
  • Server: src/server.ts reads process.env.DEPLOYMENT === "hetzner-cloud" into FABRIC_DEPLOYMENT. When false (Pi, no env set), /api/vault, /api/auth/setup-init, /api/auth/devices/*, and /api/fuel/* all return HTTP 410 GONE. When true (Hetzner), the routes mount normally and reply 401 to unauthenticated calls.

Fast triage (the two probes that matter):

bash
# Legacy mode (Pi) healthy if vault routes return 410
curl -s -o /dev/null -w "%{http_code}\n" https://optimal.miami/api/vault/recipients
# expect: 410

# Fabric mode (Hetzner) healthy if vault routes return 401
curl -s -o /dev/null -w "%{http_code}\n" https://fabric.optimal.miami/api/vault/recipients
# expect: 401

A 200 from either is a misconfig — something that should be auth-gated isn't. A 410 from fabric.optimal.miami means DEPLOYMENT env is missing on Hetzner; see the breakage section below.
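
The triage rules above amount to a small lookup from origin and status code to diagnosis. A sketch of that lookup follows; diagnose_vault_probe is a hypothetical name, and the wording of each diagnosis is illustrative.

```shell
# Hypothetical helper, not part of OptimalOS: turn the status code of an
# unauthenticated GET /api/vault/recipients into a diagnosis.
# Usage: diagnose_vault_probe pi|hetzner <http_code>
diagnose_vault_probe() {
  case "$1:$2" in
    pi:410)             echo "healthy: legacy mode, vault routes return 410" ;;
    hetzner:401)        echo "healthy: fabric mode, route mounted, auth required" ;;
    hetzner:410)        echo "misconfig: DEPLOYMENT missing or wrong on Hetzner" ;;
    pi:200|hetzner:200) echo "misconfig: route should be gated but answered 200" ;;
    *)                  echo "unexpected: investigate before doing anything else" ;;
  esac
}
```

For example, diagnose_vault_probe hetzner "$(curl -s -o /dev/null -w '%{http_code}' https://fabric.optimal.miami/api/vault/recipients)" prints the healthy line when the fabric origin answers 401.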

Push the vault migrations (the one blocker)

Status as of 2026-05-04: APPLIED. This section is retained for re-runs or if a future migration drift forces a re-push. The procedure below worked once; idempotent migrations mean it's safe to re-run.

Path A — CLI

bash
cd ~/.openclaw/workspace/optimal-cli/supabase
echo "Y" | supabase migration repair --status reverted 20260311
cd ~/.openclaw/workspace/optimal-cli
optimal infra migrate push --target optimalos

If this returns failed SASL auth or context deadline exceeded, the Supabase pooler is flaking. Wait 10–30 min and retry, or switch to Path B.
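
The wait-and-retry advice can be scripted rather than done by hand. This is a minimal sketch; retry_with_wait is a hypothetical helper, and it retries on any non-zero exit, not only on SASL or deadline errors.

```shell
# Hypothetical helper: retry a command up to N times, sleeping between attempts.
# Usage: retry_with_wait <attempts> <sleep_seconds> <command> [args...]
retry_with_wait() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

For example, retry_with_wait 3 600 optimal infra migrate push --target optimalos tries up to three times with ten minutes between attempts. If it still fails, switch to Path B.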

Path B — Dashboard paste

  1. Open supabase.com/dashboard/project/hbfalrpswysryltysonm/sql/new
  2. Paste and run each file in timestamp order:
    bash
    cat ~/.optimalos/transfers/20260503215150_fabric_vault_phase_10a_2.sql
    cat ~/.optimalos/transfers/20260503221013_fabric_devices_phase_10b_3.sql
    cat ~/.optimalos/transfers/20260503235721_fabric_vault_canary_phase_10a_7.sql
  3. Record applied versions so the Supabase CLI doesn't try to re-apply them later:
    sql
    INSERT INTO supabase_migrations.schema_migrations (version, name, statements)
    VALUES
      ('20260503215150', 'fabric_vault_phase_10a_2', '{}'),
      ('20260503221013', 'fabric_devices_phase_10b_3', '{}'),
      ('20260503235721', 'fabric_vault_canary_phase_10a_7', '{}')
    ON CONFLICT DO NOTHING;  -- keeps a re-run from failing on versions already recorded

Verify migrations landed

bash
URL=https://hbfalrpswysryltysonm.supabase.co
KEY=$(grep ^OPTIMAL_SUPABASE_SERVICE_KEY ~/.openclaw/workspace/optimal-cli/.env | cut -d= -f2-)
for t in vault_entries vault_recipients vault_access_log devices pairing_tokens; do
  code=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "apikey: $KEY" -H "Authorization: Bearer $KEY" \
    "$URL/rest/v1/$t?limit=0&select=*")
  echo "$t  HTTP $code"
done

All five should return 200. After that, run the vault ceremony at https://fabric.optimal.miami/vault/setup using INVITE_PASSWORD from ~/.optimalos/transfers/.fabric-invite-password.
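
To avoid eyeballing five lines, the output of the verify loop can be piped through a small filter. This is a sketch; tables_not_ok is a hypothetical name, and it assumes the "<table>  HTTP <code>" line format printed by the loop above.

```shell
# Hypothetical helper: read "<table>  HTTP <code>" lines on stdin and print
# every table whose code is not 200. Returns non-zero if any are printed.
tables_not_ok() {
  local table code bad=0
  while read -r table _ code; do
    [ -z "$table" ] && continue
    if [ "$code" != "200" ]; then
      echo "$table $code"
      bad=1
    fi
  done
  return "$bad"
}
```

Append "| tables_not_ok" after the loop's closing "done". No output and a zero exit status mean all five tables are reachable.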

Deploying a fresh bundle (proven 3x today)

Compressed sequence — the canonical 5 steps that ran clean three times during the 2026-05-05 session. Use this when you just need to ship a fix; use the longer "Deploy fresh build to Hetzner" section below when you need bundle-hash verification + rollback artifacts.

bash
# 1. Build cloud bundle from optimalOS workspace on the Pi
cd ~/.openclaw/workspace/optimalOS && pnpm build:cloud

# 2. Snapshot existing app dir (rollback target)
ssh root@178.156.203.234 "cp -a /opt/optimalos/app /opt/optimalos/app.bak.$(date +%Y%m%d-%H%M%S)"

# 3. Push new bundle (mirrors dist/, deletes stale)
rsync -avz --delete -e "ssh -i ~/.ssh/id_ed25519" \
  ~/.openclaw/workspace/optimalOS/dist/ \
  root@178.156.203.234:/opt/optimalos/app/

# 4. Fix perms + restart
ssh root@178.156.203.234 "chown -R optimal:optimal /opt/optimalos/app && \
  find /opt/optimalos/app -type f -exec chmod 644 {} \; && \
  find /opt/optimalos/app -type d -exec chmod 755 {} \; && \
  systemctl restart optimalos"

# 5. Verify (first hit ~5s warmup, subsequent <500ms)
curl -s -o /dev/null -w "%{http_code}\n" https://fabric.optimal.miami/healthz   # expect 200

Pi-side note: the Pi's optimalos.service ALSO runs from ~/.openclaw/workspace/optimalOS. After every Pi-side pnpm build, run sudo systemctl restart optimalos so optimal.miami picks up the new bundle. Both deployments share the same source tree but render conditionally on hostname (client) and DEPLOYMENT env (server).

Deploy fresh build to Hetzner

Proven end-to-end on 2026-05-04 evening. Sequence:

bash
# 1. Build cloud-mode bundle from optimalOS workspace
cd ~/.openclaw/workspace/optimalOS
pnpm build:cloud
# produces dist/server.js + dist/client/* (verify e.g. dist/client/assets/index-*.js)

# 2. Take a backup on Hetzner so a bad deploy can roll back fast
TS=$(date +%Y%m%d-%H%M%S)
ssh -i ~/.ssh/id_ed25519 root@178.156.203.234 \
  "cp -a /opt/optimalos/app /opt/optimalos/app.bak.$TS"

# 3. Sync the new bundle (mirrors dist/ into the live tree, deleting stale files)
rsync -avz --delete \
  -e "ssh -i ~/.ssh/id_ed25519" \
  ~/.openclaw/workspace/optimalOS/dist/ \
  root@178.156.203.234:/opt/optimalos/app/

# 4. Fix ownership + perms (rsync as root preserves source perms)
ssh -i ~/.ssh/id_ed25519 root@178.156.203.234 "
  chown -R optimal:optimal /opt/optimalos/app && \
  find /opt/optimalos/app -type f -exec chmod 644 {} \; && \
  find /opt/optimalos/app -type d -exec chmod 755 {} \;"

# 5. Restart and verify
ssh -i ~/.ssh/id_ed25519 root@178.156.203.234 systemctl restart optimalos
curl -s -o /dev/null -w "healthz HTTP %{http_code}\n" https://fabric.optimal.miami/healthz
curl -s -o /dev/null -w "vault/setup HTTP %{http_code}\n" https://fabric.optimal.miami/vault/setup
curl -s https://fabric.optimal.miami/api/version | head -c 200; echo

# 6. Confirm bundle hash served matches what you built (sanity)
curl -s https://fabric.optimal.miami/vault/setup | grep -oE 'index-[A-Za-z0-9_-]+\.js' | sort -u
ls ~/.openclaw/workspace/optimalOS/dist/client/assets/index-*.js

If anything is wrong, roll back from the same shell, so that $TS still holds the timestamp from step 2: ssh root@178.156.203.234 "rm -rf /opt/optimalos/app && mv /opt/optimalos/app.bak.$TS /opt/optimalos/app && systemctl restart optimalos".
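
If $TS is no longer set (for example after the compressed sequence, which never assigns it), the newest backup can be picked from the directory names instead. This is a sketch; latest_backup is a hypothetical helper that relies on the YYYYMMDD-HHMMSS suffix sorting chronologically.

```shell
# Hypothetical helper: given backup paths on stdin (one per line), print the
# most recent app.bak.YYYYMMDD-HHMMSS entry. Names with any other suffix are
# ignored; the timestamp format sorts lexicographically in time order.
latest_backup() {
  grep -E '^(.*/)?app\.bak\.[0-9]{8}-[0-9]{6}$' | sort | tail -n 1
}
```

Usage: BAK=$(ssh root@178.156.203.234 'ls -d /opt/optimalos/app.bak.*' | latest_backup), then use "$BAK" in place of /opt/optimalos/app.bak.$TS in the rollback command. Check that $BAK is non-empty before running the rm -rf.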

Hetzner SSH

bash
ssh -i ~/.ssh/id_ed25519 root@178.156.203.234

Key auth only. No password. UFW currently allows public 22; locking down to Tailscale-only is queued for Phase 15 prep.

Once on the box:

Need                                          Command
Recent OptimalOS logs                         journalctl -u optimalos --since "10 min ago" --no-pager
Restart server (zero-downtime not yet wired)  systemctl restart optimalos
Cloudflared status                            systemctl status cloudflared --no-pager -l
Tunnel re-register                            systemctl restart cloudflared
Disk + memory                                 df -h /; free -h
Listening ports                               ss -ltnp
Active sessions                               journalctl -u optimalos --since "1 hour ago" | grep session.start

Common breakages

"Failed to fetch" on vault setup form submit

The migrations didn't land (or partially landed). Re-run the verify block above; any 404 means re-push.

JWT_SIGNING_KEY empty after redeploy

Symptom: POST /api/vault/setup-init returns 500 with Internal Server Error (devtools network tab → request payload looks fine, response is JSON {"error":"..."} or empty body). Hetzner journal shows Error: JWT_SIGNING_KEY is empty.

bash
# On Hetzner
grep JWT_SIGNING_KEY /opt/optimalos/secrets.env
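# If blank or missing (sed alone cannot add a line that does not exist yet):
NEW_KEY=$(openssl rand -base64 32)
if grep -q '^JWT_SIGNING_KEY=' /opt/optimalos/secrets.env; then
  sed -i "s|^JWT_SIGNING_KEY=.*|JWT_SIGNING_KEY=$NEW_KEY|" /opt/optimalos/secrets.env
else
  echo "JWT_SIGNING_KEY=$NEW_KEY" >> /opt/optimalos/secrets.env
fi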
systemctl restart optimalos
# Verify: a wrong-password setup-init now returns 401 not 500
curl -s -o /dev/null -w "%{http_code}\n" -X POST \
  -H "content-type: application/json" \
  -d '{"invitePassword":"wrong"}' \
  https://fabric.optimal.miami/api/vault/setup-init
# expect: 401

Stuck at /vault/setup step 4/4 with "no session token; sign in first"

Fixed 2026-05-05. Root cause: /vault/setup was auth-gated server-side but the page rendered the wizard even without a session, so users could walk steps 1–3 then hit step 4 with no JWT. Fix shipped: renderSetup() now checks for a session token first and redirects to /?return=... when missing; signing in at root then bounces back through. If you see this pattern again post-fix, the redirect is broken — check client/vault/setup.ts and confirm the build that's served includes the new guard.

If a user reports the legacy symptom on an old client (e.g. cached SW), tell them to sign in at https://fabric.optimal.miami/ first, then navigate to /vault/setup. The devtools network tab tells you which fix path applies: POST /api/vault/setup-finalize with 401 is the auth-gate (now redirected client-side); 500 is JWT_SIGNING_KEY empty (see the section above).

Stranded on /vault/dashboard with no path back to OptimalOS

Fixed 2026-05-05. Root cause: vault flow design — once the ceremony finished, the user landed on /vault/dashboard with no nav back to the cockpit. Fixes shipped: post-ceremony redirect now goes to / (was /vault/dashboard); a Back-to-OptimalOS button is rendered in the vault dashboard header. If a user is still stranded, they're on a stale client; trigger a hard reload (Cmd+Shift+R).

Fuel meter overlapping the [XFER] button

Fixed 2026-05-05 (commit f81b412). Phase 14-4 placed the fuel meter at top: 8px which collided with the [XFER] header link. Now positioned at top: 44px to clear the header chrome. If you see overlap again, search the bundle for top:8px on the fuel-meter selector — a CSS regression has crept back in.

Vault routes return 410 on fabric.optimal.miami

Symptom: /vault/setup, /vault/unlock, or any /api/vault/* call from fabric.optimal.miami responds 410 GONE. The vault flow won't load; sessions tab is empty.

Root cause: DEPLOYMENT env var is not set (or not set to hetzner-cloud) in /opt/optimalos/secrets.env. The server is treating fabric mode as legacy.

bash
ssh root@178.156.203.234
grep DEPLOYMENT /opt/optimalos/secrets.env
# Should show: DEPLOYMENT=hetzner-cloud
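# If missing, append it; if set to something else, replace it in place:
if grep -q '^DEPLOYMENT=' /opt/optimalos/secrets.env; then
  sed -i 's|^DEPLOYMENT=.*|DEPLOYMENT=hetzner-cloud|' /opt/optimalos/secrets.env
else
  echo "DEPLOYMENT=hetzner-cloud" >> /opt/optimalos/secrets.env
fi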
chown optimal:optimal /opt/optimalos/secrets.env
chmod 0600 /opt/optimalos/secrets.env
systemctl restart optimalos
# Verify:
curl -s -o /dev/null -w "%{http_code}\n" https://fabric.optimal.miami/api/vault/recipients
# expect: 401 (mode is fabric, route mounted, auth required)

The inverse misconfig, optimal.miami (Pi) returning anything other than 410 on /api/vault/recipients (200, or 401 if the route is mounted and auth-gated), means DEPLOYMENT=hetzner-cloud ended up in the Pi's environment somehow. The Pi has no secrets.env; check whatever shell or systemd drop-in is leaking it.

Cloudflared tunnel down

Symptom: https://fabric.optimal.miami/healthz times out or returns 502 from Cloudflare.

bash
# Run both commands on Hetzner; a bare journalctl here would read the Pi's journal
ssh -i ~/.ssh/id_ed25519 root@178.156.203.234 \
  'systemctl restart cloudflared && journalctl -u cloudflared --since "5 min ago" --no-pager'

If the tunnel won't reconnect, check /etc/cloudflared/00633812-46c8-4e70-a001-0b151758c78b.json is intact (mode 0600) and the cert at /etc/cloudflared/cert.pem is present.
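
The credentials check can be made mechanical. This is a sketch to run on the Hetzner box; check_file_mode is a hypothetical helper and it assumes GNU stat, as shipped on Linux.

```shell
# Hypothetical helper: succeed only if the file exists and has the expected
# octal mode. Uses GNU stat (-c); BSD/macOS stat takes different flags.
check_file_mode() {
  local path=$1 want=$2 got
  if [ ! -f "$path" ]; then
    echo "missing: $path"
    return 1
  fi
  got=$(stat -c '%a' "$path")
  if [ "$got" != "$want" ]; then
    echo "wrong mode $got (want $want): $path"
    return 1
  fi
  echo "ok: $path"
}
```

For example, check_file_mode /etc/cloudflared/00633812-46c8-4e70-a001-0b151758c78b.json 600 covers the tunnel credentials; for the cert, a plain [ -f /etc/cloudflared/cert.pem ] is enough since only its presence matters here.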

OptimalOS process crashed

bash
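# Run all three on Hetzner; unquoted, only the first would run remotely
# and the restart would hit the Pi's optimalos instead
ssh root@178.156.203.234 '
  systemctl status optimalos --no-pager -l
  journalctl -u optimalos --since "10 min ago" --no-pager
  systemctl restart optimalos'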

If it crash-loops, check /opt/optimalos/secrets.env for missing keys (12 keys expected). Most common cause: JWT_SIGNING_KEY deleted/empty.
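
A quick audit of the env file shows which keys are missing or empty. This is a sketch; audit_env_file is a hypothetical helper, and the key names are passed by the caller because the full list of 12 is not reproduced in this runbook.

```shell
# Hypothetical helper: report keys that are missing or empty in a KEY=value
# env file. Returns non-zero if any key is reported.
# Usage: audit_env_file <file> <KEY>...
audit_env_file() {
  local file=$1 key line bad=0
  shift
  for key in "$@"; do
    line=$(grep -m1 "^${key}=" "$file" || true)
    if [ -z "$line" ]; then
      echo "missing: $key"
      bad=1
    elif [ -z "${line#*=}" ]; then
      echo "empty: $key"
      bad=1
    fi
  done
  return "$bad"
}
```

For example, audit_env_file /opt/optimalos/secrets.env JWT_SIGNING_KEY INVITE_PASSWORD DEPLOYMENT checks the three keys named in this runbook; extend the argument list with the remaining expected keys.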

Supabase pooler SASL flake

Symptom: supabase db push returns failed SASL auth even with correct credentials.

This is intermittent on Supabase's side. Two workarounds:

  1. Wait 10–30 minutes and retry.
  2. Use the dashboard paste path above; this never touches the pooler.

Local dev

The Hetzner box is the only remote deployment target (the Pi serves optimal.miami directly from the workspace). Local dev still works:

bash
cd ~/.openclaw/workspace/optimalOS
bun run dev   # boots on :3000
bun test tests/e2e/fabric-smoke.test.ts   # hermetic, ~30s

The smoke harness spins up an in-memory cloud + daemon subprocess + browser SSE client and proves the JWT/WS/spawn round-trip end-to-end.

Branch + push policy

Per /home/oracle/CLAUDE.md:

  • All work on clenis. main reserved for finalized perfect patches.
  • Push commits only after vault migrations are verified live and a manual ceremony walk passes.
  • Do not force-push. Do not skip hooks.

Escalation paths

  • Vault key compromise suspected: revoke all recipients via dashboard (UPDATE vault_recipients SET revoked_at = now()), force re-wrap by re-registering each recipient. T4 fix needed before this is fully effective on devices.
  • Hetzner box compromise: rotate JWT_SIGNING_KEY + INVITE_PASSWORD in /opt/optimalos/secrets.env, restart optimalos, force all browsers + devices to re-pair. Vault ciphertext stays safe (server can't decrypt).
  • Pi compromise: rotate device's vault keypair via dashboard revoke + re-pair. Vault ciphertext on Hetzner stays safe.
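
Rotating a secret in the env file is an upsert: replace the line if it exists, append it otherwise. This is a minimal sketch; set_env_key is a hypothetical helper, and it avoids sed so that values containing characters such as | or & are written verbatim.

```shell
# Hypothetical helper: set KEY=value in an env file, replacing an existing
# line or appending a new one. The file is rewritten in place (cat > file),
# so its owner and mode are kept. The key ends up on the last line.
set_env_key() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp)
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

On the Hetzner box, set_env_key /opt/optimalos/secrets.env JWT_SIGNING_KEY "$(openssl rand -base64 32)" followed by systemctl restart optimalos rotates the signing key; repeat with INVITE_PASSWORD and a value of your choosing.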

Source

  • Handoff doc: ~/.optimalos/transfers/fabric-hetzner-handoff-2026-05-04.md
  • Bootstrap zip (provisioning artifacts): ~/.optimalos/transfers/fabric-hetzner-bootstrap.zip
  • Charter §11 (operational decisions): ~/.openclaw/workspace/optimalOS/docs/superpowers/specs/2026-05-03-fabric-charter.md

Built by Carlos Lenis in Miami