---
summary: "Target WebSocket gateway architecture, components, and client flows"
read_when:
- Working on gateway protocol, clients, or transports
---
# Gateway Architecture (target state)
Last updated: 2025-12-09
## Overview
- A single long-lived **Gateway** process owns all messaging surfaces (WhatsApp via Baileys, Telegram via grammY, Discord via discord.js) and the control/event plane.
- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over one transport: **WebSocket on 127.0.0.1:18789** (tunnel or VPN for remote).
- One Gateway per host; it is the only process allowed to open a WhatsApp session. All sends and agent runs go through it.
- By default, the Gateway also exposes a Canvas host on `canvasHost.port` (default `18793`), serving `~/clawd/canvas` at `/__clawdbot__/canvas/` with live-reload; disable it via `canvasHost.enabled=false` or `CLAWDBOT_SKIP_CANVAS_HOST=1`.
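The Canvas host defaults above reduce to a small resolution step. A minimal sketch, assuming a config object shaped like `{ canvasHost?: { enabled?: boolean; port?: number } }` and that the env override wins whenever `CLAWDBOT_SKIP_CANVAS_HOST` equals `"1"`; the `resolveCanvasHost` name, the loopback bind address, and the returned URL format are illustrative assumptions, not the Gateway's actual code.

```typescript
// Hypothetical config shape; only `canvasHost.enabled` and `canvasHost.port`
// are named in this document.
interface GatewayConfig {
  canvasHost?: { enabled?: boolean; port?: number };
}

const DEFAULT_CANVAS_PORT = 18793;
const CANVAS_PATH = "/__clawdbot__/canvas/";

// Returns the URL the Canvas host would serve on, or null when disabled.
// Assumption: the host binds to loopback, like the WS port does.
function resolveCanvasHost(
  config: GatewayConfig,
  env: Record<string, string | undefined>,
): string | null {
  if (env.CLAWDBOT_SKIP_CANVAS_HOST === "1") return null; // env kill switch
  if (config.canvasHost?.enabled === false) return null; // explicit opt-out
  const port = config.canvasHost?.port ?? DEFAULT_CANVAS_PORT;
  return `http://127.0.0.1:${port}${CANVAS_PATH}`;
}
```

With an empty config and environment, `resolveCanvasHost({}, {})` yields `http://127.0.0.1:18793/__clawdbot__/canvas/`.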
## Components and flows
- **Gateway (daemon)**
- Maintains Baileys/Telegram/Discord connections.
- Exposes a typed WS API (req/resp + server push events).
- Validates every inbound frame against JSON Schema; rejects anything before a mandatory `connect`.
- **Clients (mac app / CLI / web admin)**
- One WS connection per client.
- Send requests (`health`, `status`, `send`, `agent`, `system-presence`, toggles) and subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
- On macOS, the app can also be invoked via deep links (`clawdbot://agent?...`) which translate into the same Gateway `agent` request path (see `docs/clawdbot-mac.md`).
- **Agent process (Pi)**
- Spawned by the Gateway on demand for `agent` calls; streams events back over the same WS connection.
- **WebChat**
- Serves static assets locally.
- Holds a single WS connection to the Gateway for control/data; all sends/agent runs go through the Gateway WS.
- Remote use goes through the same SSH/Tailscale tunnel as other clients.
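Because each client multiplexes requests and pushed events over one WS connection, it has to match every `res` to its `req` by `id` and route `event` frames separately. A minimal, transport-agnostic sketch: the `GatewayClientCore` class, its `send` callback (standing in for a WebSocket's `send`), and the incrementing string ids are assumptions for illustration; the frame shapes follow the wire protocol summary in this document.

```typescript
// Frame shapes from the wire protocol summary (trimmed to the fields used).
type ReqFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown };
type EventFrame = { type: "event"; event: string; payload?: unknown; seq?: number };

type Waiter = { resolve: (payload: unknown) => void; reject: (error: unknown) => void };

// Hypothetical client core: wire `send` to a real WebSocket and pass every
// received text frame to onMessage().
class GatewayClientCore {
  private nextId = 1;
  private readonly pending = new Map<string, Waiter>();
  private readonly listeners = new Map<string, Array<(payload: unknown) => void>>();
  private readonly send: (text: string) => void;

  constructor(send: (text: string) => void) {
    this.send = send;
  }

  // Sends a req frame; resolves with the payload of the matching res.
  request(method: string, params?: unknown): Promise<unknown> {
    const id = String(this.nextId++);
    const frame: ReqFrame = { type: "req", id, method, params };
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send(JSON.stringify(frame));
    });
  }

  // Registers a handler for a pushed event such as "presence" or "tick".
  on(event: string, handler: (payload: unknown) => void): void {
    const handlers = this.listeners.get(event) ?? [];
    handlers.push(handler);
    this.listeners.set(event, handlers);
  }

  onMessage(text: string): void {
    const frame = JSON.parse(text) as ResFrame | EventFrame;
    if (frame.type === "res") {
      const waiter = this.pending.get(frame.id);
      if (waiter === undefined) return; // unknown or already-settled id
      this.pending.delete(frame.id);
      if (frame.ok) waiter.resolve(frame.payload);
      else waiter.reject(frame.error);
    } else {
      for (const handler of this.listeners.get(frame.event) ?? []) handler(frame.payload);
    }
  }
}
```

This sketch settles a request on its first `res`. The `agent` method, which per the lifecycle diagram receives an ack and later a final `res`, needs its pending entry kept until the final response arrives.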
## Connection lifecycle (single client)
```
Client                    Gateway
|                          |
|---- req:connect -------->|
|<------ res (ok) ---------|   (or res error + close)
|   (payload=hello-ok carries snapshot: presence + health)
|                          |
|<------ event:presence ---|   (deltas)
|<------ event:tick -------|   (keepalive/no-op)
|                          |
|------- req:agent ------->|
|<------ res:agent --------|   (ack: {runId,status:"accepted"})
|<------ event:agent ------|   (streaming)
|<------ res:agent --------|   (final: {runId,status,summary})
|                          |
```
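Because an `agent` call yields two `res` frames for one request (the ack and the final result) with `event:agent` frames in between, a client can fold them into per-run state. A minimal sketch as a pure reducer; it assumes streamed `event:agent` payloads carry the `runId` from the ack (not stated in this document), and the `AgentRun` shape and phase names are illustrative.

```typescript
// Shapes from the lifecycle diagram; `runId` inside `event:agent` payloads
// is an assumption used to route streamed output to the right run.
type AgentResPayload = { runId: string; status: string; summary?: unknown };
type AgentFrame =
  | { type: "res"; id: string; ok: boolean; payload?: AgentResPayload; error?: unknown }
  | { type: "event"; event: "agent"; payload: { runId: string; [key: string]: unknown } };

interface AgentRun {
  reqId: string; // id of the original req:agent frame
  phase: "sent" | "accepted" | "done" | "failed";
  runId?: string;
  streamed: unknown[]; // event:agent payloads in arrival order
  final?: AgentResPayload;
  error?: unknown;
}

// Pure reducer: feed every frame related to one agent request through it.
function applyAgentFrame(run: AgentRun, frame: AgentFrame): AgentRun {
  if (frame.type === "event") {
    // Drop output for other runs, or output that arrives before the ack.
    if (run.runId === undefined || frame.payload.runId !== run.runId) return run;
    return { ...run, streamed: [...run.streamed, frame.payload] };
  }
  if (frame.id !== run.reqId) return run; // response to a different request
  if (!frame.ok || frame.payload === undefined) {
    return { ...run, phase: "failed", error: frame.error };
  }
  if (run.phase === "sent" && frame.payload.status === "accepted") {
    // First res: the ack that assigns the runId.
    return { ...run, phase: "accepted", runId: frame.payload.runId };
  }
  // Second res: the final result with status and summary.
  return { ...run, phase: "done", final: frame.payload };
}
```

Keeping the reducer pure makes it easy to drive from any transport and to replay recorded frames in tests.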
## Wire protocol (summary)
- Transport: WebSocket, text frames with JSON payloads.
- First frame must be `req {type:"req", id, method:"connect", params:{minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? } }`.
- Server replies `res {type:"res", id, ok:true, payload: hello-ok }` or `ok:false` then closes.
- After handshake:
  - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
  - Events: `{type:"event", event:"agent"|"presence"|"tick"|"shutdown", payload, seq?, stateVersion?}`
- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token` must match; otherwise the socket closes with policy violation.
- Presence payload is structured, not free text: `{host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }`.
- Agent runs are acked `{runId,status:"accepted"}` then complete with a final res `{runId,status,summary}`; streamed output arrives as `event:"agent"`.
- Protocol versions are bumped on breaking changes; clients must match `minClient`, and the Gateway chooses a version within the client's `minProtocol`/`maxProtocol` range.
- Idempotency keys are required for side-effecting methods (`send`, `agent`) to safely retry; server keeps a short-lived dedupe cache.
- Policy in `hello-ok` communicates payload/queue limits and tick interval.
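The idempotency rule above can be sketched as a short-lived cache keyed by the client-supplied idempotency key. This is a minimal in-memory sketch, assuming a fixed TTL, an injectable clock for testing, and that a repeated key returns the first result instead of re-running the side effect; `IdempotencyCache` and its TTL are hypothetical, and where the key travels inside `params` is not specified in this document.

```typescript
// Hypothetical short-lived dedupe cache for side-effecting methods
// (`send`, `agent`).
class IdempotencyCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Runs `execute` only if `key` was not seen within the TTL; otherwise
  // returns the stored result of the first execution.
  run(key: string, execute: () => T): { value: T; deduped: boolean } {
    this.evictExpired();
    const hit = this.entries.get(key);
    if (hit !== undefined) return { value: hit.value, deduped: true };
    const value = execute();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return { value, deduped: false };
  }

  // Drops entries whose TTL has elapsed (deleting during forEach is safe).
  private evictExpired(): void {
    const t = this.now();
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= t) this.entries.delete(key);
    });
  }
}
```

If `T` is a `Promise`, a retry that arrives while the first `send` is still in flight shares the same promise. A rejected promise would be cached too, so a real implementation may want to evict on failure.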
## Type system and codegen
- Source of truth: TypeBox (or ArkType) definitions in `protocol/` on the server.
- Build step emits JSON Schema.
- Clients:
- TypeScript: uses the same TypeBox types directly.
- Swift: generated `Codable` models via quicktype from the JSON Schema.
- Validation: AJV on the server for every inbound frame; optional client-side validation for defensive programming.
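The server validates every inbound frame and hard-closes on a non-JSON or non-`connect` first frame. The sketch below is a dependency-free stand-in for the AJV check (hand-written guards, not the generated JSON Schema), combined with the token and protocol-range rules from the wire protocol summary. `GateOptions.supported`, the `GateResult` shape, and preferring the highest mutually supported version are assumptions; the real Gateway's error payloads and close reasons are not described here.

```typescript
type GateResult =
  | { kind: "accept"; id: string; protocol: number }
  // Hard close; when `id` is known the caller can first send res ok:false.
  | { kind: "close"; reason: string; id?: string };

interface GateOptions {
  token?: string; // e.g. from CLAWDBOT_GATEWAY_TOKEN or --token
  supported: [number, number]; // hypothetical server-side [min, max] protocol
}

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// JSON can never parse to `undefined`, so it signals a parse failure.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text) as unknown;
  } catch {
    return undefined;
  }
}

// Checks the first frame on a fresh connection.
function gateFirstFrame(text: string, opts: GateOptions): GateResult {
  const frame = tryParse(text);
  if (frame === undefined) return { kind: "close", reason: "non-JSON frame" };
  if (!isObject(frame)) return { kind: "close", reason: "frame is not an object" };
  const { type, method, id, params } = frame;
  if (type !== "req" || method !== "connect" || typeof id !== "string") {
    return { kind: "close", reason: "first frame must be req:connect" };
  }
  if (!isObject(params) || !isObject(params.client)) {
    return { kind: "close", reason: "invalid connect params", id };
  }
  const { minProtocol, maxProtocol, auth } = params;
  if (typeof minProtocol !== "number" || typeof maxProtocol !== "number") {
    return { kind: "close", reason: "missing protocol range", id };
  }
  // WebSocket close code 1008 is "policy violation" (RFC 6455).
  const token = isObject(auth) ? auth.token : undefined;
  if (opts.token !== undefined && token !== opts.token) {
    return { kind: "close", reason: "policy violation: bad token", id };
  }
  const lo = Math.max(minProtocol, opts.supported[0]);
  const hi = Math.min(maxProtocol, opts.supported[1]);
  if (lo > hi) return { kind: "close", reason: "no common protocol version", id };
  return { kind: "accept", id, protocol: hi }; // prefer the newest common version
}
```

On `accept`, the server would answer with `res ok:true` carrying `hello-ok`; every later frame still goes through full schema validation.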
## Invariants
- Exactly one Gateway controls a single Baileys session per host. No fallbacks to ad-hoc direct Baileys sends.
- Handshake is mandatory; any non-JSON or non-connect first frame is a hard close.
- All methods and events are versioned; new fields are additive; breaking changes increment `protocol`.
- No event replay: on seq gaps, clients must refresh (`health` + `system-presence`) and continue; presence is bounded via TTL/max entries.
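The no-replay rule means a client only has to notice a gap and resync. A minimal sketch, assuming `seq` increases by exactly 1 per event on a connection and that events without `seq` are not gap-checked; `SeqGapDetector` and `makeRefresh` are hypothetical names, and `request` stands in for whatever sends a `req` frame.

```typescript
type Refresh = () => void;

// Refresh as described in the invariants: re-fetch health and presence.
function makeRefresh(request: (method: string) => void): Refresh {
  return () => {
    request("health");
    request("system-presence");
  };
}

// Hypothetical helper: tracks `seq` on inbound events and triggers a resync
// when numbers are skipped. Assumes seq increases by exactly 1 per event.
class SeqGapDetector {
  private last: number | undefined;
  private readonly refresh: Refresh;

  constructor(refresh: Refresh) {
    this.refresh = refresh;
  }

  // Returns true when this event revealed a gap (and a refresh was issued).
  observe(seq: number | undefined): boolean {
    if (seq === undefined) return false; // events without seq are not tracked
    if (this.last !== undefined && seq <= this.last) return false; // stale/duplicate
    const gap = this.last !== undefined && seq > this.last + 1;
    this.last = seq;
    if (gap) this.refresh(); // no replay: resync state, then continue
    return gap;
  }
}
```

After a refresh the detector simply continues from the newest `seq`, matching the "refresh and continue" rule instead of waiting for missed events.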
## Remote access
- Preferred: Tailscale or VPN; alternative: SSH tunnel `ssh -N -L 18789:127.0.0.1:18789 user@host`.
- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel.
## Operations snapshot
- Start: `clawdbot gateway` (foreground, logs to stdout). Supervise with launchd/systemd for restarts.
- Health: request `health` over WS; also surfaced in `hello-ok.health`.
- Metrics/logging: kept outside this spec; the Gateway should expose Prometheus text or structured logs separately.
## Migration notes
- This architecture supersedes the legacy stdin RPC and the ad-hoc TCP control port. New clients should speak only the WS protocol. Legacy compatibility is intentionally dropped.