Files
openclaw/docs/cron.md

375 lines
15 KiB
Markdown
Raw Normal View History

---
summary: "RFC: Cron jobs + wakeups for Clawd/Clawdis (main vs isolated sessions)"
read_when:
- Designing scheduled jobs, alarms, or wakeups
- Adding Gateway methods or CLI commands for automation
- Adjusting heartbeat behavior or session routing
---
# RFC: Cron jobs + wakeups for Clawd
Status: Draft
Last updated: 2025-12-13
## Context
Clawdis already has:
2025-12-26 02:35:21 +01:00
- A **gateway heartbeat runner** that runs the agent with `HEARTBEAT` and suppresses `HEARTBEAT_OK` (`src/infra/heartbeat-runner.ts`).
- A lightweight, in-memory **system event queue** (`enqueueSystemEvent`) that is injected into the next **main session** turn (`drainSystemEvents` in `src/auto-reply/reply.ts`).
- A WebSocket **Gateway** daemon that is intended to be always-on (`docs/gateway.md`).
This RFC adds a small “cron job system” so Clawd can schedule future work and reliably wake itself up:
- **Delayed**: run on the *next* normal heartbeat tick
- **Immediate**: run *now* (trigger a heartbeat immediately)
- **Isolated jobs**: optionally run in their own session that does not pollute the main session and can run concurrently (within configured limits).
## Goals
- Provide a **persistent job store** and an **in-process scheduler** owned by the Gateway.
- Allow each job to target either:
- `sessionTarget: "main"`: inject as `System:` lines and rely on the main heartbeat (or trigger it immediately).
- `sessionTarget: "isolated"`: run an agent turn in a dedicated session key (job session), optionally delivering a message and/or posting a summary back to main.
- Expose a stable control surface:
- **Gateway methods** (`cron.*`, `wake`) for programmatic usage (mac app, CLI, agents).
- **CLI commands** (`clawdis cron ...`) to add/remove/edit/list and to debug `run`.
- Produce clear, structured **logs** for job lifecycle and execution outcomes.
## Non-goals (v1)
- Multi-host distributed scheduling.
- Exactly-once semantics across crashes (we aim for “at-least-once with idempotency hooks”).
- A full Unix-cron parser as the only schedule format (we can support it, but v1 should not require complex cron features to be useful).
## Terminology
- **Wake**: a request to ensure the agent gets a turn soon (either right now or next heartbeat).
- **Main session**: the canonical session bucket (default key `"main"`) that receives `System:` events.
- **Isolated session**: a per-job session key (e.g. `cron:<jobId>`) with its own session id / session file.
## User stories
- “Remind me in 20 minutes” → add a one-shot job that triggers an immediate heartbeat at T+20m.
- “Every weekday at 7:30, wake me up and start music” → recurring job, isolated session, deliver to WhatsApp.
- “Every hour, check battery; only interrupt me if < 20%” isolated job that decides whether to deliver; may also post a brief status to main.
- “Next heartbeat, please check calendar” → delayed wake targeting main session.
## Job model
### Storage schema (v1)
Each job is a JSON object with stable keys (unknown keys ignored for forward compatibility):
- `id: string` (UUID)
2025-12-20 19:56:49 +00:00
- `name: string` (required)
- `description?: string` (optional)
- `enabled: boolean`
- `createdAtMs: number`
- `updatedAtMs: number`
- `schedule` (one of)
- `{"kind":"at","atMs":number}` (one-shot)
- `{"kind":"every","everyMs":number,"anchorMs"?:number}` (simple interval)
- `{"kind":"cron","expr":string,"tz"?:string}` (optional; see “Schedule parsing”)
- `sessionTarget: "main" | "isolated"`
- `wakeMode: "next-heartbeat" | "now"`
- For `sessionTarget:"isolated"`, `wakeMode:"now"` means “run immediately when due”.
- For `sessionTarget:"main"`, `wakeMode` controls whether we trigger the heartbeat immediately or just enqueue and wait.
- `payload` (one of)
- `{"kind":"systemEvent","text":string}` (enqueue as `System:`)
- `{"kind":"agentTurn","message":string,"deliver"?:boolean,"channel"?: "last"|"whatsapp"|"telegram","to"?:string,"timeoutSeconds"?:number}`
- `isolation` (optional; only meaningful for isolated jobs)
2025-12-13 12:09:20 +00:00
- `{"postToMainPrefix"?: string}`
- `runtime` (optional)
- `{"maxAttempts"?:number,"retryBackoffMs"?:number}` (best-effort retries; defaults off)
- `state` (runtime-maintained)
- `{"nextRunAtMs":number,"lastRunAtMs"?:number,"lastStatus"?: "ok"|"error"|"skipped","lastError"?:string,"lastDurationMs"?:number}`
### Key behavior
- `sessionTarget:"main"` jobs always enqueue `payload.kind:"systemEvent"` (directly or derived from `agentTurn` results; see below).
- `sessionTarget:"isolated"` jobs create/use a stable session key: `cron:<jobId>`.
## Storage location
Cron persists everything under `~/.clawdis/cron/`:
- Job store: `~/.clawdis/cron/jobs.json`
- Run history: `~/.clawdis/cron/runs/<jobId>.jsonl`
You can override the job store path via `cron.store` in config.
The scheduler should never require additional configuration for the base directory (Clawdis already treats `~/.clawdis` as fixed).
## Enabling
2025-12-13 03:49:29 +00:00
Cron execution is enabled by default inside the Gateway.
To disable it, set:
```json5
{
cron: {
2025-12-13 03:49:29 +00:00
enabled: false,
// optional:
store: "~/.clawdis/cron/jobs.json",
maxConcurrentRuns: 1
}
}
```
2025-12-13 03:49:29 +00:00
You can also disable scheduling via the environment variable `CLAWDIS_SKIP_CRON=1`.
## Scheduler design
### Ownership
The Gateway owns:
- the scheduler timer,
- job store reads/writes,
- job execution (enqueue system events and/or agent turns).
This keeps scheduling unified with the always-on process and prevents “two schedulers” when multiple CLIs run.
### Timer strategy
- Maintain an in-memory heap/array of enabled jobs keyed by `state.nextRunAtMs`.
- Use a **single `setTimeout`** to wake at the earliest next run.
- On wake:
- compute all due jobs (now >= nextRunAtMs),
- mark them “in flight” (in memory),
- persist updated `state` (at least bump `nextRunAtMs` / `lastRunAtMs`) before starting execution to minimize duplicate runs on crash,
- execute jobs (with concurrency limits),
- persist final `lastStatus/lastError/lastDurationMs`,
- re-arm timer for the next earliest run.
### Schedule parsing
V1 can ship with `at` + `every` without extra deps.
If we add `"kind":"cron"`:
- Use a well-maintained parser (we use `croner`) and support:
- 5-field cron (`min hour dom mon dow`) at minimum
- optional `tz`
- Store `nextRunAtMs` computed by the parser; re-compute after each run.
## Execution semantics
### Main session jobs
Main session jobs do not run the agent directly by default.
When due:
1) `enqueueSystemEvent(job.payload.text)` (or a derived message)
2) If `wakeMode:"now"`, trigger an immediate heartbeat run (see “Heartbeat wake hook”).
3) Otherwise do nothing else (the next scheduled heartbeat will pick up the system event).
Why: This keeps the main sessions “proactive” behavior centralized in the heartbeat rules and avoids ad-hoc agent turns that might fight with inbound message processing.
### Isolated session jobs
Isolated jobs run an agent turn in a dedicated session key, intended to be separate from main.
When due:
- Build a message body that includes schedule metadata, e.g.:
- `"[cron:<jobId>] <job.name>: <payload.message>"`
- Execute via the same agent runner path as other command-mode runs, but pinned to:
- `sessionKey = cron:<jobId>`
- `sessionId = store[sessionKey].sessionId` (create if missing)
- Optionally deliver output (`payload.deliver === true`) to the configured channel/to.
- Isolated jobs always enqueue a summary system event to the main session when they finish (derived from the last agent text output).
- Prefix defaults to `Cron`, and can be customized via `isolation.postToMainPrefix`.
2025-12-13 12:09:20 +00:00
- If `deliver` is omitted/false, nothing is sent to external providers; you still get the main-session summary and can inspect the full isolated transcript in `cron:<jobId>`.
### “Run in parallel to main”
Clawdis currently serializes command execution through a global in-process queue (`src/process/command-queue.ts`) to avoid collisions.
To support isolated cron jobs running “in parallel”, we should introduce **lanes** (keyed queues) plus a global concurrency cap:
- Lane `"main"`: inbound auto-replies + main heartbeat.
- Lane `"cron"` (or `cron:<jobId>`): isolated jobs.
- Configurable `cron.maxConcurrentRuns` (default 1 or 2).
This yields:
- isolated jobs can overlap with the main lane (up to cap),
- each lane still preserves ordering for its own work (optional),
- we retain safety knobs to prevent runaway resource contention.
## Heartbeat wake hook (immediate vs next heartbeat)
We need a way for the Gateway (or the scheduler) to request an immediate heartbeat without duplicating heartbeat logic.
Design:
2025-12-26 02:35:21 +01:00
- `startHeartbeatRunner` owns the real heartbeat execution and installs a wake handler.
- Wake hook lives in `src/infra/heartbeat-wake.ts`:
- `setHeartbeatWakeHandler(fn | null)` installed by the heartbeat runner
- `requestHeartbeatNow({ reason, coalesceMs? })`
- If the handler is absent, the request is stored as “pending”; the next time the handler is installed, it runs once.
- Coalesce rapid calls and respect the “skip when queue busy” behavior (retry soon vs dropping).
## Run history log (JSONL)
In addition to normal structured logs, the Gateway writes an append-only run history “ledger” (JSONL) whenever a job finishes. This is intended for quick debugging (“did the job run, when, and what happened?”).
Path rules:
- Run logs are stored per job next to the store: `.../runs/<jobId>.jsonl`.
Retention:
- Best-effort pruning when the file grows beyond ~2MB; keep the newest ~2000 lines.
2025-12-13 12:09:20 +00:00
Each log line includes (at minimum) job id, status/error, timing, and a `summary` string (systemEvent text for main jobs, and the last agent text output for isolated jobs).
## Gateway API
New methods (names can be bikeshed; `cron.*` is suggested):
- `wake`
- params: `{ mode: "now" | "next-heartbeat", text: string }`
- effect: `enqueueSystemEvent(text)`, plus optional immediate heartbeat trigger
- `cron.list`
- params: optional `{ includeDisabled?: boolean }`
- returns: `{ jobs: CronJob[] }`
- `cron.add`
- params: job payload without `id/state` (server generates and returns created job)
- `cron.update`
- params: `{ id: string, patch: Partial<CronJobWritableFields> }`
- `cron.remove`
- params: `{ id: string }`
- `cron.run`
- params: `{ id: string, mode?: "due" | "force" }` (debugging; does not change schedule unless `force` requires it)
- `cron.runs`
- params: `{ id: string, limit?: number }`
- returns: `{ entries: CronRunLogEntry[] }`
- note: `id` is required (runs are stored per-job).
The Gateway should broadcast a `cron` event for UI/debug:
- event: `cron`
- payload: `{ jobId, action: "added"|"updated"|"removed"|"started"|"finished", status?, error?, nextRunAtMs? }`
## CLI surface
Add a `cron` command group (all commands should also support `--json` where sensible):
- `clawdis cron list [--json] [--all]`
- `clawdis cron add ...`
- schedule flags:
- `--at <iso8601|ms|relative>` (one-shot)
- `--every <duration>` (e.g. `10m`, `1h`)
- `--cron "<expr>" [--tz "<tz>"]`
- target flags:
- `--session main|isolated`
- `--wake now|next-heartbeat`
- payload flags (choose one):
- `--system-event "<text>"`
2025-12-15 10:11:18 -06:00
- `--message "<agent message>" [--deliver] [--channel last|whatsapp|telegram|discord] [--to <dest>]`
- `clawdis cron edit <id> ...` (patch-by-flags, non-interactive)
- `clawdis cron rm <id>`
- `clawdis cron enable <id>` / `clawdis cron disable <id>`
- `clawdis cron run <id> [--force]` (debug)
- `clawdis cron runs --id <id> [--limit <n>]` (run history)
- `clawdis cron status` (scheduler enabled + next wake)
Additionally:
- `clawdis wake --mode now|next-heartbeat --text "<text>"` as a thin wrapper around `wake` for agents to call.
## Examples
### Run once at a specific time
One-shot reminder that targets the main session and triggers a heartbeat immediately at the scheduled time:
```bash
clawdis cron add \
--at "2025-12-14T07:00:00-08:00" \
--session main \
--wake now \
--system-event "Alarm: wake up (meeting in 30 minutes)."
```
### Run daily (calendar-accurate)
Daily at 07:00 in a specific timezone (preferred over “every 24h” to avoid DST drift):
```bash
clawdis cron add \
--cron "0 7 * * *" \
--tz "America/Los_Angeles" \
--session isolated \
--wake now \
--message "Daily check: scan calendar + inbox; deliver only if urgent." \
--deliver \
--channel last
```
### Run weekly (every Wednesday)
Every Wednesday at 09:00:
```bash
clawdis cron add \
--cron "0 9 * * 3" \
--tz "America/Los_Angeles" \
--session isolated \
--wake now \
--message "Weekly: summarize status and remind me of goals." \
--deliver \
--channel last
```
### “Next heartbeat”
Enqueue a note for the main session but let the existing heartbeat cadence pick it up:
```bash
clawdis wake --mode next-heartbeat --text "Next heartbeat: check battery + upcoming meetings."
```
## Logging & observability
Logging requirements:
- Use `getChildLogger({ module: "cron", jobId, runId, name })` for every run.
- Log lifecycle:
- store load/save (debug; include job count)
- schedule recompute (debug; include nextRunAt)
- job start/end (info)
- job skipped (info; include reason)
- job error (warn; include error + stack where available)
- Emit a concise user-facing line to stdout when running in CLI mode (similar to heartbeat logs).
Suggested log events:
- `cron: scheduler started` (jobCount, nextWakeAt)
- `cron: job started` (jobId, scheduleKind, sessionTarget, wakeMode)
- `cron: job finished` (status, durationMs, nextRunAtMs)
- When `cron.enabled` is false, the Gateway logs `cron: disabled` and jobs will not run automatically (the CLI warns on `cron add`/`cron edit`).
- Use `clawdis cron status` to confirm the scheduler is enabled and see the next wake time.
## Safety & security
- Respect existing allowlists/routing rules: delivery defaults should not send to arbitrary destinations unless explicitly configured.
- Provide a global “kill switch”:
2025-12-13 03:49:29 +00:00
- `cron.enabled: boolean` (default `true`).
- `gateway method set-heartbeats` already exists; cron should have similar.
- Avoid persistence of sensitive payloads unless requested; job text may contain private content.
## Testing plan (v1)
- Unit tests:
- schedule computation for `at` and `every`
- job store read/write + migration behavior
- lane concurrency: main vs cron overlap is bounded
- “wake now” coalescing and pending behavior when provider not ready
- Integration tests:
- start Gateway with `CLAWDIS_SKIP_PROVIDERS=1`, add jobs, list/edit/remove
- simulate due jobs and assert `enqueueSystemEvent` called + cron events broadcast
## Rollout plan
1) Add the `wake` primitive + heartbeat wake hook (no persistent jobs yet).
2) Add `cron.*` API and CLI wrappers with `at` + `every`.
3) Add optional cron expression parsing (`kind:"cron"`) if needed.
4) Add UI surfacing in WebChat/macOS app (optional).