* fix: detect PID recycling in session write lock staleness check
The session lock uses isPidAlive() to determine if a lock holder is
still running. In containers, PID recycling can cause a different
process to inherit the same PID, making the lock appear valid when
the original holder is dead.
Record the process start time (field 22 of /proc/pid/stat) in the
lock file and compare it during staleness checks. If the PID is alive
but its start time differs from the recorded value, the lock is
treated as stale and reclaimed immediately.
Backward compatible: lock files without starttime are handled with
the existing PID-alive + age-based logic. Non-Linux platforms skip
the starttime check entirely (getProcessStartTime returns null).
* shared: harden pid starttime parsing
* sessions: validate lock pid/starttime payloads
* changelog: note recycled PID lock recovery fix
* changelog: credit hiroki and vincent on lock recovery fix
---------
Co-authored-by: HirokiKobayashi-R <hiroki@rhems-japan.co.jp>
When a model API call hangs indefinitely (e.g. Anthropic quota exceeded
mid-call), the gateway acquires a session .jsonl.lock but the promise
never resolves, so the try/finally block never reaches release(). Since
the owning PID is the gateway itself, stale detection cannot help —
isPidAlive() always returns true.
This commit adds four layers of defense:
1. **In-process lock watchdog** (session-write-lock.ts)
- Track acquiredAt timestamp on each held lock
- 60-second interval timer checks all held locks
- Auto-releases any lock held longer than maxHoldMs (default 5 min)
- Catches the hung-API-call case that try/finally cannot
2. **Gateway startup cleanup** (server-startup.ts)
- On boot, scan all agent session directories for *.jsonl.lock files
- Remove locks with dead PIDs or older than staleMs (30 min)
- Log each cleaned lock for diagnostics
3. **openclaw doctor stale lock detection** (doctor-session-locks.ts)
- New health check scans for .jsonl.lock files
- Reports PID status and age of each lock found
- In --fix mode, removes stale locks automatically
4. **Transcript error entry on API failure** (attempt.ts)
- When promptError is set, write an error marker to the session
transcript before releasing the lock
- Preserves conversation history even on model API failures
Closes#18060
Adds cleanup handlers to release held file locks when the process
terminates via SIGTERM, SIGINT, or normal exit. This prevents orphaned
lock files that would block future sessions.
Fixes#1951
What:
- stub resolvePinnedHostname in web-fetch tests to avoid DNS flake
- close lock file handles via FileHandle.close during cleanup to avoid EBADF
Why:
- make CI deterministic without network/DNS dependence
- prevent double-close errors from GC
Tests:
- pnpm vitest run --config vitest.unit.config.ts src/agents/tools/web-tools.fetch.test.ts src/agents/session-write-lock.test.ts (failed: missing @aws-sdk/client-bedrock)
Adds process exit handlers to release all held session locks on:
- Normal process.exit() calls
- SIGTERM / SIGINT signals
This ensures locks are cleaned up even when the process terminates
unexpectedly, preventing the 'session file locked' error.