fix(agents): handle overloaded failover separately (#38301)
* fix(agents): skip auth-profile failure on overload
* fix(agents): note overload auth-profile fallback fix
* fix(agents): classify overloaded failures separately
* fix(agents): back off before overload failover
* fix(agents): tighten overload probe and backoff state
* fix(agents): persist overloaded cooldown across runs
* fix(agents): tighten overloaded status handling
* test(agents): add overload regression coverage
* fix(agents): restore runner imports after rebase
* test(agents): add overload fallback integration coverage
* fix(agents): harden overloaded failover abort handling
* test(agents): tighten overload classifier coverage
* test(agents): cover all-overloaded fallback exhaustion
* fix(cron): retry overloaded fallback summaries
* fix(cron): treat HTTP 529 as overloaded retry
@@ -370,6 +370,7 @@ When a job fails, OpenClaw classifies errors as **transient** (retryable) or **p
 ### Transient errors (retried)
 
 - Rate limit (429, too many requests, resource exhausted)
+- Provider overload (for example Anthropic `529 overloaded_error`, overload fallback summaries)
 - Network errors (timeout, ECONNRESET, fetch failed, socket)
 - Server errors (5xx)
 - Cloudflare-related errors
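
The transient categories listed in this hunk can be sketched as a small classifier. This is a minimal illustration, not the project's implementation: `classifyTransient` and `TransientKind` are hypothetical names, the match patterns are assumptions drawn from the bullet text, and the Cloudflare case is left out because the docs do not say how it is detected.

```typescript
// Hypothetical category names, chosen to mirror the documented `retryOn` values.
type TransientKind = "rate_limit" | "overloaded" | "network" | "server_error";

// Illustrative classifier: the patterns below are assumptions based on the
// bullet list, not the matching rules the project actually uses.
function classifyTransient(
  status: number | undefined,
  message: string
): TransientKind | null {
  const text = message.toLowerCase();

  if (status === 429 || /too many requests|resource[ _]exhausted|rate limit/.test(text)) {
    return "rate_limit";
  }
  // Check overload before the generic 5xx branch so that 529 is reported
  // as "overloaded" rather than as a plain server error.
  if (status === 529 || /overloaded/.test(text)) {
    return "overloaded";
  }
  if (/timeout|econnreset|fetch failed|socket/.test(text)) {
    return "network";
  }
  if (status !== undefined && status >= 500 && status <= 599) {
    return "server_error";
  }
  return null; // not transient in this sketch, so treated as permanent
}
```

Ordering matters in this sketch: because 529 falls inside the 5xx range, the overload check has to run before the generic server-error check for the two categories to stay distinct.
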
@@ -407,7 +408,7 @@ Configure `cron.retry` to override these defaults (see [Configuration](/automati
 retry: {
   maxAttempts: 3,
   backoffMs: [60000, 120000, 300000],
-  retryOn: ["rate_limit", "network", "server_error"],
+  retryOn: ["rate_limit", "overloaded", "network", "server_error"],
 },
 webhook: "https://example.invalid/legacy", // deprecated fallback for stored notify:true jobs
 webhookToken: "replace-with-dedicated-webhook-token", // optional bearer token for webhook mode
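
One way to read the `cron.retry` fields in this hunk is as inputs to a single retry decision. The sketch below is hedged: `RetryConfig` and `nextRetryDelay` are hypothetical names, and it assumes that `maxAttempts` caps the number of retries and that the last `backoffMs` entry is reused once the list runs out. Neither assumption is confirmed by the diff.

```typescript
// Hypothetical shape for the documented `cron.retry` block.
interface RetryConfig {
  maxAttempts: number;
  backoffMs: number[];
  retryOn: string[];
}

// Values copied from the updated docs example.
const retry: RetryConfig = {
  maxAttempts: 3,
  backoffMs: [60000, 120000, 300000],
  retryOn: ["rate_limit", "overloaded", "network", "server_error"],
};

// Returns the delay in milliseconds before the next retry, or null when the
// failure should not be retried. `retriesSoFar` is the number of retries
// already performed for this job (0 after the first failure).
function nextRetryDelay(
  cfg: RetryConfig,
  kind: string | null,
  retriesSoFar: number
): number | null {
  if (kind === null || cfg.retryOn.indexOf(kind) === -1) return null;
  if (retriesSoFar >= cfg.maxAttempts) return null;
  if (cfg.backoffMs.length === 0) return 0;
  // Assumption: reuse the final delay if there are more retries than entries.
  const index = Math.min(retriesSoFar, cfg.backoffMs.length - 1);
  return cfg.backoffMs[index];
}
```

Under these assumptions, an `"overloaded"` failure is retried only because the category now appears in `retryOn`; with the old list the same call would return `null`.
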
@@ -665,7 +666,7 @@ openclaw system event --mode now --text "Next heartbeat: check battery."
 - OpenClaw applies exponential retry backoff for recurring jobs after consecutive errors:
   30s, 1m, 5m, 15m, then 60m between retries.
 - Backoff resets automatically after the next successful run.
-- One-shot (`at`) jobs retry transient errors (rate limit, network, server_error) up to 3 times with backoff; permanent errors disable immediately. See [Retry policy](/automation/cron-jobs#retry-policy).
+- One-shot (`at`) jobs retry transient errors (rate limit, overloaded, network, server_error) up to 3 times with backoff; permanent errors disable immediately. See [Retry policy](/automation/cron-jobs#retry-policy).
 
 ### Telegram delivers to the wrong place
 
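
The recurring-job backoff described in this hunk (30s, 1m, 5m, 15m, then 60m, reset after a success) can be modeled as a lookup on the consecutive-error count. This is a rough sketch only: `RECURRING_BACKOFF_MS` and `recurringBackoffMs` are hypothetical names, and holding at 60m for every later error is an assumption based on the word "then" in the docs.

```typescript
// Documented delays between retries for recurring jobs, in milliseconds.
const RECURRING_BACKOFF_MS: number[] = [
  30 * 1000,      // 30s
  60 * 1000,      // 1m
  5 * 60 * 1000,  // 5m
  15 * 60 * 1000, // 15m
  60 * 60 * 1000, // 60m
];

// Maps the number of consecutive errors to a delay. A count of 0 models the
// reset after a successful run; counts past the end of the table stay at 60m.
function recurringBackoffMs(consecutiveErrors: number): number {
  if (consecutiveErrors <= 0) return 0;
  const index = Math.min(consecutiveErrors - 1, RECURRING_BACKOFF_MS.length - 1);
  return RECURRING_BACKOFF_MS[index];
}
```
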