Commit Graph

483 Commits

Author SHA1 Message Date
Peter Steinberger
386b811ddd test(cron): relax concurrent start race timeout 2026-03-08 13:44:10 +00:00
gambletan
8a20f51460 fix: add rate limit patterns for 'too many tokens' and 'tokens per day' (#39377)
Merged via squash.

Prepared head SHA: 132a45728694053c0e3220e7d861508524f17244
Co-authored-by: gambletan <266203672+gambletan@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-08 13:03:33 +03:00
Peter Steinberger
149ae45bad fix(cron): preserve manual timeoutSeconds on add 2026-03-08 00:48:57 +00:00
Peter Steinberger
e66c418c45 refactor(cron): normalize legacy delivery at ingress 2026-03-08 00:48:57 +00:00
Peter Steinberger
9b99787c31 refactor(cron): extract delivery tool policy helpers 2026-03-08 00:48:57 +00:00
Peter Steinberger
45d3e62f50 refactor(cron): extract agent defaults merge helpers 2026-03-08 00:48:56 +00:00
Peter Steinberger
6b18ec479c refactor(cron): centralize initial delivery defaults 2026-03-08 00:48:56 +00:00
Peter Steinberger
4e07bdbdfd fix(cron): restore isolated delivery defaults 2026-03-08 00:18:45 +00:00
Peter Steinberger
92f5a2e252 fix(models): refresh gpt/gemini alias defaults (#38638, thanks @ademczuk)
Co-authored-by: ademczuk <andrew.demczuk@gmail.com>
2026-03-07 21:10:58 +00:00
Tyler Yust
e554c59aac fix(cron): eliminate double-announce and replace delivery polling with push-based flow (#39089)
* fix(cron): eliminate double-announce and replace delivery polling with push-based flow

- Set deliveryAttempted=true in announce early-return paths (active-subagent
  suppression and stale-interim suppression) so the heartbeat timer no longer
  fires a redundant enqueueSystemEvent fallback (double-announce bug).

- Refactor waitForDescendantSubagentSummary to use event-based agent.wait RPC
  calls instead of a 500ms busy-poll loop.  Each active descendant run is now
  awaited concurrently via Promise.allSettled, and only a short bounded grace
  period (5s) remains to capture the cron agent's post-orchestration synthesis.
  Eliminates O(n*timeoutMs/500ms) gateway calls and wasted wall-clock time.

- Add FAST_TEST_MODE (OPENCLAW_TEST_FAST=1) to subagent-followup.ts to keep
  the grace-period tests instant in CI.

- Add comprehensive tests for the new waitForDescendantSubagentSummary behaviour
  (push-based wait, error resilience, NO_REPLY handling, multi-descendant waits).

* fix: prep cron double-announce followup tests (#39089) (thanks @tyler6204)
2026-03-07 12:13:37 -08:00
Peter Steinberger
c5bb6db85b refactor(cron): share isolated-agent turn core test setup 2026-03-07 17:58:31 +00:00
Peter Steinberger
41e0c35b61 refactor(cron): reuse cron job builder in issue-13992 tests 2026-03-07 17:58:31 +00:00
Peter Steinberger
8fd043abac refactor(cron): dedupe interim retry fallback assertions 2026-03-07 17:05:23 +00:00
Peter Steinberger
d01cb7b65f refactor(cron): share cron schedule resolver 2026-03-07 17:05:23 +00:00
Peter Steinberger
3c71e2bd48 refactor(core): extract shared dedup helpers 2026-03-07 10:41:05 +00:00
拐爷&&老拐瘦
2e31aead39 fix(gateway): invalidate bootstrap cache on session rollover (openclaw#38535)
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: yfge <1186273+yfge@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-06 23:46:02 -06:00
Altay
6e962d8b9e fix(agents): handle overloaded failover separately (#38301)
* fix(agents): skip auth-profile failure on overload

* fix(agents): note overload auth-profile fallback fix

* fix(agents): classify overloaded failures separately

* fix(agents): back off before overload failover

* fix(agents): tighten overload probe and backoff state

* fix(agents): persist overloaded cooldown across runs

* fix(agents): tighten overloaded status handling

* test(agents): add overload regression coverage

* fix(agents): restore runner imports after rebase

* test(agents): add overload fallback integration coverage

* fix(agents): harden overloaded failover abort handling

* test(agents): tighten overload classifier coverage

* test(agents): cover all-overloaded fallback exhaustion

* fix(cron): retry overloaded fallback summaries

* fix(cron): treat HTTP 529 as overloaded retry
2026-03-07 01:42:11 +03:00
Josh Lehman
fee91fefce feature(context): extend plugin system to support custom context management (#22201)
* feat(context-engine): add ContextEngine interface and registry

Introduce the pluggable ContextEngine abstraction that allows external
plugins to register custom context management strategies.

- ContextEngine interface with lifecycle methods: bootstrap, ingest,
  ingestBatch, afterTurn, assemble, compact, prepareSubagentSpawn,
  onSubagentEnded, dispose
- Module-level singleton registry with registerContextEngine() and
  resolveContextEngine() (config-driven slot selection)
- LegacyContextEngine: pass-through implementation wrapping existing
  compaction behavior for 100% backward compatibility
- ensureContextEnginesInitialized() guard for safe one-time registration
- 19 tests covering contract, registry, resolution, and legacy parity

* feat(plugins): add context-engine slot and registerContextEngine API

Wire the ContextEngine abstraction into the plugin system so external
plugins can register context engines via the standard plugin API.

- Add 'context-engine' to PluginKind union type
- Add 'contextEngine' slot to PluginSlotsConfig (default: 'legacy')
- Wire registerContextEngine() through OpenClawPluginApi
- Export ContextEngine types from plugin-sdk for external consumers
- Restore proper slot-based resolution in registry

* feat(context-engine): wire ContextEngine into agent run lifecycle

Integrate the ContextEngine abstraction into the core agent run path:

- Resolve context engine once per run (reused across retries)
- Bootstrap: hydrate canonical store from session file on first run
- Assemble: route context assembly through pluggable engine
- Auto-compaction guard: disable built-in auto-compaction when
  the engine declares ownsCompaction (prevents double-compaction)
- AfterTurn: post-turn lifecycle hook for ingest + background
  compaction decisions
- Overflow compaction: route through contextEngine.compact()
- Dispose: clean up engine resources in finally block
- Notify context engine on subagent lifecycle events

Legacy engine: all lifecycle methods are pass-through/no-op, preserving
100% backward compatibility for users without a context engine plugin.

* feat(plugins): add scoped subagent methods and gateway request scope

Expose runtime.subagent.{run, waitForRun, getSession, deleteSession}
so external plugins can spawn sub-agent sessions without raw gateway
dispatch access.

Uses AsyncLocalStorage request-scope bridge to dispatch internally via
handleGatewayRequest with a synthetic operator client. Methods are only
available during gateway request handling.

- Symbol.for-backed global singleton for cross-module-reload safety
- Fallback gateway context for non-WS dispatch paths (Telegram/WhatsApp)
- Set gateway request scope for all handlers, not just plugin handlers
- 3 staleness tests for fallback context hardening

* feat(context-engine): route /compact and sessions.get through context engine

Wire the /compact command and sessions.get handler through the pluggable
ContextEngine interface.

- Thread tokenBudget and force parameters to context engine compact
- Route /compact through contextEngine.compact() when registered
- Wire sessions.get as runtime alias for plugin subagent dispatch
- Add .pebbles/ to .gitignore

* style: format with oxfmt 0.33.0

Fix duplicate import (ControlUiRootState in server.impl.ts) and
import ordering across all changed files.

* fix: update extension test mocks for context-engine types

Add missing subagent property to bluebubbles PluginRuntime mock.
Add missing registerContextEngine to lobster OpenClawPluginApi mock.

* fix(subagents): keep deferred delete cleanup retryable

* style: format run attempt for CI

* fix(rebase): remove duplicate embedded-run imports

* test: add missing gateway context mock export

* fix: pass resolved auth profile into afterTurn compaction

Ensure the embedded runner forwards resolved auth profile context into
legacy context-engine compaction params on the normal afterTurn path,
matching overflow compaction behavior. This allows downstream LCM
summarization to use the intended provider auth/profile consistently.

Also fix strict TS typing in external-link token dedupe and align an
attempt unit test reasoningLevel value with the current ReasoningLevel
enum.

Regeneration-Prompt: |
  We were debugging context-engine compaction where downstream summary
  calls were missing the right auth/profile context in normal afterTurn
  flow, while overflow compaction already propagated it. Preserve current
  behavior and keep changes additive: thread the resolved authProfileId
  through run -> attempt -> legacy compaction param builder without
  broad refactors.

  Add tests that prove the auth profile is included in afterTurn legacy
  params and that overflow compaction still passes it through run
  attempts. Keep existing APIs stable, and only adjust small type issues
  needed for strict compilation.

* fix: remove duplicate imports from rebase

* feat: add context-engine system prompt additions

* fix(rebase): dedupe attempt import declarations

* test: fix fetch mock typing in ollama autodiscovery

* fix(test): add registerContextEngine to diffs extension mock APIs

* test(windows): use path.delimiter in ios-team-id fixture PATH

* test(cron): add model formatting and precedence edge case tests

Covers:
- Provider/model string splitting (whitespace, nested paths, empty segments)
- Provider normalization (casing, aliases like bedrock→amazon-bedrock)
- Anthropic model alias normalization (opus-4.5→claude-opus-4-5)
- Precedence: job payload > session override > config default
- Sequential runs with different providers (CI flake regression pattern)
- forceNew session preserving stored model overrides
- Whitespace/empty model string edge cases
- Config model as string vs object format

* test(cron): fix model formatting test config types

* test(phone-control): add registerContextEngine to mock API

* fix: re-export ChannelKind from config-reload-plan

* fix: add subagent mock to plugin-runtime-mock test util

* docs: add changelog fragment for context engine PR #22201
2026-03-06 05:31:59 -08:00
Vincent Koc
44ec3e4111 Cron: stabilize runs-one-shot migration tests 2026-03-06 01:27:23 -05:00
Vincent Koc
a622aee45a Cron: migrate legacy provider delivery hints 2026-03-06 01:27:23 -05:00
aerelune
0e2bc588c4 fix: enforce 600 perms for cron store and run logs (#36078)
* fix: enforce secure permissions for cron store and run logs

* fix(cron): enforce dir perms and gate posix tests on windows

* Cron store tests: cover existing directory permission hardening

* Cron run-log tests: cover existing directory permission hardening

* Changelog: note cron file permission hardening

---------

Co-authored-by: linhey <linhey@mini.local>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-06 00:48:35 -05:00
Vignesh Natarajan
d45353f95b fix(agents): honor explicit rate-limit cooldown probes in fallback runs 2026-03-05 20:03:06 -08:00
Tyler Yust
81b93b9ce0 fix(subagents): announce delivery with descendant gating, frozen result refresh, and cron retry (#35080)
Thanks @tyler6204
2026-03-05 19:20:24 -08:00
Jacob Riff
aad372e15f feat: append UTC time alongside local time in shared Current time lines (#32423)
Merged via squash.

Prepared head SHA: 9e8ec13933b5317e7cff3f0bc048de515826c31a
Co-authored-by: jriff <50276+jriff@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-06 01:26:34 +03:00
Tak Hoffman
9741e91a64 test(cron): add cross-channel announce fallback regression coverage (openclaw#36197)
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check (fails on pre-existing origin/main lint debt in extensions/mattermost imports)
- pnpm test:macmini

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-05 07:37:37 -06:00
Tak Hoffman
544abc927f fix(cron): restore direct fallback after announce failure in best-effort mode (openclaw#36177)
Verified:
- pnpm build
- pnpm check (fails on pre-existing origin/main lint debt in extensions/mattermost imports)
- pnpm test:macmini

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-05 07:25:24 -06:00
Tak Hoffman
cc5dad81bc cron: unify stale-run recovery and preserve manual-run every anchors (#35363)
* cron: unify stale-run recovery and preserve manual every anchors

* cron: address unresolved review threads on recovery paths

* cron: remove duplicate timestamp helper after rebase
2026-03-04 22:12:32 -06:00
Tak Hoffman
28dc2e8a40 cron: narrow startup replay backoff guard (#35391) 2026-03-04 22:11:11 -06:00
Tak Hoffman
79d00ae398 fix(cron): stabilize restart catch-up replay semantics (#35351)
* Cron: stabilize restart catch-up replay semantics

* Cron: respect backoff in startup missed-run replay
2026-03-04 21:50:16 -06:00
sline
1059b406a8 fix: cron backup should preserve pre-edit snapshot (#35195) (#35234)
* fix(cron): avoid overwriting .bak during normalization

Fixes openclaw/openclaw#35195

* test(cron): preserve pre-edit bak snapshot in normalization path

---------

Co-authored-by: 0xsline <sline@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-04 21:46:27 -06:00
Sid
8b8167d547 fix(agents): bypass pendingDescendantRuns guard for cron announce delivery (#35185)
* fix(agents): bypass pendingDescendantRuns guard for cron announce delivery

Standalone cron job completions were blocked from direct channel delivery
when the cron run had spawned subagents that were still registered as
pending. The pendingDescendantRuns guard exists for live orchestration
coordination and should not apply to fire-and-forget cron announce sends.

Thread the announceType through the delivery chain and skip both the
child-descendant and requester-descendant pending-run guards when the
announce originates from a cron job.

Closes #34966

* fix: ensure outbound session entry for cron announce with named agents (#32432)

Named agents may not have a session entry for their delivery target,
causing the announce flow to silently fail (delivered=false, no error).

Two fixes:
1. Call ensureOutboundSessionEntry when resolving the cron announce
   session key so downstream delivery can find channel metadata.
2. Fall back to direct outbound delivery when announce delivery fails
   to ensure cron output reaches the target channel.

Closes #32432

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: guard announce direct-delivery fallback against suppression leaks (#32432)

The `!delivered` fallback condition was too broad — it caught intentional
suppressions (active subagents, interim messages, SILENT_REPLY_TOKEN) in
addition to actual announce delivery failures.  Add an
`announceDeliveryWasAttempted` flag so the direct-delivery fallback only
fires when `runSubagentAnnounceFlow` was actually called and failed.

Also remove the redundant `if (route)` guard in
`resolveCronAnnounceSessionKey` since `resolved` being truthy guarantees
`route` is non-null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cron): harden announce synthesis follow-ups

---------

Co-authored-by: scoootscooob <zhentongfan@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-04 21:31:33 -06:00
Madoka
63ce7c74bd fix(feishu): comprehensive reply mechanism — outbound replyToId forwarding + topic-aware reply targeting (#33789)
* fix(feishu): comprehensive reply mechanism fix — outbound replyToId forwarding + topic-aware reply targeting

- Forward replyToId from ChannelOutboundContext through sendText/sendMedia
  to sendMessageFeishu/sendMarkdownCardFeishu/sendMediaFeishu, enabling
  reply-to-message via the message tool.

- Fix group reply targeting: use ctx.messageId (triggering message) in
  normal groups to prevent silent topic thread creation (#32980). Preserve
  ctx.rootId targeting for topic-mode groups (group_topic/group_topic_sender)
  and groups with explicit replyInThread config.

- Add regression tests for both fixes.

Fixes #32980
Fixes #32958
Related #19784

* fix: normalize Feishu delivery.to before comparing with messaging tool targets

- Add normalizeDeliveryTarget helper to strip user:/chat: prefixes for Feishu
- Apply normalization in matchesMessagingToolDeliveryTarget before comparison
- This ensures cron duplicate suppression works when session uses prefixed targets
  (user:ou_xxx) but messaging tool extract uses normalized bare IDs (ou_xxx)

Fixes review comment on PR #32755

(cherry picked from commit fc20106f16ccc88a5f02e58922bb7b7999fe9dcd)

* fix(feishu): catch thrown SDK errors for withdrawn reply targets

The Feishu Lark SDK can throw exceptions (SDK errors with .code or
AxiosErrors with .response.data.code) for withdrawn/deleted reply
targets, in addition to returning error codes in the response object.

Wrap reply calls in sendMessageFeishu and sendCardFeishu with
try-catch to handle thrown withdrawn/not-found errors (230011,
231003) and fall back to client.im.message.create, matching the
existing response-level fallback behavior.

Also extract sendFallbackDirect helper to deduplicate the
direct-send fallback block across both functions.

Closes #33496

(cherry picked from commit ad0901aec103a2c52f186686cfaf5f8ba54b4a48)

* feishu: forward outbound reply target context

(cherry picked from commit c129a691fcf552a1cebe1e8a22ea8611ffc3b377)

* feishu extension: tighten reply target fallback semantics

(cherry picked from commit f85ec610f267020b66713c09e648ec004b2e26f1)

* fix(feishu): align synthesized fallback typing and changelog attribution

* test(feishu): cover group_topic_sender reply targeting

---------

Co-authored-by: Xu Zimo <xuzimojimmy@163.com>
Co-authored-by: Munem Hashmi <munem.hashmi@gmail.com>
Co-authored-by: bmendonca3 <bmendonca3@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-03-04 20:32:28 -06:00
Gustavo Madeira Santana
e4b4486a96 Agent: unify bootstrap truncation warning handling (#32769)
Merged via squash.

Prepared head SHA: 5d6d4ddfa620011e267d892b402751847d5ac0c3
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-03-03 16:28:38 -05:00
Peter Steinberger
5193189953 refactor(tests): dedupe cron store migration setup 2026-03-03 01:54:27 +00:00
Peter Steinberger
fbb88d5063 refactor(tests): dedupe isolated agent cron turn assertions 2026-03-03 01:54:27 +00:00
Peter Steinberger
61f29830bc fix(test): resolve upstream typing drift in feishu and cron suites 2026-03-03 01:44:21 +00:00
Peter Steinberger
47736e3432 refactor(test): extract cron issue-regression harness and frozen-time helper 2026-03-03 01:44:21 +00:00
David Rudduck
11e1363d2d feat(hooks): add trigger and channelId to plugin hook agent context (#28623)
* feat(hooks): add trigger and channelId to plugin hook agent context

Adds `trigger` and `channelId` fields to `PluginHookAgentContext` so
plugins can determine what initiated the agent run and which channel
it originated from, without session-key parsing or Redis bridging.

trigger values: "user", "heartbeat", "cron", "memory"
channelId values: "telegram", "discord", "whatsapp", etc.

Both fields are threaded through run.ts and attempt.ts hookCtx so all
hook phases receive them (before_model_resolve, before_prompt_build,
before_agent_start, llm_input, llm_output, agent_end).

channelId falls back from messageChannel to messageProvider when the
former is not set. followup-runner passes originatingChannel so queued
followup runs also carry channel context.

* docs(changelog): note hook context parity fix for #28623

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-02 17:39:20 -08:00
Peter Steinberger
0f5f20ee6b refactor(tests): dedupe cron delivered status assertions 2026-03-03 01:37:12 +00:00
HCL
586f057c24 fix(cron): let resolveOutboundTarget handle missing delivery target fallback
The cron delivery path short-circuits with an error when `toCandidate` is
falsy (line 151), before reaching `resolveOutboundTarget()` which provides
the `plugin.config.resolveDefaultTo()` fallback. The direct send path in
`targets.ts` already uses this fallback correctly.

Remove the early `!toCandidate` exit so that `resolveOutboundTarget()`
can attempt the plugin-provided default. Guard the WhatsApp allowFrom
override against falsy `toCandidate` to maintain existing behavior when
a target IS resolved.

Fixes #32355

Signed-off-by: HCL <chenglunhu@gmail.com>
2026-03-03 01:09:47 +00:00
Peter Steinberger
d7bafae387 perf(test): trim fixture and serialization overhead in integration suites 2026-03-03 01:09:07 +00:00
Peter Steinberger
282b107e99 test(perf): speed up cron, memory, and secrets hotspots 2026-03-03 00:43:01 +00:00
Peter Steinberger
6a42d09129 refactor: dedupe gateway config and infra flows 2026-03-03 00:15:14 +00:00
ningding97
4d19dc8671 test(cron): assert embedded model on last call to avoid bun ordering flake
Bun runs can trigger multiple embedded agent invocations in a single cron
turn (e.g. retries/fallbacks), making assertions against call[0] flaky.
Assert against the last invocation instead.
2026-03-02 23:36:13 +00:00
Peter Steinberger
f9cbcfca0d refactor: modularize slack/config/cron/daemon internals 2026-03-02 22:30:21 +00:00
Peter Steinberger
7253e91300 fix: strengthen cron heartbeat multi-payload suppression (#32131) (thanks @adhishthite) 2026-03-02 22:16:18 +00:00
Adhish
2330c71b63 fix(cron): suppress delivery when multi-payload response contains HEARTBEAT_OK
When a cron agent emits multiple text payloads (narration + tool
summaries) followed by a final HEARTBEAT_OK, the delivery suppression
check `isHeartbeatOnlyResponse` fails because it uses `.every()` —
requiring ALL payloads to be heartbeat tokens. In practice, agents
narrate their work before signaling nothing needs attention.

Fix: check if ANY payload contains HEARTBEAT_OK (`.some()`) while
preserving the media delivery exception (if any payload has media,
always deliver). This matches the semantic intent: HEARTBEAT_OK is
the agent's explicit signal that nothing needs user attention.

Real-world example: heartbeat agent returns 3 payloads:
1. "It's 12:49 AM — quiet hours. Let me run the checks quickly."
2. "Emails: Just 2 calendar invites. Not urgent."
3. "HEARTBEAT_OK"

Previously: all 3 delivered to Telegram. Now: correctly suppressed.

Related: #32013 (fixed a different HEARTBEAT_OK leak path via system
events in timer.ts)
2026-03-02 22:16:18 +00:00
Peter Steinberger
91dd89313a refactor(core): dedupe command, hook, and cron fixtures 2026-03-02 21:31:36 +00:00
Peter Steinberger
f7765bc151 perf(cron): cache schedule evaluators and stagger offsets 2026-03-02 20:19:10 +00:00
Peter Steinberger
b1c30f0ba9 refactor: dedupe cli config cron and install flows 2026-03-02 19:57:33 +00:00