Commit Graph

8 Commits

Author SHA1 Message Date
cpojer
d0cb8c19b2 chore: wtf. 2026-02-17 13:36:48 +09:00
Sebastian
ed11e93cf2 chore(format) 2026-02-16 23:20:16 -05:00
cpojer
90ef2d6bdf chore: Update formatting. 2026-02-17 09:18:40 +09:00
Peter Steinberger
9bfd3ca195 refactor(memory): consolidate embeddings and batch helpers 2026-02-17 00:11:02 +00:00
Rodrigo Uroz
7f1712c1ba (fix): enforce embedding model token limit to prevent overflow (#13455)
* fix: enforce embedding model token limit to prevent 8192 overflow

- Replace EMBEDDING_APPROX_CHARS_PER_TOKEN=1 with UTF-8 byte length
  estimation (safe upper bound for tokenizer output)
- Add EMBEDDING_MODEL_MAX_TOKENS=8192 hard cap
- Add splitChunkToTokenLimit() that binary-searches for the largest
  safe split point, with surrogate pair handling
- Add enforceChunkTokenLimit() wrapper called in indexFile() after
  chunkMarkdown(), before any embedding API call
- Fixes: session files with large JSONL entries could produce chunks
  exceeding text-embedding-3-small's 8192 token limit

Tests: 2 new colocated tests in manager.embedding-token-limit.test.ts
- Verifies oversized ASCII chunks are split to <=8192 bytes each
- Verifies multibyte (emoji) content batching respects byte limits
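A minimal sketch of the splitting approach this commit describes, using Node's Buffer.byteLength as the UTF-8 byte estimate; the repo's actual splitChunkToTokenLimit()/enforceChunkTokenLimit() may differ in detail:

```typescript
const EMBEDDING_MODEL_MAX_TOKENS = 8192;

// UTF-8 byte length as a conservative stand-in for token count.
const utf8Length = (text: string): number => Buffer.byteLength(text, 'utf8');

function splitChunkToTokenLimit(
  text: string,
  maxTokens: number = EMBEDDING_MODEL_MAX_TOKENS,
): string[] {
  if (utf8Length(text) <= maxTokens) {
    return [text];
  }
  // Binary-search for the longest prefix whose UTF-8 byte length still fits.
  let lo = 1;
  let hi = text.length;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (utf8Length(text.slice(0, mid)) <= maxTokens) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  let cut = lo;
  // Back off one code unit if the cut would land inside a surrogate pair
  // (e.g. in the middle of an emoji).
  const maybeHighSurrogate = text.charCodeAt(cut - 1);
  if (cut > 1 && cut < text.length && maybeHighSurrogate >= 0xd800 && maybeHighSurrogate <= 0xdbff) {
    cut -= 1;
  }
  return [text.slice(0, cut), ...splitChunkToTokenLimit(text.slice(cut), maxTokens)];
}

// Applied to every chunk produced by chunkMarkdown() before any embedding API call.
function enforceChunkTokenLimit(
  chunks: string[],
  maxTokens: number = EMBEDDING_MODEL_MAX_TOKENS,
): string[] {
  return chunks.flatMap((chunk) => splitChunkToTokenLimit(chunk, maxTokens));
}
```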

* fix: make embedding token limit provider-aware

- Add optional maxInputTokens to EmbeddingProvider interface
- Each provider (openai, gemini, voyage) reports its own limit
- Known-limits map as fallback: openai 8192, gemini 2048, voyage 32K
- Resolution: provider field > known map > default 8192
- Backward compatible: local/llama uses fallback
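A rough sketch of that resolution order (provider field, then known-limits map, then the 8192 default); the interface shape and map name are illustrative, not necessarily the repo's:

```typescript
interface EmbeddingProvider {
  id: string;
  // Optional per-provider input-token limit, as added in this change.
  maxInputTokens?: number;
}

const KNOWN_EMBEDDING_TOKEN_LIMITS: Record<string, number> = {
  openai: 8192,
  gemini: 2048,
  voyage: 32000,
};

const DEFAULT_EMBEDDING_TOKEN_LIMIT = 8192;

function resolveEmbeddingTokenLimit(provider: EmbeddingProvider): number {
  return (
    provider.maxInputTokens ??
    KNOWN_EMBEDDING_TOKEN_LIMITS[provider.id] ??
    DEFAULT_EMBEDDING_TOKEN_LIMIT // local/llama and unknown providers fall back here
  );
}
```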

* fix: enforce embedding input size limits (#13455) (thanks @rodrigouroz)

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-10 20:10:17 -06:00
max
a1123dd9be Centralize date/time formatting utilities (#11831) 2026-02-08 04:53:31 -08:00
Jake
e78ae48e69 fix(memory): add input_type to Voyage AI embeddings for improved retrieval (#10818)
* fix(memory): add input_type to Voyage AI embeddings for improved retrieval

Voyage AI recommends passing input_type='document' when indexing and
input_type='query' when searching. This improves retrieval quality by
optimising the embedding space for each direction.

Changes:
- embedQuery now passes input_type: 'query'
- embedBatch now passes input_type: 'document'
- Batch API request_params includes input_type: 'document'
- Tests updated to verify input_type is passed correctly
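A hedged sketch of the asymmetric input_type usage, written directly against Voyage's public embeddings endpoint; the helper name and exact request shape here are assumptions rather than the repo's provider code:

```typescript
type VoyageInputType = 'query' | 'document';

async function voyageEmbed(
  texts: string[],
  inputType: VoyageInputType,
  apiKey: string,
  model = 'voyage-3',
): Promise<number[][]> {
  const res = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // input_type tells Voyage whether to optimize the embedding for search
    // queries or for indexed documents, matching the fix in this commit.
    body: JSON.stringify({ model, input: texts, input_type: inputType }),
  });
  if (!res.ok) {
    throw new Error(`Voyage embeddings request failed: ${res.status}`);
  }
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data.map((d) => d.embedding);
}

// embedQuery-style call:  const [vec] = await voyageEmbed([query], 'query', key);
// embedBatch-style call:  const vecs  = await voyageEmbed(chunks, 'document', key);
```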

* Changelog: note Voyage embeddings input_type fix (#10818) (thanks @mcinteerj)

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-06 21:55:09 -06:00
Jake
6965a2cc9d feat(memory): native Voyage AI support (#7078)
* feat(memory): add native Voyage AI embedding support with batching

Cherry-picked from PR #2519, resolved conflict in memory-search.ts
(hasRemote -> hasRemoteConfig rename + added voyage provider)

* fix(memory): optimize voyage batch memory usage with streaming and deduplicate code

Cherry-picked from PR #2519. Fixed lint error: switched this.runWithConcurrency
to the imported runWithConcurrency function after it was extracted to internal.ts
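A minimal sketch of what a concurrency-limited batch helper like the runWithConcurrency mentioned above could look like; the actual signature in internal.ts is not shown here and may differ:

```typescript
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index, so at most
  // `limit` embedding batches are in flight at once.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  };

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

// Usage: embed Voyage batches a few at a time instead of all at once.
// const vectors = await runWithConcurrency(batches, 3, (batch) => embedBatch(batch));
```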
2026-02-06 15:09:32 -06:00