---
summary: "Image and media handling rules for send, gateway, and agent replies"
read_when:
- Modifying media pipeline or attachments
---
# Image & Media Support — 2025-12-05
Clawdbot is now **web-only** (Baileys). This document captures the current media handling rules for send, gateway, and agent replies.
## Goals
- Send media with optional captions via `clawdbot message send --media`.
- Allow auto-replies from the web inbox to include media alongside text.
- Keep per-type limits sane and predictable.
## CLI Surface
- `clawdbot message send --media <path-or-url> [--message <caption>]`
- `--media` is optional; the caption can be empty for media-only sends.
- `--dry-run` prints the resolved payload; `--json` emits `{ provider, to, messageId, mediaUrl, caption }`.
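
A script consuming the `--json` output might validate it before use. The sketch below is only illustrative: the field names come from the list above, but the field types (strings, with `mediaUrl` and `caption` possibly absent or null) and the helper names `isSendResult` and `parseSendResult` are assumptions, not part of Clawdbot.

```typescript
// Assumed shape of the `--json` output; field types are guesses, not a documented contract.
interface SendResult {
  provider: string;
  to: string;
  messageId: string;
  mediaUrl?: string | null;
  caption?: string | null;
}

// Hypothetical type guard: checks only the fields named in this document.
function isSendResult(value: unknown): value is SendResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const optionalString = (x: unknown): boolean =>
    x === undefined || x === null || typeof x === "string";
  return (
    typeof v.provider === "string" &&
    typeof v.to === "string" &&
    typeof v.messageId === "string" &&
    optionalString(v.mediaUrl) &&
    optionalString(v.caption)
  );
}

// Parse captured stdout and fail loudly if the shape is not what we assumed.
function parseSendResult(stdout: string): SendResult {
  const parsed: unknown = JSON.parse(stdout);
  if (!isSendResult(parsed)) {
    throw new Error("unexpected --json output shape");
  }
  return parsed;
}
```
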
## Web Provider Behavior
- Input: local file path **or** HTTP(S) URL.
- Flow: load into a Buffer, detect media kind, and build the correct payload:
  - **Images:** resize & recompress to JPEG (max side 2048px) targeting `agents.defaults.mediaMaxMb` (default 5 MB), capped at 6 MB.
  - **Audio/Voice/Video:** pass-through up to 16 MB; audio is sent as a voice note (`ptt: true`).
  - **Documents:** anything else, up to 100 MB, with filename preserved when available.
- WhatsApp GIF-style playback: send an MP4 with `gifPlayback: true` (CLI: `--gif-playback`) so mobile clients loop inline.
- MIME detection prefers magic bytes, then headers, then file extension.
- Caption comes from `--message` or `reply.text`; empty caption is allowed.
- Logging: non-verbose shows `↩️`/`✅`; verbose includes size and source path/URL.
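
The detection order described above (magic bytes, then headers, then file extension) can be sketched as follows. This is a simplified illustration, not Clawdbot's actual code: the function names `sniffMime`, `detectMime`, and `kindOf`, the short signature list, and the extension table are all assumptions made for the example.

```typescript
type MediaKind = "image" | "audio" | "video" | "document";

// Sniff a handful of well-known signatures. Real detection would cover many more
// formats; "ftyp" at offset 4 also matches other ISO BMFF files (e.g. M4A), so
// treating it as video/mp4 is a deliberate simplification.
function sniffMime(bytes: Uint8Array): string | undefined {
  const startsWith = (sig: number[], offset = 0): boolean =>
    sig.every((b, i) => bytes[offset + i] === b);
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "image/png";
  if (startsWith([0x25, 0x50, 0x44, 0x46])) return "application/pdf"; // "%PDF"
  if (startsWith([0x4f, 0x67, 0x67, 0x53])) return "audio/ogg"; // "OggS"
  if (startsWith([0x66, 0x74, 0x79, 0x70], 4)) return "video/mp4"; // "ftyp"
  return undefined;
}

// Illustrative extension table (assumed, intentionally tiny).
const EXT_MIME = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["mp3", "audio/mpeg"],
  ["ogg", "audio/ogg"],
  ["mp4", "video/mp4"],
  ["pdf", "application/pdf"],
]);

// Precedence: magic bytes, then the Content-Type header, then the file extension.
function detectMime(bytes: Uint8Array, headerMime?: string, fileName?: string): string {
  const sniffed = sniffMime(bytes);
  if (sniffed) return sniffed;
  // Drop parameters such as "; codecs=opus" from the header value.
  const fromHeader = headerMime?.split(";", 1).join("").trim().toLowerCase();
  if (fromHeader) return fromHeader;
  const ext = fileName?.split(".").pop()?.toLowerCase();
  const fromExt = ext ? EXT_MIME.get(ext) : undefined;
  return fromExt ?? "application/octet-stream";
}

// Map a MIME type to the payload kind; anything unrecognized is a document.
function kindOf(mime: string): MediaKind {
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  return "document";
}
```

Sniffing first means a mislabeled upload (say, a JPEG served as `text/plain`) is still classified by its content rather than by what the server or filename claims.
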
## Auto-Reply Pipeline
- `getReplyFromConfig` returns `{ text?, mediaUrl?, mediaUrls? }`.
- When media is present, the web sender resolves local paths or URLs using the same pipeline as `clawdbot message send`.
- Multiple media entries are sent sequentially if provided.
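
One way to picture the fan-out is to flatten the reply into an ordered list of sends and await them one at a time. The sketch below assumes a few things this document does not state: that `mediaUrl` and `mediaUrls` are merged with duplicates dropped, and that `text` rides as the caption on the first media item only. `planSends`, `sendSequentially`, and `PlannedSend` are hypothetical names.

```typescript
// Reply shape as described above.
interface ReplyPayload {
  text?: string;
  mediaUrl?: string;
  mediaUrls?: string[];
}

// Hypothetical per-message plan handed to the sender.
interface PlannedSend {
  media?: string;
  caption?: string;
}

function planSends(reply: ReplyPayload): PlannedSend[] {
  const all = [reply.mediaUrl, ...(reply.mediaUrls ?? [])];
  // Keep non-empty strings, dropping repeats while preserving order (assumed behavior).
  const media = all.filter(
    (m, i): m is string => typeof m === "string" && m.length > 0 && all.indexOf(m) === i
  );
  if (media.length === 0) {
    return reply.text ? [{ caption: reply.text }] : [];
  }
  // Assumption: the text is attached as the caption of the first media item only.
  return media.map((m, i) =>
    i === 0 && reply.text ? { media: m, caption: reply.text } : { media: m }
  );
}

// Await each send before starting the next so messages arrive in order.
async function sendSequentially(
  sends: PlannedSend[],
  sendOne: (send: PlannedSend) => Promise<void>
): Promise<void> {
  for (const send of sends) {
    await sendOne(send);
  }
}
```
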
## Inbound Media to Commands (Pi)
- When inbound web messages include media, Clawdbot downloads to a temp file and exposes templating variables:
  - `{{MediaUrl}}` pseudo-URL for the inbound media.
  - `{{MediaPath}}` local temp path written before running the command.
- When a per-session Docker sandbox is enabled, inbound media is copied into the sandbox workspace and `MediaPath`/`MediaUrl` are rewritten to a relative path like `media/inbound/<filename>`.
- Audio transcription (if configured via `tools.audio.transcription`) runs before templating and can replace `Body` with the transcript.
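
The variable substitution and sandbox rewrite described above might look roughly like this. It is a minimal sketch under stated assumptions: `renderTemplate`, `rewriteForSandbox`, and the `InboundMediaContext` shape are hypothetical, and leaving unknown placeholders untouched is a choice made for the example rather than documented behavior.

```typescript
// Hypothetical context object; only the variables named in this document are modeled.
interface InboundMediaContext {
  Body: string;
  MediaUrl?: string;
  MediaPath?: string;
}

// Replace {{Name}} placeholders with context values; unknown names are left as-is.
function renderTemplate(template: string, ctx: InboundMediaContext): string {
  const values = ctx as unknown as Record<string, string | undefined>;
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    const has = Object.prototype.hasOwnProperty.call(values, key);
    const value = has ? values[key] : undefined;
    return value ?? match;
  });
}

// When sandboxed, point both variables at the copied file's relative path.
function rewriteForSandbox(ctx: InboundMediaContext): InboundMediaContext {
  if (!ctx.MediaPath) return ctx;
  const fileName = ctx.MediaPath.split(/[\\/]/).pop() ?? ctx.MediaPath;
  const relative = `media/inbound/${fileName}`;
  return { ...ctx, MediaPath: relative, MediaUrl: relative };
}
```

For example, rendering `"describe {{MediaPath}}"` against a context whose `MediaPath` is `/tmp/a.jpg` yields `"describe /tmp/a.jpg"`; after `rewriteForSandbox`, the same template yields `"describe media/inbound/a.jpg"`.
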
## Limits & Errors
- Images: ~6 MB cap after recompression.
- Audio/voice/video: 16 MB cap; documents: 100 MB cap.
- Oversize or unreadable media → clear error in logs and the reply is skipped.
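
The caps above can be expressed as a small lookup plus a size check. This is a sketch, not the real implementation: `checkSize`, `imageTargetBytes`, and `LimitKind` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption (the document does not say whether decimal or binary megabytes are meant).

```typescript
const MB = 1024 * 1024; // Assumption: binary megabytes.

// Per-kind caps in MB, taken from the limits listed in this document.
const CAPS_MB = { image: 6, audio: 16, video: 16, document: 100 } as const;
type LimitKind = keyof typeof CAPS_MB;

type CheckResult = { ok: true } | { ok: false; reason: string };

// Recompression target for images: the configured value, never above the hard cap.
function imageTargetBytes(mediaMaxMb: number = 5): number {
  return Math.min(mediaMaxMb, CAPS_MB.image) * MB;
}

// Return a descriptive failure instead of throwing, so the caller can log and skip the reply.
function checkSize(kind: LimitKind, sizeBytes: number): CheckResult {
  const capBytes = CAPS_MB[kind] * MB;
  if (sizeBytes > capBytes) {
    return {
      ok: false,
      reason: `${kind} is ${sizeBytes} bytes, over the ${CAPS_MB[kind]} MB cap`,
    };
  }
  return { ok: true };
}
```

Returning a result object rather than throwing fits the "log a clear error and skip the reply" behavior: the caller decides what to log and simply does not send.
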
## Notes for Tests
- Cover send + reply flows for image/audio/document cases.
- Validate recompression for images (size bound) and voice-note flag for audio.
- Ensure multi-media replies fan out as sequential sends.