---
summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
read_when:
- Implementing Talk mode on macOS/iOS/Android
- Changing voice/TTS/interrupt behavior
title: "Talk Mode"
---
# Talk Mode

Talk mode is a continuous voice conversation loop:
1. Listen for speech
2. Send transcript to the model (main session, chat.send)
3. Wait for the response
4. Speak it via ElevenLabs (streaming playback)
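
The four steps above can be sketched as a single turn function. This is an illustrative sketch, not the shipped implementation: `listen`, `sendToModel`, and `speak` are hypothetical stand-ins for the platform speech recognizer, `chat.send`, and the ElevenLabs streaming call.

```typescript
// Hypothetical sketch of one Talk mode turn; the three dependencies
// stand in for platform speech recognition, chat.send, and ElevenLabs TTS.
type TalkDeps = {
  listen: () => Promise<string>; // resolves with a transcript after a silence window
  sendToModel: (text: string) => Promise<string>; // chat.send on the main session
  speak: (text: string) => Promise<void>; // streaming ElevenLabs playback
};

async function talkTurn(deps: TalkDeps): Promise<string> {
  const transcript = await deps.listen(); // 1. Listen for speech
  const reply = await deps.sendToModel(transcript); // 2–3. Send transcript, wait for response
  await deps.speak(reply); // 4. Speak the reply
  return reply;
}
```

Talk mode simply runs this turn in a loop until the user exits.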
## Behavior (macOS)
- **Always-on overlay** while Talk mode is enabled.
- **Listening → Thinking → Speaking** phase transitions.
- On a **short pause** (silence window), the current transcript is sent.
- Replies are **written to WebChat** (same as typing).
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
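
The interrupt behavior can be modeled as a small state holder. This is a sketch under assumed names (`SpeakingState`, `onUserSpeech`, `takeInterruption` are illustrative, not the actual API): stop playback immediately, remember when the user cut in, and hand that timestamp to whoever builds the next prompt.

```typescript
// Illustrative sketch of interrupt-on-speech: stop playback and remember
// when the user cut in, so the next prompt can mention the interruption.
class SpeakingState {
  private interruptedAt: Date | null = null;

  constructor(private stopPlayback: () => void) {}

  // Called by the recognizer when user speech starts mid-playback.
  onUserSpeech(now: Date = new Date()): void {
    this.stopPlayback();
    this.interruptedAt = now;
  }

  // Consumed when building the next prompt; clears the stored timestamp.
  takeInterruption(): Date | null {
    const t = this.interruptedAt;
    this.interruptedAt = null;
    return t;
  }
}
```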

## Voice directives in replies
The assistant may prefix its reply with a **single JSON line** to control voice:
```json
{ "voice": "<voice-id>", "once": true }
```

Rules:
- First non-empty line only.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.

Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`
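
A minimal parser for the directive line might look like the following. This is a sketch, not the shipped implementation: it checks only the first non-empty line, accepts the `voice`/`voice_id`/`voiceId` aliases, ignores unknown keys, and strips the JSON line before TTS. (Only the `voice` and `once` keys are handled here for brevity.)

```typescript
type VoiceDirective = { voiceId?: string; once?: boolean };

// Parse an optional one-line JSON voice directive from the start of a reply.
// Returns the normalized directive (if any) and the text to actually speak.
function parseVoiceDirective(reply: string): { directive: VoiceDirective | null; speech: string } {
  const lines = reply.split("\n");
  const idx = lines.findIndex((l) => l.trim().length > 0); // first non-empty line only
  if (idx === -1) return { directive: null, speech: reply };
  let parsed: unknown;
  try {
    parsed = JSON.parse(lines[idx].trim());
  } catch {
    return { directive: null, speech: reply }; // not a directive; speak as-is
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { directive: null, speech: reply };
  }
  const obj = parsed as Record<string, unknown>;
  const directive: VoiceDirective = {};
  // Accept the documented aliases; unknown keys are ignored.
  const voice = obj["voice"] ?? obj["voice_id"] ?? obj["voiceId"];
  if (typeof voice === "string") directive.voiceId = voice;
  if (typeof obj["once"] === "boolean") directive.once = obj["once"];
  // The JSON line is stripped before playback.
  const speech = lines.slice(idx + 1).join("\n").trimStart();
  return { directive, speech };
}
```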
## Config (`~/.openclaw/openclaw.json`)
```json5
{
talk: {
voiceId: "elevenlabs_voice_id",
modelId: "eleven_v3",
outputFormat: "mp3_44100_128",
apiKey: "elevenlabs_api_key",
interruptOnSpeech: true,
},
}
```

Defaults:
- `interruptOnSpeech`: `true`
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or the first ElevenLabs voice when an API key is available)
- `modelId`: defaults to `eleven_v3` when unset
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or the gateway shell profile if available)
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
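
The fallback chain above can be expressed as a small resolver. A sketch under stated assumptions: the env-var names come from the defaults list, while `resolveTalkConfig` and the `platform` argument are illustrative names, not the actual code.

```typescript
type TalkConfig = {
  voiceId?: string;
  modelId?: string;
  outputFormat?: string;
  apiKey?: string;
  interruptOnSpeech?: boolean;
};

// Resolve the documented defaults and env-var fallbacks for a platform.
function resolveTalkConfig(
  cfg: TalkConfig,
  env: Record<string, string | undefined>,
  platform: "macos" | "ios" | "android",
): TalkConfig {
  return {
    voiceId: cfg.voiceId ?? env["ELEVENLABS_VOICE_ID"] ?? env["SAG_VOICE_ID"],
    apiKey: cfg.apiKey ?? env["ELEVENLABS_API_KEY"],
    modelId: cfg.modelId ?? "eleven_v3",
    // PCM by default; an explicit mp3_* value forces MP3 streaming.
    outputFormat: cfg.outputFormat ?? (platform === "android" ? "pcm_24000" : "pcm_44100"),
    interruptOnSpeech: cfg.interruptOnSpeech ?? true,
  };
}
```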
## macOS UI
- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
- **Listening**: cloud pulses with mic level
- **Thinking**: sinking animation
- **Speaking**: radiating rings
- Click cloud: stop speaking
- Click X: exit Talk mode

## Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses ElevenLabs streaming API with `ELEVENLABS_API_KEY` and incremental playback on macOS/iOS/Android for lower latency.
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
- `latency_tier` is validated to `0..4` when set.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
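
The validation rules in the notes above could be sketched as two small predicates (illustrative; the real validators may be structured differently):

```typescript
// eleven_v3 only accepts the three discrete stability values;
// other models accept any value in [0, 1].
function validateStability(modelId: string, value: number): boolean {
  if (modelId === "eleven_v3") return [0.0, 0.5, 1.0].includes(value);
  return value >= 0 && value <= 1;
}

// latency_tier must be an integer in 0..4 when set.
function validateLatencyTier(tier: number): boolean {
  return Number.isInteger(tier) && tier >= 0 && tier <= 4;
}
```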