---
summary: "Voice wake and push-to-talk modes plus routing details in the mac app"
read_when:
- Working on voice wake or PTT pathways
---
# Voice Wake & Push-to-Talk
Updated: 2025-12-08 · Owners: mac app
## Modes
- **Wake-word mode** (default): always-on Speech recognizer waits for trigger tokens (`swabbleTriggerWords`). On match it starts capture, shows the overlay with partial text, and auto-sends after silence.
- **Push-to-talk (Right Option hold)**: hold the right Option key to capture immediately, with no trigger word needed. The overlay appears while the key is held; releasing finalizes the transcript and forwards it after a short delay so you can tweak the text.
## Runtime behavior (wake-word)
- Speech recognizer lives in `VoiceWakeRuntime`.
- Silence windows: 2.0s when speech is flowing, 5.0s if only the trigger was heard.
- Hard stop: 120s to prevent runaway sessions.
- Debounce between sessions: 350ms.
- Overlay is driven via `VoiceWakeOverlayController` with committed/volatile coloring.
- After send, recognizer restarts cleanly to listen for the next trigger.
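
The timing rules above can be sketched as a small value type. This is only an illustration under assumptions: `WakeSessionTiming` and its method names are hypothetical and are not taken from `VoiceWakeRuntime`; only the numeric values come from the list.

```swift
import Foundation

/// Hypothetical model of the wake-word session timing (not the real `VoiceWakeRuntime` code).
struct WakeSessionTiming {
    // Values from the list above.
    let silenceWhileSpeaking: TimeInterval = 2.0   // speech is flowing
    let silenceTriggerOnly: TimeInterval = 5.0     // only the trigger was heard
    let hardStop: TimeInterval = 120.0             // runaway-session guard
    let debounce: TimeInterval = 0.35              // gap between sessions

    /// Silence window that applies, depending on whether speech followed the trigger.
    func silenceWindow(heardSpeechAfterTrigger: Bool) -> TimeInterval {
        heardSpeechAfterTrigger ? silenceWhileSpeaking : silenceTriggerOnly
    }

    /// True when the session should finalize and send.
    /// All times are seconds on the same monotonic clock.
    func shouldFinalize(sessionStart: TimeInterval,
                        lastSpeech: TimeInterval,
                        now: TimeInterval,
                        heardSpeechAfterTrigger: Bool) -> Bool {
        if now - sessionStart >= hardStop { return true }
        let window = silenceWindow(heardSpeechAfterTrigger: heardSpeechAfterTrigger)
        return now - lastSpeech >= window
    }

    /// True when enough time has passed since the previous session to accept a new trigger.
    func canStartNewSession(lastSessionEnd: TimeInterval?, now: TimeInterval) -> Bool {
        guard let end = lastSessionEnd else { return true }
        return now - end >= debounce
    }
}
```

Keeping the thresholds in one pure type like this would make the silence and hard-stop decisions unit-testable without a live recognizer.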
## Push-to-talk specifics
- Hotkey detection uses a global `.flagsChanged` monitor for **right Option** (`keyCode 61` + `.option`). We only observe events (no swallowing).
- Capture pipeline lives in `VoicePushToTalk`: starts Speech immediately, streams partials to the overlay, and calls `VoiceWakeForwarder` on release.
- When push-to-talk starts we pause the wake-word runtime to avoid dueling audio taps; it restarts automatically after release.
- Permissions: capture requires Microphone + Speech Recognition; observing the global key events additionally needs Accessibility/Input Monitoring approval.
- External keyboards: some may not report right Option as expected; offer a fallback shortcut if users report misses.
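
The hold detection described above can be sketched as a small edge detector fed by the `.flagsChanged` monitor. This is a minimal sketch, not the code in `VoicePushToTalk`: `RightOptionHoldDetector` and `installRightOptionMonitor` are hypothetical names, and treating a second `keyCode 61` event as the release is a simplifying assumption.

```swift
import Foundation

/// Hypothetical edge detector for the right Option hold (not the app's actual code).
struct RightOptionHoldDetector {
    enum Transition: Equatable { case began, ended }

    static let rightOptionKeyCode: UInt16 = 61   // value from the bullet above
    private(set) var isHolding = false

    /// Feed one `.flagsChanged` event; returns a transition when the hold state changes.
    mutating func handle(keyCode: UInt16, optionFlagSet: Bool) -> Transition? {
        if !isHolding {
            // Start only on the right Option key with the option modifier active.
            guard keyCode == Self.rightOptionKeyCode, optionFlagSet else { return nil }
            isHolding = true
            return .began
        }
        // Assumption: while holding, another event for key 61, or the option
        // modifier clearing, means the key was released.
        if keyCode == Self.rightOptionKeyCode || !optionFlagSet {
            isHolding = false
            return .ended
        }
        return nil
    }
}

#if canImport(AppKit)
import AppKit

/// Observe-only wiring: a global monitor cannot modify or swallow events.
/// Keep the returned token and pass it to `NSEvent.removeMonitor(_:)` to stop.
func installRightOptionMonitor(
    onTransition: @escaping (RightOptionHoldDetector.Transition) -> Void
) -> Any? {
    var detector = RightOptionHoldDetector()
    return NSEvent.addGlobalMonitorForEvents(matching: .flagsChanged) { event in
        let optionDown = event.modifierFlags.contains(.option)
        if let transition = detector.handle(keyCode: event.keyCode,
                                            optionFlagSet: optionDown) {
            onTransition(transition)
        }
    }
}
#endif
```

In this sketch, `.began` is the point where the wake-word runtime would be paused and capture started, and `.ended` is the point where the transcript would be finalized and forwarded.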
## User-facing settings
- **Voice Wake** toggle: enables wake-word runtime.
- **Hold Cmd+Fn to talk**: enables the push-to-talk monitor. Disabled on macOS < 26.
- Language & mic pickers, live level meter, trigger-word table, tester, forward target/command all remain unchanged.
- **Sounds**: chimes on trigger detect and on send; defaults to the macOS “Glass” system sound. You can pick any `NSSound`-loadable file (e.g. MP3/WAV/AIFF) for each event or choose **No Sound**.
## Forwarding payload
- `VoiceWakeForwarder.prefixedTranscript(_:)` prepends the machine hint before sending. Shared between wake-word and push-to-talk paths.
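
As an illustration of the shared prefixing step, here is a hypothetical stand-in. The real hint wording and signature are not given in this document, so the bracketed format, the `machineName` parameter, and the `TranscriptPrefixSketch` type are all assumptions.

```swift
import Foundation

/// Hypothetical stand-in for `VoiceWakeForwarder.prefixedTranscript(_:)`.
/// The hint format below is an assumption, not the app's actual wording.
enum TranscriptPrefixSketch {
    static func prefixedTranscript(_ transcript: String, machineName: String) -> String {
        // Trim stray whitespace left over from the recognizer before prefixing.
        let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
        return "[voice input from \(machineName)] \(trimmed)"
    }
}
```

Because both the wake-word and push-to-talk paths go through one function, the receiving side sees the same payload shape regardless of how capture started.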
## Quick verification
- Toggle push-to-talk on, hold the right Option key, speak, release: the overlay should show partials, then send.
- While holding, the menu-bar ears should stay enlarged (uses `triggerVoiceEars(ttl:nil)`); they drop after release.