Voice Input

Claudette lets you dictate prompts to the composer using your microphone. Four providers are available. The app picks the best one automatically, or you can pin a specific provider in Settings → Plugins → Voice Input.

Provider | Platforms | Runs where | Notes
Apple Speech | macOS | On-device, OS-managed | Uses the native Speech framework. Fast and lightweight; no model download required. Requires Microphone and Speech Recognition permission in System Settings.
Windows Speech API (SAPI 5.4) | Windows 7+ | On-device | Drives the in-process SAPI recognizer (SpInprocRecognizer) directly via COM: no .NET, no PowerShell, no extra runtime. Captured audio is transcribed locally with no network round-trip. If transcription returns no text, install a Speech Recognizer language pack via Settings → Time & language → Speech.
Distil-Whisper | macOS, Linux, Windows | Fully on-device | Uses the distil-whisper/distil-large-v3 model (~1.5 GB), downloaded once from Hugging Face on first use and cached locally. All subsequent transcription is offline. Metal GPU acceleration on macOS keeps latency low; on CPU (Linux + Windows, including ARM64) inference is materially slower. For short dictation prefer Windows SAPI / Apple Speech, and the timeout ceiling is set to 5 min on CPU vs 90 s on macOS to give long clips room.
Web Speech API | Linux (fallback) | Browser-managed | Used when native capture is unavailable. Quality and language support depend on the OS browser engine. (Not used on Windows: WebView2 does not expose the Web Speech API; Windows uses SAPI instead.)
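
The two Distil-Whisper timeout ceilings from the table can be pictured as a simple lookup. This is only a sketch and not Claudette's actual code: the function name and the accelerator strings are hypothetical, and only the 90 s and 5 min values come from the table.

```python
# Hypothetical helper: the names are illustrative; only the two ceilings
# (90 s with Metal, 5 min on CPU) come from the provider table.
METAL_TIMEOUT_S = 90      # macOS with Metal GPU acceleration
CPU_TIMEOUT_S = 5 * 60    # Linux and Windows CPU inference, including ARM64


def transcription_timeout_s(accelerator: str) -> int:
    """Return the timeout ceiling in seconds for one Distil-Whisper run."""
    kind = accelerator.lower()
    if kind == "metal":
        return METAL_TIMEOUT_S
    if kind == "cpu":
        return CPU_TIMEOUT_S
    raise ValueError(f"unknown accelerator: {accelerator!r}")
```

The longer CPU ceiling reflects slower inference. It is an upper bound on how long a clip may take, not a target.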

Claudette’s automatic selection order: a pinned provider (if set and enabled) → any ready local model → the platform provider (Apple Speech on macOS, Windows SAPI on Windows) → a provider that still needs setup. The mic button in the composer reflects whichever provider is active.
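
The selection order above can be read as a chain of fallbacks. The following is a minimal sketch of that reading, not Claudette's implementation: the `Provider` fields and the `select_provider` function are hypothetical names, and requiring the platform provider to be ready in step 3 is an assumption. The Linux Web Speech fallback is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    # All field names are hypothetical, chosen to mirror the prose above.
    name: str
    enabled: bool = True
    ready: bool = False            # usable now (model downloaded, permissions granted)
    is_local_model: bool = False   # e.g. Distil-Whisper
    is_platform: bool = False      # e.g. Apple Speech on macOS, Windows SAPI on Windows


def select_provider(providers: list[Provider],
                    pinned: Optional[str] = None) -> Optional[Provider]:
    """Pick a provider following the documented order of preference."""
    enabled = [p for p in providers if p.enabled]

    # 1. A pinned provider wins if it is set and enabled.
    for p in enabled:
        if p.name == pinned:
            return p
    # 2. Otherwise, any local model that is ready to use.
    for p in enabled:
        if p.is_local_model and p.ready:
            return p
    # 3. Otherwise, the platform provider (assumed here to need to be ready).
    for p in enabled:
        if p.is_platform and p.ready:
            return p
    # 4. Finally, a provider that still needs setup, so the UI can prompt for it.
    for p in enabled:
        if not p.ready:
            return p
    return None
```

For example, with Distil-Whisper enabled but not yet downloaded and Apple Speech ready, this sketch returns Apple Speech. Once the model is downloaded, Distil-Whisper takes precedence unless another provider is pinned.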

Click the microphone icon in the chat composer, or use a keyboard shortcut:

Shortcut | Action
⌘⇧M (macOS) / Ctrl+Shift+M (Linux/Windows) | Toggle recording on/off
Right ⌥ (macOS) | Hold to record, release to transcribe

Hold-to-talk (Right ⌥) is macOS-only by default. Linux and Windows users can bind any key in Settings → Keyboard → Voice: Hold to talk.

While recording, a live VU meter appears next to the mic button — bar height tracks your microphone level in real time. A noise gate filters out silence, so you don’t need to click precisely around pauses.
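
To make the meter and the gate concrete, here is a minimal sketch of how a level meter and a simple noise gate are commonly built on the RMS level of each audio frame. It does not describe Claudette's actual signal path: the function names, the 10-step bar, the -60 dB floor, and the 0.02 gate threshold are all illustrative assumptions.

```python
import math


def rms_level(samples: list[float]) -> float:
    """Root-mean-square level of float samples in the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def vu_bar_height(samples: list[float], max_height: int = 10,
                  floor_db: float = -60.0) -> int:
    """Map a frame's level to a bar height between 0 and max_height."""
    rms = rms_level(samples)
    if rms <= 0.0:
        return 0
    db = 20.0 * math.log10(rms)                 # 0 dB corresponds to full scale
    fraction = (db - floor_db) / -floor_db      # 0.0 at the floor, 1.0 at full scale
    return round(max(0.0, min(1.0, fraction)) * max_height)


def gate_frames(frames: list[list[float]],
                threshold: float = 0.02) -> list[list[float]]:
    """Keep only frames whose RMS level reaches the (assumed) gate threshold."""
    return [frame for frame in frames if rms_level(frame) >= threshold]
```

A production gate would usually add attack, hold, and release times so that quiet word endings are not clipped. The sketch drops whole frames to keep the idea visible.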

When you stop recording, the transcription is inserted at the cursor position in the composer. You can edit it before sending.
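
Inserting at the cursor amounts to splicing the transcript into the existing draft. The sketch below is illustrative only: `insert_at_cursor` is a hypothetical helper, and moving the cursor to the end of the inserted text is an assumption about the composer's behaviour.

```python
def insert_at_cursor(text: str, cursor: int, transcript: str) -> tuple[str, int]:
    """Splice transcript into text at cursor; return the new text and cursor."""
    cursor = max(0, min(cursor, len(text)))   # clamp to a valid position
    new_text = text[:cursor] + transcript + text[cursor:]
    return new_text, cursor + len(transcript)
```

With the draft "Fix the bug" and the cursor just before "bug", dictating "login " yields "Fix the login bug", which you can then edit before sending.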

Distil-Whisper supports 99 languages via Whisper language codes. The model auto-detects the spoken language; no per-session configuration is needed.

Apple Speech uses the system’s active language and locale settings.

macOS requires two permissions — Microphone and Speech Recognition. The first time you start recording, macOS will prompt for both. You can also grant them in advance:

  • System Settings → Privacy & Security → Microphone → Claudette
  • System Settings → Privacy & Security → Speech Recognition → Claudette

On Windows, no download is needed: SAPI 5.4 has shipped with every Windows release since Windows 7. Make sure microphone access is enabled for Claudette in Settings → Privacy & security → Microphone.

If transcription returns no text or surfaces a “Windows speech recognizer is not installed” error, install a recognizer language pack: Settings → Time & language → Speech → Add a voice → English (United States) (or your preferred locale’s recognizer). Without an installed recognizer the engine has nothing to match against.

If transcription fails for another reason, the toolbar shows a short, actionable message, and the raw HRESULT and the COM stage that failed are written to the daily diagnostics log. Open it via Settings → Diagnostics → Open log directory for the full context.

To set up Distil-Whisper, open Settings → Plugins and find the Voice Input section. Click Download model next to Distil-Whisper. The download is roughly 1.5 GB and happens only once; subsequent launches load the cached model from disk. The cache path is shown in the provider row.

Once the download completes, click Use to pin Distil-Whisper as the active provider.

To free up disk space, click Remove model at any time. The model will be re-downloaded if you enable the provider again.

Voice hotkeys can be customised or disabled in Settings → Keyboard, under the Voice group. Both the toggle shortcut and the hold-to-talk key are independently configurable.

All voice provider management lives in Settings → Plugins → Voice Input:

  • Enable/disable each provider independently
  • Select (pin) a specific provider as the preferred one
  • Download or remove the Distil-Whisper model
  • See the model size, cache path, and accelerator in use (Metal / CPU)

Distil-Whisper performs inference entirely locally via Candle — audio never leaves the app. The model is fetched from huggingface.co/distil-whisper/distil-large-v3/ on first use; after that, no network access is needed for voice.
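
The "fetch once, then stay offline" behaviour can be sketched as a cache check in front of the download. This is an assumption-laden illustration, not Claudette's code (which runs the model through Candle): the `ensure_model` function, the directory naming, and the `.complete` marker file are hypothetical, and the actual download is passed in as a callable so the sketch itself never touches the network.

```python
from pathlib import Path
from typing import Callable

MODEL_REPO = "distil-whisper/distil-large-v3"  # repository named in the docs


def ensure_model(cache_dir: Path,
                 download: Callable[[str, Path], None]) -> Path:
    """Return the local model directory, downloading only if it is missing."""
    # Hypothetical cache layout: one directory per repository.
    model_dir = cache_dir / MODEL_REPO.replace("/", "--")
    # Hypothetical marker so an interrupted download is not mistaken for a full one.
    marker = model_dir / ".complete"

    if not marker.exists():
        model_dir.mkdir(parents=True, exist_ok=True)
        download(MODEL_REPO, model_dir)   # the only step that needs the network
        marker.touch()
    return model_dir
```

On the second and later calls the marker already exists, so `download` is never invoked and no network access is needed.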

Apple Speech and the Web Speech API may process audio off-device depending on OS and browser settings. Apple's offline behavior varies by language support; the Web Speech API's behavior depends on the browser engine. If strict on-device processing is required, use Distil-Whisper, or on Windows use SAPI, which transcribes locally with no network round-trip.