Voice Input

Claudette lets you dictate prompts to the composer using your microphone. Four providers are available — the app picks the best one automatically, or you can pin a specific provider in Settings.

Providers

Provider	Platforms	Runs where	Notes
Apple Speech	macOS	On-device, OS-managed	Uses the native Speech framework. Fast and lightweight, with no model download required. Requires Microphone and Speech Recognition permissions in System Settings.
Windows Speech API (SAPI 5.4)	Windows 7+	On-device	Drives the in-process SAPI recognizer (`SpInprocRecognizer`) directly via COM — no .NET, no PowerShell, no extra runtime. Captured audio is transcribed locally with no network round-trip. If transcription returns no text, install a Speech Recognizer language pack via Settings → Time & language → Speech.
Distil-Whisper	macOS, Linux, Windows	Fully on-device	Uses the `distil-whisper/distil-large-v3` model (~1.5 GB), downloaded once from Hugging Face on first use and cached locally. All subsequent transcription is offline. Metal GPU acceleration keeps latency low on macOS. On Linux and Windows (including ARM64) inference runs on the CPU and is materially slower, so for short dictation prefer the platform provider where one exists (Windows SAPI, or Apple Speech on macOS). The transcription timeout is 5 min on CPU versus 90 s on macOS, so long clips have room to finish.
Web Speech API	Linux (fallback)	Browser-managed	Used when native capture is unavailable. Quality and language support depend on the OS browser engine. (Not used on Windows — WebView2 does not expose the Web Speech API; Windows uses SAPI instead.)

Claudette’s automatic selection order: a pinned provider (if set and enabled) → any ready local model → the platform provider (Apple Speech on macOS, Windows SAPI on Windows) → a provider that still needs setup. The mic button in the composer reflects whichever provider is active.
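The selection order above can be sketched as a short function. This is an illustrative sketch only, not Claudette's actual implementation: the `Provider` fields, the provider ids, and the assumption that the platform provider must already be ready to win step 3 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Provider:
    # All field names are hypothetical; they only mirror the prose above.
    id: str
    enabled: bool = True
    ready: bool = False           # permissions granted / model downloaded
    is_local_model: bool = False  # e.g. Distil-Whisper
    is_platform: bool = False     # e.g. Apple Speech, Windows SAPI


def pick_provider(providers: List[Provider],
                  pinned_id: Optional[str] = None) -> Optional[Provider]:
    """Return the provider to use, following the documented order."""
    enabled = [p for p in providers if p.enabled]

    # 1. A pinned provider, if set and enabled.
    for p in enabled:
        if pinned_id is not None and p.id == pinned_id:
            return p
    # 2. Any ready local model.
    for p in enabled:
        if p.is_local_model and p.ready:
            return p
    # 3. The platform provider (assumed here to require ready=True).
    for p in enabled:
        if p.is_platform and p.ready:
            return p
    # 4. A provider that still needs setup.
    for p in enabled:
        if not p.ready:
            return p
    return None
```

With a downloaded model and a ready platform provider both enabled, this sketch returns the local model; pinning a provider overrides that choice as long as the pinned provider is enabled.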

How to use

Click the microphone icon in the chat composer, or use a keyboard shortcut:

Shortcut	Action
`⌘⇧M` (macOS) / `Ctrl+Shift+M` (Linux/Windows)	Toggle recording on/off
`Right ⌥` (macOS)	Hold to record, release to transcribe

Hold-to-talk (Right ⌥) is macOS-only by default. Linux and Windows users can bind any key in Settings → Keyboard → Voice: Hold to talk.
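The difference between the two shortcut styles can be illustrated with a small state tracker. This is a sketch under assumptions: the class and method names are hypothetical, and the real app may handle the interaction between the two keys differently.

```python
class VoiceHotkeys:
    """Tracks recording state for the toggle shortcut and the hold-to-talk key."""

    def __init__(self):
        self.recording = False
        self.log = []  # ordered record of actions, for illustration only

    def _start(self):
        self.recording = True
        self.log.append("start")

    def _stop(self):
        self.recording = False
        self.log.append("stop+transcribe")

    def on_toggle(self):
        # Toggle shortcut: each press flips the recording state.
        if self.recording:
            self._stop()
        else:
            self._start()

    def on_hold_down(self):
        # A held key auto-repeats key-down events, so ignore them
        # while a recording is already in progress.
        if not self.recording:
            self._start()

    def on_hold_up(self):
        # Releasing the hold key ends the recording and transcribes.
        if self.recording:
            self._stop()
```

Ignoring repeated key-down events is the main design point: without that guard, holding the key would restart the recording on every auto-repeat.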

While recording, a live VU meter appears next to the mic button — bar height tracks your microphone level in real time. A noise gate filters out silence, so you don’t need to click precisely around pauses.
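To make the meter and gate concrete, here is a minimal sketch of how a level meter and a frame-based noise gate are commonly built. It is not taken from Claudette's source: the function names, the -60 dBFS floor, and the -45 dBFS threshold are assumptions chosen for illustration, and production gates usually add attack and hold times so that word endings are not clipped.

```python
import math


def rms_level(samples):
    """Root-mean-square of float samples in [-1.0, 1.0]; 0.0 for an empty frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def to_dbfs(rms, floor_db=-60.0):
    """Convert an RMS level to dBFS, clamped at floor_db for silence."""
    if rms <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms))


def meter_height(rms, floor_db=-60.0):
    """Map a level to a 0.0..1.0 bar height (floor_db -> 0, full scale -> 1)."""
    height = (to_dbfs(rms, floor_db) - floor_db) / -floor_db
    return min(1.0, height)


def gate(frames, threshold_db=-45.0):
    """Keep only frames whose level reaches the threshold (assumed value)."""
    return [f for f in frames if to_dbfs(rms_level(f)) >= threshold_db]
```

A frame of pure silence maps to a bar height of 0 and is dropped by the gate, while a full-scale frame maps to a height of 1 and passes through.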

When you stop recording, the transcription is inserted at the cursor position in the composer. You can edit it before sending.

Language support

Distil-Whisper supports 99 languages via Whisper language codes. The model auto-detects the spoken language; no per-session configuration is needed.

Apple Speech uses the system’s active language and locale settings.

Setting up

Apple Speech

macOS requires two permissions — Microphone and Speech Recognition. The first time you start recording, macOS will prompt for both. You can also grant them in advance:

System Settings → Privacy & Security → Microphone → Claudette
System Settings → Privacy & Security → Speech Recognition → Claudette

Windows Speech API (SAPI)

No download is needed: SAPI 5.4 ships with every Windows release since Windows 7. Make sure microphone access is enabled for Claudette in Settings → Privacy & security → Microphone.

If transcription returns no text or surfaces a “Windows speech recognizer is not installed” error, install a recognizer language pack: Settings → Time & language → Speech → Add a voice → English (United States) (or your preferred locale’s recognizer). Without an installed recognizer the engine has nothing to match against.

If transcription fails for another reason, the toolbar shows a short, actionable message, and the raw HRESULT and the COM stage that failed are written to the daily diagnostics log. Open the log via Settings → Diagnostics → Open log directory for the full context.
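When reading the diagnostics log, it helps to know that an HRESULT is a 32-bit value whose top bit marks failure, and that tools often print it as a signed integer. The sketch below converts such a value to the familiar hex form. The log line layout and the stage name are hypothetical, not Claudette's actual format; `0x80004005` is the generic `E_FAIL` code.

```python
def format_hresult(hr: int) -> str:
    """Render an HRESULT (signed or unsigned) as 8-digit hex, e.g. 0x80004005."""
    return f"0x{hr & 0xFFFFFFFF:08X}"


def is_failure(hr: int) -> bool:
    """An HRESULT signals failure when its severity bit (bit 31) is set."""
    return (hr & 0x80000000) != 0


def diagnostic_line(stage: str, hr: int) -> str:
    """Build a log line; this layout is an assumption for illustration."""
    return f"voice/sapi stage={stage} hresult={format_hresult(hr)}"
```

For example, a logged value of `-2147467259` is the signed form of `0x80004005`, which is usually the easier string to search for.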

Distil-Whisper

Open Settings → Plugins and find the Voice Input section. Click Download model next to Distil-Whisper. The download is roughly 1.5 GB and happens once — subsequent launches load the cached model from disk. The cache path is shown in the provider row.

Once the download completes, click Use to pin Distil-Whisper as the active provider.

To free up disk space, click Remove model at any time. The model will be re-downloaded if you enable the provider again.
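The download-once behaviour described above follows a common cache pattern, sketched here. Everything in it is an assumption for illustration: the file name `model.safetensors`, the function names, and the injected `download` callable stand in for whatever Claudette actually uses to fetch and store the model.

```python
from pathlib import Path

MODEL_FILE = "model.safetensors"  # hypothetical file name


def ensure_model(cache_dir, download):
    """Return (path, fetched). Calls download(path) only if the file is missing."""
    target = Path(cache_dir) / MODEL_FILE
    if target.is_file():
        return target, False  # cached: no network access needed
    target.parent.mkdir(parents=True, exist_ok=True)
    download(target)          # caller-supplied fetch, e.g. from Hugging Face
    return target, True


def remove_model(cache_dir):
    """Delete the cached model; the next ensure_model call downloads it again."""
    (Path(cache_dir) / MODEL_FILE).unlink(missing_ok=True)
```

Passing the downloader in as a parameter keeps the cache logic testable without network access, and mirrors the documented behaviour: one download, cached loads afterwards, and a fresh download after removal.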

Keyboard shortcuts

Voice hotkeys can be customised or disabled in Settings → Keyboard, under the Voice group. The toggle shortcut and the hold-to-talk key are configured independently of each other.

Configuring providers

All voice provider management lives in Settings → Plugins → Voice Input:

Enable/disable each provider independently
Select (pin) a specific provider as the preferred one
Download or remove the Distil-Whisper model
See the model size, cache path, and accelerator in use (Metal / CPU)

Privacy

Distil-Whisper performs inference entirely locally via Candle — audio never leaves the app. The model is fetched from huggingface.co/distil-whisper/distil-large-v3/ on first use; after that, no network access is needed for voice.

Apple Speech and Web Speech API may process audio off-device depending on OS and browser settings. Apple’s offline behavior varies by language support; the Web Speech API’s behavior depends on the browser engine. If strict on-device processing is required, use Distil-Whisper.