Agent Providers
By default Claudette spawns the official claude CLI authenticated against Anthropic. In Settings > Models, you can also point agents at:
- Ollama: local LLMs running on your own machine, talking Claude’s wire format directly.
- LM Studio: local LLMs loaded in LM Studio’s desktop app, served via the native /v1/messages endpoint in LM Studio 0.4.1+. Routed through the in-process gateway with an Anthropic-shape pass-through (no translation, just status-code correction on errors).
- OpenAI: gpt-* models via the OpenAI API gateway.
- Codex: ChatGPT subscription-backed Codex through the native codex app-server harness.
- Pi: models/providers discovered from Pi’s SDK, executed through Claudette’s bundled Pi sidecar with Claudette permission mediation.
On startup Claudette quietly checks for local providers. If it finds the Codex CLI, an Ollama daemon on localhost:11434, an LM Studio server on localhost:1234, or Pi models reachable through the bundled Pi SDK sidecar, that provider is auto-enabled and its model list is refreshed. Turning a local provider off in Settings records an opt-out, so Claudette will not re-enable it on the next launch. OpenAI API stays manual because it requires an API key and a remote account. Pi’s startup probe is heavier than the localhost probes (it cold-starts the Bun sidecar), but it’s what keeps the chat-header model picker populated with Pi’s discovered models without forcing the user to open Settings.
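As a rough illustration of the localhost probes and the opt-out rule described above, here is a minimal sketch. It is not Claudette's actual probe code: the provider ids (`ollama`, `lm_studio`), the function names, and the use of a bare TCP connect (rather than an HTTP request that confirms which service answered) are all assumptions made for the example. It also hard-codes 127.0.0.1 where the page says localhost.

```rust
use std::collections::HashSet;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical provider ids paired with the default local addresses from this page.
pub const LOCAL_TARGETS: [(&str, &str); 2] = [
    ("ollama", "127.0.0.1:11434"),
    ("lm_studio", "127.0.0.1:1234"),
];

/// Returns true if something accepts TCP connections at `addr` within `timeout`.
/// Assumption: a real probe would likely also verify the service identity.
pub fn port_is_open(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Lists the local providers that should be auto-enabled: reachable and
/// not previously switched off by the user (the recorded opt-out).
pub fn providers_to_enable(opted_out: &HashSet<String>, timeout: Duration) -> Vec<&'static str> {
    LOCAL_TARGETS
        .iter()
        // Skip opted-out providers before touching the network at all.
        .filter(|(id, _)| !opted_out.contains(*id))
        .filter(|(_, addr)| {
            addr.parse::<SocketAddr>()
                .map_or(false, |a| port_is_open(a, timeout))
        })
        .map(|(id, _)| *id)
        .collect()
}
```

Checking the opt-out set first means a provider the user disabled is never probed, which matches the "will not re-enable it on the next launch" behavior.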
The data model also defines CustomAnthropic and CustomOpenAi kinds for self-hosted / proxied endpoints, but the Settings panel currently only renders the built-in backends — there is no in-GUI flow to add a custom provider yet.
Some Claude-specific features only have clean equivalents on particular backends. Claudette hides or disables unsupported controls instead of sending ignored settings.
Enabling Providers
- Open Settings > Models.
- Keep Agent providers on to show Ollama, LM Studio, OpenAI API, and custom providers.
- Keep Codex on to show Codex and seed Codex models into the picker.
- Pi is always shown as a first-class harness; run pi auth and refresh Pi models to populate it.
- Configure each provider you want to use (URL, secret, models).
- Pick a default in Settings > Models > Default model, or override per session from the chat header.
When both provider gates are off, the built-in Anthropic and first-class Pi harnesses remain exposed. Ollama, LM Studio, OpenAI API, custom backends, and native Codex stay hidden behind their respective gates.
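The two gates can be read as a small visibility function. The sketch below only restates the rule in this section; the `Provider` enum and `visible_providers` are hypothetical names, not Claudette's API.

```rust
/// Hypothetical provider list used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    Pi,
    Ollama,
    LmStudio,
    OpenAiApi,
    Custom,
    Codex,
}

/// Which providers Settings exposes for a given state of the two gates.
pub fn visible_providers(agent_providers_on: bool, codex_on: bool) -> Vec<Provider> {
    // Built-in Anthropic and first-class Pi are exposed regardless of the gates.
    let mut out = vec![Provider::Anthropic, Provider::Pi];
    if agent_providers_on {
        out.extend([
            Provider::Ollama,
            Provider::LmStudio,
            Provider::OpenAiApi,
            Provider::Custom,
        ]);
    }
    if codex_on {
        out.push(Provider::Codex);
    }
    out
}
```

With both gates off, `visible_providers(false, false)` yields only `Anthropic` and `Pi`.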
At a glance
| Provider | Kind | Default base URL | Auth | Setup |
|---|---|---|---|---|
| Claude Code | Anthropic | (uses claude CLI) | Inherits from Claude Code | Authentication |
| Ollama | Ollama | http://localhost:11434 | None (or optional bearer token) | Ollama |
| LM Studio | LmStudio | http://localhost:1234 | None (or optional bearer token) | LM Studio |
| OpenAI API | OpenAiApi | https://api.openai.com | API key (required) | OpenAI & Codex |
| Codex | CodexNative | (uses codex app-server) | codex login (managed by codex CLI) | OpenAI & Codex |
| Pi | PiSdk | (uses bundled Pi sidecar) | pi auth / Pi provider config | Pi |
Two execution paths: direct vs gateway
Claudette splits providers that run through the spawned claude CLI into two architectural categories, plus a third group of native harnesses that bypass the CLI:
Direct Claude wire format (Anthropic, Ollama, CustomAnthropic) — speak Claude’s wire format natively and return errors with HTTP status codes the Anthropic SDK already classifies correctly. The spawned claude CLI just gets a different base URL and auth token; there is no in-process translation layer.
Gateway (OpenAiApi, CustomOpenAi, LmStudio, plus the legacy internal CodexSubscription path) — Claudette spawns a tiny in-process HTTP listener that the spawned claude process is pointed at instead of api.anthropic.com. The gateway has two flavors of work depending on backend kind:
- OpenAI-Responses translation (OpenAI / legacy Codex / CustomOpenAi): full Anthropic ↔ OpenAI /v1/responses translation in both directions, since these backends don’t speak Anthropic.
- Anthropic pass-through (LM Studio): forward the Anthropic body verbatim to upstream /v1/messages, stream 2xx responses through unchanged, and intercept non-2xx responses to fix LM Studio’s HTTP 500 mis-classification of context-overflow errors. The wire format is already correct; only status codes on the error path are translated.
Native harnesses (CodexNative, PiSdk) — Claudette speaks to a purpose-built subprocess instead of spawning claude. Codex uses codex app-server --listen stdio://; Pi uses Claudette’s bundled Bun sidecar embedding @earendil-works/pi-coding-agent.
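The categories above amount to a mapping from backend kind to execution path. The following sketch restates that mapping as a `match`; the kind names come from this page, while the enum definitions and `execution_path` are illustrative and do not reflect the real types in src/agent_backend.rs. It describes each kind's default path only and ignores any per-backend runtime override.

```rust
/// Backend kinds as named on this page (the enum itself is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Anthropic,
    Ollama,
    CustomAnthropic,
    OpenAiApi,
    CustomOpenAi,
    LmStudio,
    CodexSubscription,
    CodexNative,
    PiSdk,
}

/// The two flavors of gateway work.
#[derive(Debug, PartialEq, Eq)]
pub enum GatewayMode {
    ResponsesTranslation,
    AnthropicPassThrough,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExecutionPath {
    /// claude CLI pointed straight at the provider.
    Direct,
    /// claude CLI pointed at the in-process listener.
    Gateway(GatewayMode),
    /// Purpose-built subprocess instead of the claude CLI.
    NativeHarness,
}

pub fn execution_path(kind: BackendKind) -> ExecutionPath {
    use BackendKind::*;
    match kind {
        Anthropic | Ollama | CustomAnthropic => ExecutionPath::Direct,
        OpenAiApi | CustomOpenAi | CodexSubscription => {
            ExecutionPath::Gateway(GatewayMode::ResponsesTranslation)
        }
        LmStudio => ExecutionPath::Gateway(GatewayMode::AnthropicPassThrough),
        CodexNative | PiSdk => ExecutionPath::NativeHarness,
    }
}
```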
Runtime override per backend
Some kinds support more than one harness. Settings > Models shows a Runtime select on those cards that lets you swap between the kind’s default and the alternative without leaving the panel:
- Ollama, LM Studio: default to the Pi harness (a more capable agent loop than proxying the Claude CLI through ANTHROPIC_BASE_URL). Claude CLI stays available as a fallback for users who prefer the original wire-format-passthrough behavior or hit a Pi limitation.
- OpenAI API, Custom OpenAI: default stays on Claude CLI (gateway translation). Pi is opt-in.
- Codex Native: default stays on the Codex app-server. Pi is opt-in (it uses Pi’s own OpenAI provider, which requires pi auth to be configured).
- Anthropic, Custom Anthropic, Codex Subscription: locked to Claude CLI. Pi is never offered for Claude-subscription / Anthropic-OAuth flows; the resolver also enforces this at chat-send time.
The override persists as a runtime_harness field on the backend config (AgentBackendConfig::effective_harness in src/agent_backend.rs). Picking the default value clears the override, so the backend follows any future change to the default policy.
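To make the override rules concrete, here is a minimal sketch of how such a resolver could behave. The page does not show the real signature of `AgentBackendConfig::effective_harness`, so `BackendConfigSketch`, `harness_options`, and `select` are hypothetical stand-ins that only encode the per-kind rules listed above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Harness {
    ClaudeCli,
    CodexAppServer,
    Pi,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Anthropic,
    CustomAnthropic,
    CodexSubscription,
    Ollama,
    LmStudio,
    OpenAiApi,
    CustomOpenAi,
    CodexNative,
}

/// Harnesses offered for a kind; the first entry is the default.
pub fn harness_options(kind: Kind) -> &'static [Harness] {
    match kind {
        Kind::Ollama | Kind::LmStudio => &[Harness::Pi, Harness::ClaudeCli],
        Kind::OpenAiApi | Kind::CustomOpenAi => &[Harness::ClaudeCli, Harness::Pi],
        Kind::CodexNative => &[Harness::CodexAppServer, Harness::Pi],
        // Locked kinds: Pi is never offered.
        Kind::Anthropic | Kind::CustomAnthropic | Kind::CodexSubscription => {
            &[Harness::ClaudeCli]
        }
    }
}

/// Hypothetical stand-in for the persisted backend config.
pub struct BackendConfigSketch {
    pub kind: Kind,
    pub runtime_harness: Option<Harness>,
}

impl BackendConfigSketch {
    /// Uses the stored override only if it is valid for this kind.
    pub fn effective_harness(&self) -> Harness {
        let options = harness_options(self.kind);
        match self.runtime_harness {
            Some(h) if options.contains(&h) => h,
            _ => options[0],
        }
    }

    /// Picking the default (or an unsupported harness) clears the override.
    pub fn select(&mut self, choice: Harness) {
        let options = harness_options(self.kind);
        self.runtime_harness = if choice == options[0] || !options.contains(&choice) {
            None
        } else {
            Some(choice)
        };
    }
}
```

Storing `None` for the default, rather than the default's value, is what lets a later change in default policy take effect for users who never asked for anything else.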
Heads up if you ever configure a custom OpenAI endpoint: Claudette posts to /v1/responses, not /v1/chat/completions. Providers that only implement Chat Completions won’t work behind the gateway.
Capability matrix
The per-turn capabilities differ by provider (AgentBackendCapabilities in src/agent_backend.rs). The chat-header toggles for unsupported capabilities are hidden / disabled automatically:
| Capability | Anthropic | Ollama | LM Studio | Gateway OpenAI | Codex | Pi |
|---|---|---|---|---|---|---|
| Extended thinking | ✅ | ✅ (when model supports) | ❌ | ❌ | ✅ (reasoning summaries) | ✅ (when model supports) |
| Reasoning / effort levels | ✅ (Claude effort) | ❌ | ❌ | ❌ | ✅ (Codex intelligence) | ✅ |
| Fast mode | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
| 1M-context auto-upgrade | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Tool use | ✅ | ✅ | ✅ (model-dependent) | ✅ | ✅ | ✅ |
| Vision | ✅ | ✅ | ✅ (model-dependent) | ✅ | ❌ | ❌ |
LM Studio speaks the same Anthropic wire format as Ollama (through the gateway pass-through rather than the direct path), but its capability profile mirrors that of the OpenAI gateway providers (no extended-thinking / effort / fast / 1M toggles): local models implement those Anthropic-specific knobs inconsistently or not at all, so the chat header hides them. Codex exposes fast mode and Codex intelligence through Codex app-server settings. Switching a session to a provider visibly disables or relabels the toggles that don’t apply.
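One way to picture the matrix is as a lookup that drives which toggles are shown. The sketch below transcribes the table above into code. The struct fields, the `ProviderColumn` enum, and the toggle labels are assumptions for illustration and are not the real `AgentBackendCapabilities` definition. Entries marked "when model supports" or "model-dependent" are treated as supported.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Caps {
    pub extended_thinking: bool,
    pub effort_levels: bool,
    pub fast_mode: bool,
    pub auto_1m_context: bool,
    pub tool_use: bool,
    pub vision: bool,
}

/// One variant per column of the capability table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderColumn {
    Anthropic,
    Ollama,
    LmStudio,
    GatewayOpenAi,
    Codex,
    Pi,
}

pub fn caps(p: ProviderColumn) -> Caps {
    // Tuple order: thinking, effort, fast, 1M context, tools, vision.
    let (t, e, f, m, tools, v) = match p {
        ProviderColumn::Anthropic => (true, true, true, true, true, true),
        ProviderColumn::Ollama => (true, false, false, false, true, true),
        ProviderColumn::LmStudio => (false, false, false, false, true, true),
        ProviderColumn::GatewayOpenAi => (false, false, false, false, true, true),
        ProviderColumn::Codex => (true, true, true, false, true, false),
        ProviderColumn::Pi => (true, true, false, false, true, false),
    };
    Caps {
        extended_thinking: t,
        effort_levels: e,
        fast_mode: f,
        auto_1m_context: m,
        tool_use: tools,
        vision: v,
    }
}

/// Hypothetical labels for the chat-header toggles that remain visible.
pub fn visible_toggles(c: Caps) -> Vec<&'static str> {
    let mut out = Vec::new();
    if c.extended_thinking {
        out.push("thinking");
    }
    if c.effort_levels {
        out.push("effort");
    }
    if c.fast_mode {
        out.push("fast");
    }
    if c.auto_1m_context {
        out.push("1m-context");
    }
    out
}
```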
How the gateway runtime works
For gateway providers, Claudette spins up an HTTP listener on 127.0.0.1:0 (random port) per provider and per model. It mints a random auth token for that listener, then exports three env vars into the spawned claude subprocess:
- ANTHROPIC_BASE_URL: the local gateway URL.
- ANTHROPIC_AUTH_TOKEN: the auth token. Cached for the gateway server’s lifetime (keyed by backend id + runtime hash) and reused across turns; rotates only when the gateway restarts (e.g., after config or model changes).
- CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1: tells the CLI to query the gateway for available models.
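A minimal sketch of how those three variables could be assembled once the listener is bound. The function name `gateway_env` is hypothetical, and the `http://` scheme for the base URL is an assumption (the page only says "the local gateway URL"). Token minting is left to the caller because the page does not describe how the token is generated.

```rust
use std::io;
use std::net::TcpListener;

/// Builds the env vars for the spawned claude process from a bound listener.
/// Assumption: the gateway is reached over plain HTTP on the loopback address.
pub fn gateway_env(listener: &TcpListener, token: &str) -> io::Result<Vec<(String, String)>> {
    // Binding to port 0 lets the OS choose; local_addr reports the real port.
    let addr = listener.local_addr()?;
    Ok(vec![
        ("ANTHROPIC_BASE_URL".to_string(), format!("http://{}", addr)),
        ("ANTHROPIC_AUTH_TOKEN".to_string(), token.to_string()),
        (
            "CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY".to_string(),
            "1".to_string(),
        ),
    ])
}
```

A caller would bind with `TcpListener::bind("127.0.0.1:0")` and could pass the resulting pairs to `std::process::Command::envs` when spawning the subprocess.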
For the OpenAI-style backends, the gateway translates Claude’s request into the OpenAI Responses API (POST /v1/responses for OpenAI; POST /codex/responses for the legacy Codex subscription path) and translates the streaming response back. The unversioned Codex form is intentional: the upstream Codex Responses endpoint lives directly under https://chatgpt.com/backend-api/codex/responses with no /v1/ segment. The gateway is an in-process tokio task; no external service is involved.
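The versioned and unversioned paths can be captured in a small helper. This is only a sketch of the URL rule stated above; `Upstream` and `responses_url` are hypothetical names.

```rust
/// The two Responses-API upstreams mentioned on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Upstream {
    OpenAi,
    LegacyCodex,
}

/// Joins a base URL with the Responses endpoint for the given upstream.
pub fn responses_url(base_url: &str, upstream: Upstream) -> String {
    // Tolerate a trailing slash on the configured base URL.
    let base = base_url.trim_end_matches('/');
    match upstream {
        Upstream::OpenAi => format!("{}/v1/responses", base),
        // No /v1/ segment on the Codex path, by design.
        Upstream::LegacyCodex => format!("{}/codex/responses", base),
    }
}
```

For example, `responses_url("https://api.openai.com", Upstream::OpenAi)` gives `https://api.openai.com/v1/responses`, while the Codex base `https://chatgpt.com/backend-api` maps to `https://chatgpt.com/backend-api/codex/responses`.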
If the provider config or model changes, the gateway restarts on the next agent turn. There’s no manual lifecycle to manage.
Model selection
The model picker in the chat header is scoped to the active provider. Switching the provider resets the model to that provider’s default.
Per-session model selection is persisted in app_settings as model:<session_id> and model_provider:<session_id>. Disabling Agent providers triggers a cleanup pass: only sessions whose persisted model or provider was actually pointing at a non-Anthropic backend (or a non-built-in Claude model) get rewritten to the defaults anthropic / opus — sessions already on a built-in Claude model are left alone. The cleanup writes new values rather than deleting keys, and resets the live agent on rewritten sessions so the next turn starts fresh against Claude.
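The cleanup pass can be sketched over a plain key-value map. The key formats and the `anthropic` / `opus` defaults come from the paragraph above. Everything else is an assumption for illustration: the function name, the use of a `HashMap` in place of app_settings, and in particular the `BUILT_IN_CLAUDE_MODELS` list, whose real contents this page does not give.

```rust
use std::collections::HashMap;

const DEFAULT_PROVIDER: &str = "anthropic";
const DEFAULT_MODEL: &str = "opus";
/// Assumption: placeholder list of built-in Claude model ids.
const BUILT_IN_CLAUDE_MODELS: &[&str] = &["opus", "sonnet", "haiku"];

/// Rewrites only the sessions that point away from built-in Claude models.
/// Returns the ids of rewritten sessions so the caller can reset their agents.
pub fn cleanup_sessions(
    settings: &mut HashMap<String, String>,
    session_ids: &[&str],
) -> Vec<String> {
    let mut rewritten = Vec::new();
    for id in session_ids {
        let model_key = format!("model:{}", id);
        let provider_key = format!("model_provider:{}", id);
        let off_anthropic = settings
            .get(&provider_key)
            .map_or(false, |p| p.as_str() != DEFAULT_PROVIDER);
        let custom_model = settings
            .get(&model_key)
            .map_or(false, |m| !BUILT_IN_CLAUDE_MODELS.contains(&m.as_str()));
        if off_anthropic || custom_model {
            // Write new values rather than deleting the keys.
            settings.insert(model_key, DEFAULT_MODEL.to_string());
            settings.insert(provider_key, DEFAULT_PROVIDER.to_string());
            rewritten.push(id.to_string());
        }
    }
    rewritten
}
```

Returning the rewritten ids separately keeps the storage update apart from the live-agent reset, which only applies to those sessions.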
When to choose what
- Default Claude Code (anthropic): best feature parity. Keep this unless you have a specific reason to switch.
- Ollama: air-gapped work, privacy-sensitive code, or offline travel. Quality varies wildly by model.
- LM Studio: the same air-gapped use cases as Ollama, if you already manage your local model library through LM Studio’s desktop app and want to keep it as the single source of truth.
- OpenAI API: if you specifically want gpt-* or OpenAI reasoning models, or have prepaid OpenAI credit you’d rather burn than your Claude quota.
- Codex: if you’re already paying for ChatGPT Plus/Pro/Team and want to reuse that quota through the native Codex CLI.
- Pi: if you want Pi’s provider/model catalog while keeping Claudette’s session lifecycle and approval UI.
Caveats
- The Anthropic-compatible providers should accept the full Claude Code request shape; not all proxies and self-hosted gateways implement everything (especially the system prompt format and tool-use protocol). If a provider silently drops fields, agents may behave oddly.
- Gateway providers translate tool calls and vision payloads, but lossy edge cases exist around streaming partial messages. Test on a small turn before running long agent sessions on a new provider.
- The Usage panel (Settings > Usage) only reads Anthropic subscription telemetry. Token consumption on other providers is tracked only by the upstream provider and is not visible inside Claudette.
See also
- Ollama, LM Studio, OpenAI & Codex, and Pi: per-provider setup pages
- Agent Configuration: model selection, effort, thinking, and which knobs apply per provider
- Authentication: credential handling for the default Anthropic backend