Alternative Providers

By default, Claudette spawns the official claude CLI authenticated against Anthropic. Behind the Alternative Claude Code backends experimental flag (the UI keeps that wording for the toggle), you can also point agents at:

  • Ollama — local LLMs running on your own machine, talking Claude’s wire format directly.
  • LM Studio — local LLMs loaded in LM Studio’s desktop app, served via LM Studio 0.4.1+‘s native /v1/messages. Routed through the in-process gateway with an Anthropic-shape pass-through (no translation, just status-code correction on errors).
  • OpenAI & Codex — gpt-* models via the OpenAI API, or your ChatGPT subscription via the codex CLI’s auth, both routed through the in-process gateway with full Anthropic ↔ OpenAI Responses translation.

The data model also defines CustomAnthropic and CustomOpenAi kinds for self-hosted / proxied endpoints, but the Settings panel currently only renders the five built-in backends — there is no in-GUI flow to add a custom provider yet.

This is experimental because some Claude-specific features (extended thinking, effort levels, fast mode, the 1M-context auto-upgrade) only have meaningful equivalents on the official Anthropic backend.

  1. Open Settings > Experimental and toggle on Alternative Claude Code backends.
  2. Open Settings > Models and configure each provider you want to use (URL, secret, models).
  3. Pick a default in Settings > Models > Default backend, or override per session from the chat header.

When the feature is off, only the built-in Anthropic backend is exposed — every workspace runs claude as before.

| ID | Kind | Default base URL | Auth | Setup |
| --- | --- | --- | --- | --- |
| anthropic | Anthropic | (uses claude CLI) | Inherits from Claude Code | Authentication |
| ollama | Ollama | http://localhost:11434 | None (or optional bearer token) | Ollama |
| lm-studio | LmStudio | http://localhost:1234 | None (or optional bearer token) | LM Studio |
| openai-api | OpenAiApi | https://api.openai.com | API key (required) | OpenAI & Codex |
| codex-subscription | CodexSubscription | https://chatgpt.com/backend-api | codex login (managed by codex CLI) | OpenAI & Codex |
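The table above can be sketched as data. The `BackendKind` variant names come from the docs; the helper function is a hypothetical illustration, not Claudette's actual API:

```rust
// Sketch of the built-in backend table. Variant names match the "Kind"
// column; `default_base_url` is a hypothetical helper.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BackendKind {
    Anthropic,
    Ollama,
    LmStudio,
    OpenAiApi,
    CodexSubscription,
}

/// Default base URL per backend. Anthropic returns `None` because the
/// spawned `claude` CLI talks to its own endpoint.
fn default_base_url(kind: BackendKind) -> Option<&'static str> {
    match kind {
        BackendKind::Anthropic => None,
        BackendKind::Ollama => Some("http://localhost:11434"),
        BackendKind::LmStudio => Some("http://localhost:1234"),
        BackendKind::OpenAiApi => Some("https://api.openai.com"),
        BackendKind::CodexSubscription => Some("https://chatgpt.com/backend-api"),
    }
}
```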

Claudette splits providers into two architectural categories:

Direct (Anthropic, Ollama, CustomAnthropic) — speak Claude’s wire format natively and return errors with HTTP status codes the Anthropic SDK already classifies correctly. The spawned claude CLI just gets a different base URL and auth token; there is no in-process translation layer.

Gateway (OpenAiApi, CodexSubscription, CustomOpenAi, LmStudio) — Claudette spawns a tiny in-process HTTP listener that the spawned claude process is pointed at instead of api.anthropic.com. The gateway has two flavors of work depending on backend kind:

  • OpenAI-Responses translation (OpenAI / Codex / CustomOpenAi): full Anthropic ↔ OpenAI /v1/responses translation in both directions, since these backends don’t speak Anthropic.
  • Anthropic pass-through (LM Studio): forward the Anthropic body verbatim to upstream /v1/messages, stream 2xx responses through unchanged, intercept non-2xx to fix LM Studio’s HTTP 500 mis-classification of context-overflow errors. Wire format is already correct; we only translate status codes on the error path.
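The split between the two flavors can be sketched as a simple dispatch on backend kind. The type and function names here are assumptions for illustration; only the behavior is from the text:

```rust
// Only the non-direct kinds ever reach the gateway.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GatewayKind {
    OpenAiApi,
    CodexSubscription,
    CustomOpenAi,
    LmStudio,
}

#[derive(Debug, PartialEq)]
enum GatewayMode {
    /// Full Anthropic <-> OpenAI /v1/responses translation, both directions.
    OpenAiResponsesTranslation,
    /// Forward the Anthropic body verbatim; only rewrite error status codes.
    AnthropicPassThrough,
}

fn gateway_mode(kind: GatewayKind) -> GatewayMode {
    match kind {
        GatewayKind::OpenAiApi
        | GatewayKind::CodexSubscription
        | GatewayKind::CustomOpenAi => GatewayMode::OpenAiResponsesTranslation,
        GatewayKind::LmStudio => GatewayMode::AnthropicPassThrough,
    }
}
```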

Heads up if you ever configure a custom OpenAI endpoint: Claudette posts to /v1/responses, not /v1/chat/completions. Providers that only implement Chat Completions won’t work behind the gateway.

Per-turn capabilities differ by provider (AgentBackendCapabilities in src/agent_backend.rs). Chat-header toggles for unsupported capabilities are hidden or disabled automatically:

| Capability | Anthropic | Ollama | LM Studio | Gateway (OpenAI / Codex) |
| --- | --- | --- | --- | --- |
| Extended thinking | ✅ (when model supports) | — | — | — |
| Effort levels | ✅ | — | — | — |
| Fast mode | ✅ | — | — | — |
| 1M-context auto-upgrade | ✅ | — | — | — |
| Tool use | ✅ | ✅ (model-dependent) | ✅ (model-dependent) | ✅ (model-dependent) |
| Vision | ✅ | ✅ (model-dependent) | ✅ (model-dependent) | ✅ (model-dependent) |

LM Studio speaks Claude’s wire format like Ollama, but it is routed through the gateway’s pass-through path, and its capability profile mirrors the gateway providers’ (no extended-thinking / effort / fast / 1M toggles): local models implement those Anthropic-specific knobs inconsistently or not at all, so the chat header hides them. Switching a session to any non-Anthropic provider visibly disables the toggles that don’t apply.
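The capability matrix reduces to a single question: is the session on the official Anthropic backend? A hedged sketch — the real `AgentBackendCapabilities` lives in src/agent_backend.rs, and the field names here are assumptions:

```rust
// Field and helper names are assumed for illustration.
#[derive(Debug, PartialEq)]
struct AgentBackendCapabilities {
    extended_thinking: bool,
    effort_levels: bool,
    fast_mode: bool,
    context_1m_auto_upgrade: bool,
    tool_use: bool,
    vision: bool,
}

/// `is_anthropic` is true only for the official Anthropic backend.
fn capabilities(is_anthropic: bool) -> AgentBackendCapabilities {
    AgentBackendCapabilities {
        // Claude-specific knobs exist only on the official backend.
        extended_thinking: is_anthropic,
        effort_levels: is_anthropic,
        fast_mode: is_anthropic,
        context_1m_auto_upgrade: is_anthropic,
        // Tool use and vision survive everywhere, subject to the model.
        tool_use: true,
        vision: true,
    }
}
```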

For gateway providers, Claudette spins up an HTTP listener on 127.0.0.1:0 (random port) per provider and per model. It mints a random auth token for that listener, then exports three env vars into the spawned claude subprocess:

  • ANTHROPIC_BASE_URL — the local gateway URL.
  • ANTHROPIC_AUTH_TOKEN — the auth token. Cached for the gateway server’s lifetime (keyed by backend id + runtime hash) and reused across turns; rotates only when the gateway restarts (e.g., after config or model changes).
  • CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 — tells the CLI to query the gateway for available models.
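The env-var wiring above can be sketched with `std::process::Command`. The port and token here are placeholders; in Claudette they come from the live gateway task:

```rust
use std::process::Command;

// Sketch: point a spawned `claude` subprocess at the local gateway.
// The three variable names are from the docs; everything else is illustrative.
fn claude_behind_gateway(port: u16, token: &str) -> Command {
    let mut cmd = Command::new("claude");
    cmd.env("ANTHROPIC_BASE_URL", format!("http://127.0.0.1:{port}"))
        .env("ANTHROPIC_AUTH_TOKEN", token)
        .env("CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY", "1");
    cmd
}
```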

The gateway translates Claude’s request into the OpenAI Responses API (POST /v1/responses for OpenAI; POST /codex/responses for the Codex subscription path — the unversioned form is intentional, since the upstream Codex Responses endpoint lives directly under https://chatgpt.com/backend-api/codex/responses with no /v1/ segment) and translates the streaming response back. It’s an in-process tokio task; no external service.
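The path split above can be made concrete. The two paths are from the text; the function name is hypothetical:

```rust
// Sketch: pick the upstream Responses path per gateway target.
fn responses_path(is_codex_subscription: bool) -> &'static str {
    if is_codex_subscription {
        // Upstream lives at https://chatgpt.com/backend-api/codex/responses,
        // with no /v1/ segment.
        "/codex/responses"
    } else {
        "/v1/responses"
    }
}
```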

If the provider config or model changes, the gateway restarts on the next agent turn. There’s no manual lifecycle to manage.

The model picker in the chat header is scoped to the active provider. Switching the provider resets the model to that provider’s default.

Per-session model selection is persisted in app_settings as model:<session_id> and model_provider:<session_id>. Disabling the experimental flag triggers a cleanup pass: only sessions whose persisted model or provider was actually pointing at a non-Anthropic backend (or a non-built-in Claude model) get rewritten to the defaults anthropic / opus — sessions already on a built-in Claude model are left alone. The cleanup writes new values rather than deleting keys, and resets the live agent on rewritten sessions so the next turn starts fresh against Claude.
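The cleanup rule reduces to one predicate per session. A sketch under stated assumptions: the provider id and default model are from the text, but the built-in model list and helper name are hypothetical:

```rust
// Assumed list of built-in Claude model ids; only "opus" appears in the docs.
const BUILT_IN_CLAUDE_MODELS: &[&str] = &["opus", "sonnet", "haiku"];

/// True when the persisted session state must be rewritten to the
/// defaults (anthropic / opus) after the experimental flag is disabled.
fn needs_rewrite(provider: &str, model: &str) -> bool {
    provider != "anthropic" || !BUILT_IN_CLAUDE_MODELS.contains(&model)
}
```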

  • Default Claude Code (anthropic) — best feature parity. Keep this unless you have a specific reason to switch.
  • Ollama — air-gapped work, privacy-sensitive code, or offline travel. Quality varies wildly by model.
  • LM Studio — same air-gapped use cases as Ollama, but if you already manage your local model library through LM Studio’s desktop app and want to keep it as the single source of truth.
  • OpenAI API — if you specifically want gpt-* or OpenAI reasoning models, or have prepaid OpenAI credit you’d rather burn than your Claude quota.
  • Codex — if you’re already paying for ChatGPT Plus/Pro/Team and want to reuse that quota.
  • The Anthropic-compatible providers should accept the full Claude Code request shape; not all proxies and self-hosted gateways implement everything (especially the system prompt format and tool-use protocol). If a provider silently drops fields, agents may behave oddly.
  • Gateway providers translate tool calls and vision payloads, but lossy edge cases exist around streaming partial messages. Test on a small turn before running long agent sessions on a new provider.
  • The Usage panel (Settings > Usage) only reads Anthropic subscription telemetry. Token consumption on alternative providers is whatever the upstream reports, not visible inside Claudette.