mold runpod — native RunPod CLI
mold runpod manages RunPod cloud GPU pods end-to-end from the same binary you use for local generation. Create a pod, connect to it, stream logs, track spend, and (with mold runpod run) create-generate-save in a single command.
Compared to the Docker & RunPod guide — which shows the manual pod-creation flow via runpodctl or the web console — this guide covers the integrated workflow.
Setup
Generate a RunPod API key at runpod.io/console/user/settings (Settings → API Keys, "Read/Write" scope). Then:
# Option 1: persist to config.toml
mold config set runpod.api_key <your-key>
# Option 2: env var (overrides config)
export RUNPOD_API_KEY=<your-key>
# Verify
mold runpod doctor
mold runpod doctor checks the key, the REST endpoint, and your RunPod balance and spend rate.
The killer feature — mold runpod run
Generate an image on a fresh or reused pod with one command:
mold runpod run "a cinematic low-angle of a tiny steel robot"
What this does:
- If a warm pod exists (created by a prior run), reuses it.
- Otherwise creates a new pod with smart defaults:
  - picks the cheapest GPU with High or Medium stock (4090 → 5090 → L40S → A100),
  - selects the matching ghcr.io/utensils/mold image tag,
  - retries across datacenters if scheduling stalls.
- Waits for the mold server inside the pod to be reachable, streaming a readiness progress bar.
- Calls /api/generate/stream; SSE events drive a live progress display (model pull, weight load, denoise steps).
- Saves the output to ./mold-outputs/runpod-<pod-id>-<timestamp>.png (directory auto-created, .gitignore'd by default).
- Prints the proxy URL so you can open the pod's web gallery and browse past generations in a browser.
- Leaves the pod warm for reuse on the next run. Pass --keep to leave it running explicitly, or set runpod.auto_teardown = true in config to delete after each generation.
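The progress display in the workflow above is driven by Server-Sent Events. The following is a minimal sketch of how a client could split such a stream into events, using only the standard SSE framing (event: and data: fields, blank line as terminator). The event names and payload fields in SAMPLE are invented for illustration; this guide does not document what /api/generate/stream actually emits.

```python
import json
from typing import Iterable, Iterator, Tuple


def _field_value(line: str, name: str) -> str:
    """Return the text after "name:", dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


# Hypothetical stream contents, for illustration only.
SAMPLE = [
    "event: progress\n",
    'data: {"stage": "denoise", "step": 3, "total": 28}\n',
    "\n",
    ": keep-alive\n",
    "event: complete\n",
    'data: {"path": "out.png"}\n',
    "\n",
]

events = [(name, json.loads(body)) for name, body in parse_sse(SAMPLE)]
for name, payload in events:
    print(name, payload)
```

A real client would feed parse_sse from a streaming HTTP response rather than a list, and would map each event to a progress-bar update.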
Web gallery
Every ghcr.io/utensils/mold image ships the Vue 3 gallery SPA at /opt/mold/web, and the server exposes it as the root route. Opening the pod's proxy URL in a browser gives you:
- Feed / grid toggle over the server's output directory.
- Real thumbnails for PNG, JPEG, GIF, APNG, WebP, and MP4 (first frame).
- Full OutputMetadata per item (prompt, seed, model, steps, LoRA stack).
- Mobile-friendly swipe-through detail viewer, keyboard nav on desktop.
# Print the browsable URL for an existing pod
mold runpod connect <pod-id>
# → export MOLD_HOST=https://<pod-id>-7680.proxy.runpod.net
# → gallery: https://<pod-id>-7680.proxy.runpod.net
The delete button is hidden unless the server advertises gallery.can_delete: true via /api/capabilities. Set MOLD_GALLERY_ALLOW_DELETE=1 on the pod (env var) if you want to prune from the browser. Pair that with MOLD_API_KEY so the endpoint isn't open to the public proxy.
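As a small illustration of the two conventions in this section, the sketch below builds the proxy URL in the form printed by mold runpod connect and checks a capabilities payload for the delete flag. The nested JSON shape of the /api/capabilities response is an assumption; only the gallery.can_delete name comes from this guide, and both function names are hypothetical.

```python
def proxy_url(pod_id: str, port: int = 7680) -> str:
    """Build a RunPod proxy URL of the form https://<pod-id>-<port>.proxy.runpod.net."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def can_delete(capabilities: dict) -> bool:
    """Return True only when the payload explicitly sets gallery.can_delete to true.

    Assumes a nested {"gallery": {"can_delete": ...}} shape; anything else
    (missing keys, null, non-boolean values) is treated as "not allowed".
    """
    gallery = capabilities.get("gallery")
    if not isinstance(gallery, dict):
        return False
    return gallery.get("can_delete") is True


print(proxy_url("abc123"))
print(can_delete({"gallery": {"can_delete": True}}))
```

Treating every unexpected shape as "not allowed" keeps the delete button hidden by default, which matches the behaviour described above.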
Common flags
mold runpod run "a cat" --model flux-dev:q4 # preload a specific model
mold runpod run "a cat" --gpu 5090 # force a GPU family
mold runpod run "a cat" --dc US-IL-1 # pin a datacenter
mold runpod run "a cat" --keep # don't park pod for reuse
mold runpod run "a cat" --steps 28 --seed 42 # forward standard gen flags
mold runpod run "a cat" --output-dir ./renders # custom save path
Manual pod management
# Discovery
mold runpod gpus # GPU types with aggregate stock
mold runpod gpus --json
mold runpod datacenters --gpu "RTX 5090"
mold runpod usage --since 7d # spend summary + pod history
# Lifecycle
mold runpod create --gpu 5090 --volume 50 # smart defaults fill the rest
mold runpod create --dry-run # print plan without creating
mold runpod list
mold runpod get <pod-id>
mold runpod stop <pod-id> # pause billing, keep storage
mold runpod start <pod-id> # resume
mold runpod delete <pod-id> # tear down
# Connecting
mold runpod connect <pod-id> # print export MOLD_HOST=…
eval "$(mold runpod connect <pod-id>)" # exec the export in your shell
# Observability
mold runpod logs <pod-id> # one-shot
mold runpod logs <pod-id> --follow # tail (polls every 2s)
Smart defaults explained
When mold runpod create (or run) is invoked without --gpu/--dc:
- RunPodClient::gpu_types() aggregates the highest stock signal per GPU across all datacenters (via GraphQL; the REST API doesn't expose this).
- The cheapest family with High or Medium stock wins, from the preference list 4090 > 5090 > L40S > A100.
- The image tag is derived from the GPU family:
  - Ada (4090, L40S) → :latest
  - Ampere (A100, 3090) → :latest-sm80
  - Blackwell (5090) → :latest-sm120
- Datacenter is left unset; RunPod's scheduler picks any machine it can place. If that fails, ensure_pod retries across stock-ranked DCs with a 90-second schedule timeout per attempt, deleting stuck pods before moving on.
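The selection rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not mold's actual implementation: it takes one plausible reading of the GPU rule (cheapest offer wins, with the preference list breaking ties), the GpuOffer type and its field names are hypothetical, the prices are made up, and the create / wait_scheduled / delete callables stand in for real API calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence

PREFERENCE = ["4090", "5090", "L40S", "A100"]
IMAGE_TAGS = {
    "4090": ":latest", "L40S": ":latest",            # Ada
    "A100": ":latest-sm80", "3090": ":latest-sm80",  # Ampere
    "5090": ":latest-sm120",                         # Blackwell
}


@dataclass
class GpuOffer:
    family: str          # e.g. "4090"
    stock: str           # highest stock signal across datacenters
    usd_per_hour: float  # hypothetical field name


def pick_gpu(offers: Iterable[GpuOffer]) -> Optional[GpuOffer]:
    """Cheapest preferred family with High or Medium stock; preference order breaks ties."""
    usable = [o for o in offers
              if o.family in PREFERENCE and o.stock in ("High", "Medium")]
    if not usable:
        return None
    return min(usable, key=lambda o: (o.usd_per_hour, PREFERENCE.index(o.family)))


def image_for(family: str) -> str:
    """Map a GPU family to the image reference listed in this guide."""
    return "ghcr.io/utensils/mold" + IMAGE_TAGS[family]


def ensure_pod(datacenters: Sequence[str],
               create: Callable[[Optional[str]], str],
               wait_scheduled: Callable[[str, float], bool],
               delete: Callable[[str], None],
               timeout_s: float = 90.0) -> Optional[str]:
    """Try unpinned placement first, then each stock-ranked datacenter in turn."""
    for dc in [None, *datacenters]:
        pod_id = create(dc)
        if wait_scheduled(pod_id, timeout_s):
            return pod_id
        delete(pod_id)  # remove the stuck pod before the next attempt
    return None


# Made-up offers, for illustration only.
offers = [
    GpuOffer("4090", "Low", 0.50),
    GpuOffer("5090", "High", 0.90),
    GpuOffer("L40S", "Medium", 0.80),
    GpuOffer("A100", "High", 1.60),
]
choice = pick_gpu(offers)
print(choice.family, image_for(choice.family))
```

Passing the API operations in as callables keeps the retry loop easy to exercise without a RunPod account.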
Configuration reference
config.toml (~/.config/mold/config.toml or ~/.mold/config.toml) supports a [runpod] section:
[runpod]
# api_key = "rpa_..." # Prefer RUNPOD_API_KEY env var
default_gpu = "RTX 5090" # Override auto-pick
default_datacenter = "EUR-IS-2"
default_network_volume_id = "nv-abc123" # Attach to every new pod
auto_teardown = false # true = delete after each `run`
auto_teardown_idle_mins = 20 # Idle reap window
cost_alert_usd = 3.0 # Abort if session spend exceeds
# endpoint = "https://rest.runpod.io/v1" # Override (mostly for testing)
All keys are settable via mold config set runpod.<key> <value>.
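To make the cost_alert_usd threshold concrete, here is a small worked example. It assumes session spend is simply hourly rate times elapsed time, which this guide does not confirm; the function names and the $0.69/hour rate are made up for illustration.

```python
def session_spend_usd(usd_per_hour: float, elapsed_seconds: float) -> float:
    """Estimated spend, assuming billing is linear in elapsed time."""
    return usd_per_hour * elapsed_seconds / 3600.0


def exceeds_cost_alert(usd_per_hour: float, elapsed_seconds: float,
                       cost_alert_usd: float = 3.0) -> bool:
    """True once the estimated spend is strictly above the configured threshold."""
    return session_spend_usd(usd_per_hour, elapsed_seconds) > cost_alert_usd


# At a hypothetical $0.69/hour, four hours stays under a 3.0 USD alert
# and five hours goes over it.
for hours in (4, 5):
    print(hours, exceeds_cost_alert(0.69, hours * 3600))
```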
Env-var precedence
| Variable | Purpose |
|---|---|
| RUNPOD_API_KEY | Overrides config.runpod.api_key |
Other runpod settings are config-only (no env-var override) — they rarely change between runs.
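The precedence rule can be sketched as a small lookup. This is an illustration only: the function name is hypothetical, the config is modelled as a plain nested dict, and treating an empty RUNPOD_API_KEY as unset is an assumption.

```python
import os
from typing import Any, Mapping, Optional


def resolve_api_key(config: Mapping[str, Any],
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the env-var key if set, else config.runpod.api_key, else None."""
    env_key = env.get("RUNPOD_API_KEY")
    if env_key:  # assumption: an empty value counts as unset
        return env_key
    runpod = config.get("runpod") or {}
    return runpod.get("api_key")


config = {"runpod": {"api_key": "rpa_from_config"}}
print(resolve_api_key(config, {"RUNPOD_API_KEY": "rpa_from_env"}))
print(resolve_api_key(config, {}))
```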
State files
mold runpod persists two files under $MOLD_HOME/ (default ~/.mold/):
- runpod-state.json: warm-pod pointer (last_pod_id, timestamps, cached GPU + cost). Used by run for reuse detection.
- runpod-history.jsonl: append-only log of pod lifetime events with cost and prompt metadata. Used by mold runpod usage --since <win>.
Delete these any time to reset state; they're caches, not sources of truth.
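As a rough sketch of how an append-only JSONL log can back a "spend since" query, the example below sums costs inside a time window. The field names ts and cost_usd, the ISO 8601 timestamp format, and the set of accepted window units are all assumptions; this guide only shows 7d as a window and does not document the file's schema.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable

UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_window(text: str) -> timedelta:
    """Parse windows such as "7d" or "12h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhdw])", text)
    if not match:
        raise ValueError(f"unsupported window: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def spend_since(lines: Iterable[str], window: str, now: datetime) -> float:
    """Sum the cost of all events whose timestamp falls inside the window."""
    cutoff = now - parse_window(window)
    total = 0.0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        when = datetime.fromisoformat(event["ts"])  # hypothetical field
        if when >= cutoff:
            total += float(event.get("cost_usd", 0.0))  # hypothetical field
    return total


# Made-up history entries, for illustration only.
history = [
    '{"ts": "2024-05-09T00:00:00+00:00", "cost_usd": 1.25}',
    '{"ts": "2024-05-01T00:00:00+00:00", "cost_usd": 4.0}',
    '{"ts": "2024-05-05T00:00:00+00:00", "cost_usd": 0.5}',
]
now = datetime(2024, 5, 10, tzinfo=timezone.utc)
print(spend_since(history, "7d", now))
```

Because each line is an independent JSON object, the log can be appended to without rewriting the file and read back line by line.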
NixOS integration
The services.mold module supports a runpodApiKeyFile option:
services.mold = {
enable = true;
package = inputs.mold.packages.${system}.default;
runpodApiKeyFile = config.age.secrets.runpod-key.path;
};
The key is read via ExecStartPre and injected into the service environment, never written into the Nix store. Same pattern as hfTokenFile and apiKeyFile.
REST vs GraphQL
RunPodClient hits RunPod's REST API at https://rest.runpod.io/v1/ for pod lifecycle (create/list/get/stop/start/delete/logs) and uses the GraphQL endpoint at https://api.runpod.io/graphql for account info, GPU catalog, and datacenter availability — those aren't exposed via REST.
Both paths use the same API key (Authorization: Bearer …).
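The shared authentication scheme can be illustrated by building (not sending) one request for each endpoint. This is a sketch using Python's standard library rather than mold's own client; the helper names are hypothetical, and the GraphQL query shown is a schema-agnostic placeholder, not one of the queries mold issues.

```python
import json
import urllib.request

REST_BASE = "https://rest.runpod.io/v1"
GRAPHQL_URL = "https://api.runpod.io/graphql"


def rest_request(api_key: str, path: str, method: str = "GET") -> urllib.request.Request:
    """Build a REST request with the Bearer header attached."""
    return urllib.request.Request(
        f"{REST_BASE}/{path.lstrip('/')}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def graphql_request(api_key: str, query: str) -> urllib.request.Request:
    """Build a GraphQL POST carrying the query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30).
rest = rest_request("rpa_example", "/pods")
gql = graphql_request("rpa_example", "{ __typename }")
print(rest.get_method(), rest.full_url)
print(gql.get_method(), gql.full_url)
```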
Troubleshooting
"pod didn't schedule within 90s" — the datacenter likely has no real capacity despite a High/Medium stock signal. mold runpod run will automatically try the next candidate. If all fail, RunPod is out of capacity for that GPU family right now. Retry later, pick a different GPU, or fall back to a local GPU host.
"value must be one of …" from /pods — you pinned a datacenter that isn't in RunPod's REST enum whitelist. GraphQL exposes datacenters that REST doesn't accept. Omit --dc to let RunPod pick.
"RunPod /user 401" or "…403" — stale/invalid/missing API key. Run mold runpod doctor to confirm. Regenerate at runpod.io/console/user/settings.
Orphaned pods after Ctrl-C: mold runpod run persists last_pod_id before waiting for readiness, so mold runpod list always surfaces zombie pods. Delete with mold runpod delete <id>.
