
Docker & RunPod

Run mold on any NVIDIA GPU host with Docker, including cloud GPU providers like RunPod.

Building

```bash
# Default build (Ada, e.g. RTX 4090)
docker build -t mold-server .

# Hopper (H100)
docker build --build-arg CUDA_COMPUTE_CAP=90 -t mold-server-h100 .

# Ampere (A100)
docker build --build-arg CUDA_COMPUTE_CAP=80 -t mold-server-a100 .

# Blackwell (B200)
docker build --build-arg CUDA_COMPUTE_CAP=120 -t mold-server-b200 .
```

Pre-Built Images

Images are published to GHCR on every push to main and on version tags:

```bash
# Ada (RTX 4090) — default
docker pull ghcr.io/utensils/mold:latest

# Ampere (A100)
docker pull ghcr.io/utensils/mold:latest-sm80

# Blackwell (RTX 5090)
docker pull ghcr.io/utensils/mold:latest-sm120
```

Running

```bash
# Ephemeral container (models are re-downloaded on each start)
docker run --gpus all -p 7680:7680 ghcr.io/utensils/mold:latest

# With a host volume so downloaded models persist across runs
docker run --gpus all -p 7680:7680 \
  -v ~/.mold:/workspace/.mold \
  ghcr.io/utensils/mold:latest
```

RunPod Deployment

1. Push Your Image

```bash
docker tag mold-server your-registry/mold-server
docker push your-registry/mold-server
```

Or use the pre-built GHCR images directly.

2. Create a Pod Template

  • Container image: ghcr.io/utensils/mold:latest
  • HTTP port: 7680
  • Attach a network volume for persistent model storage

3. Generate from Anywhere

```bash
MOLD_HOST=https://<pod-id>-7680.proxy.runpod.net mold run "a cat"
```

Network Volume

The entrypoint auto-detects RunPod network volumes at /workspace:

  • Models persist at /workspace/.mold/models
  • HuggingFace cache at /workspace/.cache/huggingface
  • All data survives pod restarts

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MOLD_PORT` | `7680` | Server port |
| `MOLD_LOG` | `info` | Log level |
| `MOLD_DEFAULT_MODEL` | (unset) | Default model to load |
| `MOLD_MODELS_DIR` | (unset) | Override models path |
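
These can be set with `docker run -e` flags; a sketch (the port and log level here are illustrative values, not recommendations):

```bash
# Remap the server port and raise log verbosity via environment variables
docker run --gpus all -p 8080:8080 \
  -e MOLD_PORT=8080 \
  -e MOLD_LOG=debug \
  ghcr.io/utensils/mold:latest
```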

Suggested RunPod GPUs:

| GPU | VRAM | $/hr | Notes |
|-----|------|------|-------|
| RTX 4090 | 24 GB | $0.34 | Best value, all models work |
| L40S | 48 GB | $0.40 | Full BF16 FLUX without offload |
| A100 80GB | 80 GB | $0.79 | Maximum headroom |

Proxy Timeout

RunPod's Cloudflare proxy has a 100-second timeout. Use the SSE streaming endpoint (/api/generate/stream) for long generations.
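
One way to stay under that limit is to consume the stream with `curl`; this is a sketch, and the `{"prompt": ...}` request body is an assumed shape rather than a documented contract:

```bash
# -N disables curl's output buffering so SSE events print as they arrive;
# the JSON body shown here is an assumption about the request shape
curl -N -X POST "https://<pod-id>-7680.proxy.runpod.net/api/generate/stream" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a cat"}'
```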

Image Details

The Dockerfile uses a multi-stage build:

  1. Builder: nvidia/cuda:12.8.1-devel-ubuntu22.04 with Rust and cargo
  2. Runtime: nvidia/cuda:12.8.1-runtime-ubuntu22.04 (~3.4 GB image, 33 MB binary)

libcuda.so.1 (the NVIDIA driver) is injected at runtime by the NVIDIA Container Toolkit — the image cannot run without GPU access.
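
A quick way to confirm the toolkit is injecting the driver on a host is to run `nvidia-smi` in a bare CUDA runtime image (a sketch; any CUDA image works):

```bash
# Lists the GPU if the NVIDIA Container Toolkit injected the driver;
# without --gpus all this fails because nvidia-smi/libcuda are absent
docker run --rm --gpus all nvidia/cuda:12.8.1-runtime-ubuntu22.04 nvidia-smi
```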