# Models
mold supports 11 model families spanning different architectures, quality levels, and VRAM requirements — including both image and video generation.
## Choosing a Model
| Need | Recommended | Why |
|---|---|---|
| Fast iterations | flux2-klein:q8 | 4 steps, ungated, Apache 2.0 |
| Best quality | flux-dev:q4 | 25 steps, excellent detail |
| Low VRAM (<8 GB) | flux2-klein:q4 | 2.6 GB, 4 steps |
| Classic ecosystem | sd15:fp16 or dreamshaper-v8 | Huge model library, ControlNet |
| Fast + great | z-image-turbo:q8 | 9 steps, excellent quality |
| SDXL | sdxl-turbo:fp16 | 4 steps, 1024x1024 |
| Video | ltx-video-0.9.6-distilled:bf16 | Text-to-video, 30fps, APNG/MP4, best-supported default |
| Audio + video | ltx-2-19b-distilled:fp8 | Joint audio-video, MP4-first, advanced conditioning |
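The table above is a simple lookup from need to model tag. The sketch below makes that mapping explicit in shell; the `recommend` helper is purely illustrative and not a mold command:

```shell
# recommend NEED -> echo the model tag suggested in the table above.
# Illustrative lookup helper only; not part of the mold CLI.
recommend() {
    case $1 in
        fast)     echo "flux2-klein:q8" ;;               # 4 steps, ungated
        quality)  echo "flux-dev:q4" ;;                  # 25 steps, excellent detail
        low-vram) echo "flux2-klein:q4" ;;               # 2.6 GB, 4 steps
        video)    echo "ltx-video-0.9.6-distilled:bf16" ;;
        *)        echo "unknown need: $1" >&2; return 1 ;;
    esac
}

recommend quality
```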
## VRAM Guide
| Model | Variant | Approx. VRAM | Speed | Quality |
|---|---|---|---|---|
| flux-schnell:q8 | Q8 | ~12 GB | Fast, 4 steps | Good |
| flux-schnell:q6 | Q6 | ~14 GB | Fast, 4 steps | Better than Q8 |
| flux-dev:q4 | Q4 | ~8 GB | Slow, 25 steps | Excellent |
| flux-dev:q6 | Q6 | ~10 GB | Slow, 25 steps | Best FLUX quality/size trade-off |
| flux-dev:bf16 | BF16 | ~24 GB | Slow, 25 steps | Best FLUX quality |
| flux2-klein:q4 | Q4 | ~4 GB | Fast, 4 steps | Good for very small GPUs |
| z-image-turbo:q8 | Q8 | ~10 GB | Fast, 9 steps | Excellent |
| sdxl-turbo:fp16 | FP16 | ~10 GB | Very fast, 4 steps | Good |
| sd15:fp16 | FP16 | ~6 GB | Medium, 25 steps | Good, broad ecosystem |
| qwen-image:q4 | Q4 | ~14 GB | Slow, 50 steps | Good, stable at 1024x1024 |
| qwen-image-2512:q4 | Q4 | ~14 GB | Slow, 50 steps | Good, stable at 1024x1024 |
| qwen-image:q8 | Q8 | ~22 GB | Slow, 50 steps | Best GGUF, validated at 768 |
| ltx-video-0.9.6-distilled:bf16 | BF16 | ~10 GB | Fast, 8 steps | Video, low-VRAM default |
| ltx-video-0.9.8-2b-distilled:bf16 | BF16 | ~10-12 GB | Fast, 7+3 steps | Newer video checkpoint, multiscale refine |
| ltx-2-19b-distilled:fp8 | FP8 | ~24 GB | Slow, 8 steps | Joint audio-video, recommended LTX-2 |
| ltx-2.3-22b-distilled:fp8 | FP8 | ~24 GB | Slow, 8 steps | Larger joint audio-video path |
VRAM estimates include the transformer, text encoder(s), VAE, and ~2 GB of activation headroom. In the table below, the Default column is sequential mode (drop-and-reload), which loads components one at a time; Eager mode keeps everything on the GPU simultaneously for faster inference but needs more VRAM.
| Model | Variant | Default VRAM | Eager VRAM | Speed | Quality |
|---|---|---|---|---|---|
| flux-schnell:q8 | Q8 | ~15 GB | ~25 GB | Fast, 4 steps | Good |
| flux-dev:q4 | Q4 | ~10 GB | ~15 GB | Slow, 25 steps | Excellent |
| flux-dev:q6 | Q6 | ~12 GB | ~20 GB | Slow, 25 steps | Best FLUX quality/size trade-off |
| flux-dev:bf16 | BF16 | ~26 GB | ~36 GB | Slow, 25 steps | Best FLUX quality |
| flux2-klein:q4 | Q4 | ~5 GB | ~11 GB | Fast, 4 steps | Good for very small GPUs |
| flux2-klein:q8 | Q8 | ~6 GB | ~13 GB | Fast, 4 steps | Good |
| z-image-turbo:q8 | Q8 | ~9 GB | ~13 GB | Fast, 9 steps | Excellent |
| sdxl-turbo:fp16 | FP16 | ~8 GB | ~11 GB | Very fast, 4 steps | Good |
| sd15:fp16 | FP16 | ~6 GB | ~6 GB | Medium, 25 steps | Good, broad ecosystem |
| sd35-large:q8 | Q8 | ~12 GB | ~22 GB | Medium, 28 steps | Excellent |
| qwen-image:q4 | Q4 | ~14 GB | ~22 GB | Slow, 50 steps | Good, validated at 1024 |
| qwen-image-2512:q4 | Q4 | ~14 GB | ~22 GB | Slow, 50 steps | Good, validated at 1024 |
| qwen-image:q8 | Q8 | ~22 GB | ~24+ GB | Slow, 50 steps | Best GGUF, validated at 768 |
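The difference between the two columns can be sanity-checked with simple arithmetic. The sketch below uses illustrative component sizes (example figures, not mold's measured values): eager peak is the sum of every component plus headroom, while sequential peak is roughly the largest single component plus headroom.

```shell
# Illustrative component sizes in GB for a flux-dev:q4-like setup.
# These are example numbers, not mold's real measurements.
TRANSFORMER=7
ENCODER=5
VAE=1
HEADROOM=2   # ~2 GB activation headroom, per the note above

# Eager mode: everything resident at once.
eager=$((TRANSFORMER + ENCODER + VAE + HEADROOM))

# Sequential mode: components are dropped and reloaded one phase at a
# time, so peak is roughly the largest single component plus headroom.
largest=$TRANSFORMER
if [ "$ENCODER" -gt "$largest" ]; then largest=$ENCODER; fi
if [ "$VAE" -gt "$largest" ]; then largest=$VAE; fi
sequential=$((largest + HEADROOM))

echo "eager peak: ~${eager} GB"
echo "sequential peak: ~${sequential} GB"
```

With these example sizes the sequential peak comes out around 40% below the eager peak, in line with the 30-50% reduction quoted for sequential mode.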
## Sequential vs Eager
In sequential mode (the default), mold loads each component (encoder → transformer → VAE) one at a time, freeing GPU memory between phases. This reduces peak VRAM by 30-50% but adds 10-20% to generation time.
Use --eager to keep all components loaded simultaneously for faster inference on high-VRAM cards. FLUX.1 also supports --offload for block-level CPU↔GPU streaming (~4-5 GB peak, 2-4x slower).
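Choosing between the modes comes down to free VRAM versus the table's figures. The throwaway helper below (not part of the mold CLI; thresholds taken from the flux-dev:q4 row, ~10 GB default and ~15 GB eager) sketches the decision:

```shell
# pick_mode FREE_GB -> suggest a mode for flux-dev:q4, using the VRAM
# figures quoted in the table above. Illustrative helper only.
pick_mode() {
    if [ "$1" -ge 15 ]; then
        echo "--eager"       # everything resident at once, fastest
    elif [ "$1" -ge 10 ]; then
        echo "default"       # sequential drop-and-reload, no flag needed
    else
        echo "--offload"     # block-level CPU/GPU streaming, 2-4x slower
    fi
}

pick_mode 24
```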
## Model Management

```bash
mold pull flux2-klein:q8     # Download a model
mold list                    # See what you have
mold info                    # Installation overview
mold info flux-dev:q4        # Model details + disk usage
mold rm dreamshaper-v8       # Remove a model
mold default flux-dev:q4     # Set the default model
```

## Name Resolution
Bare names auto-resolve by trying `:q8` → `:fp16` → `:bf16` → `:fp8`:

```bash
mold run flux2-klein "a cat"   # resolves to flux2-klein:q8
mold run sdxl-base "a cat"     # resolves to sdxl-base:fp16
```

## HuggingFace Auth
Some model repos (marked [gated]) require a HuggingFace access token. You may need to accept the model's license on its HuggingFace page before downloading.
Option 1 — Environment variable (simplest):
```bash
export HF_TOKEN=hf_...
mold pull flux-dev:q4
```

Option 2 — HuggingFace CLI (persists the token):
```bash
# Install the HF CLI
curl -LsSf https://hf.co/cli/install.sh | bash

# Log in (saves the token to ~/.cache/huggingface/)
hf auth login
```

Once logged in, `mold pull` picks up the stored token automatically — no `HF_TOKEN` export needed.
See the HuggingFace CLI docs for more options.
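In scripts it can help to fail fast before pulling a gated model when no token is available. The guard below is an illustrative sketch, not part of mold: it checks `HF_TOKEN` and the token file under `~/.cache/huggingface/` mentioned above (the exact file name `token` is an assumption about where the HF CLI stores it).

```shell
# hf_token_present -> succeed if a HuggingFace token is available, either
# from the HF_TOKEN environment variable or from the file assumed to be
# written by `hf auth login`. Illustrative guard, not part of mold.
hf_token_present() {
    if [ -n "${HF_TOKEN:-}" ]; then
        return 0
    fi
    [ -s "${HOME}/.cache/huggingface/token" ]
}

if hf_token_present; then
    echo "token found"
else
    echo "no token: export HF_TOKEN or run 'hf auth login' first"
fi
```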
## All Families
| Family | Native Resolution | Architecture |
|---|---|---|
| FLUX.2 | 1024x1024 | Qwen3 encoder, 4B transformer |
| FLUX.1 | 1024x1024 | Flow-matching transformer |
| SDXL | 1024x1024 | Dual-CLIP, UNet |
| SD 1.5 | 512x512 | CLIP-L, UNet |
| SD 3.5 | 1024x1024 | Triple encoder, MMDiT |
| Z-Image | 1024x1024 | Qwen3 encoder, 3D RoPE |
| Wuerstchen | 1024x1024 | 3-stage cascade, 42x compression |
| Qwen-Image | 1328x1328 | Qwen2.5-VL, flow-matching, CFG |
| Qwen-Image-Edit | Derived from first edit image | Qwen2.5-VL multimodal edit, flow-matching, CFG |
| LTX-2 | 1216x704 | Gemma 3, joint audio-video transformer |
| LTX Video | 768x512 | T5-XXL, DiT, 3D causal VAE |
Each family page lists recommended dimensions for non-square aspect ratios. Using non-recommended dimensions will trigger a warning.
## Backend Compatibility
All image families and LTX Video run on CUDA, Apple Metal, and CPU. LTX-2 / LTX-2.3 is CUDA-only for practical generation: its CPU path exists only for correctness-oriented test coverage and can be extremely slow, and Metal is not supported for this family in this release.
