
# Models

mold supports 11 model families spanning different architectures, quality levels, and VRAM requirements — including both image and video generation.

## Choosing a Model

| Need | Recommended | Why |
| --- | --- | --- |
| Fast iterations | `flux2-klein:q8` | 4 steps, ungated, Apache 2.0 |
| Best quality | `flux-dev:q4` | 25 steps, excellent detail |
| Low VRAM (<8 GB) | `flux2-klein:q4` | 2.6 GB, 4 steps |
| Classic ecosystem | `sd15:fp16` or `dreamshaper-v8` | Huge model library, ControlNet |
| Fast + great | `z-image-turbo:q8` | 9 steps, excellent quality |
| SDXL | `sdxl-turbo:fp16` | 4 steps, 1024x1024 |
| Video | `ltx-video-0.9.6-distilled:bf16` | Text-to-video, 30fps, APNG/MP4, best-supported default |
| Audio + video | `ltx-2-19b-distilled:fp8` | Joint audio-video, MP4-first, advanced conditioning |

## VRAM Guide

| Model | Variant | Approx. VRAM | Speed | Quality |
| --- | --- | --- | --- | --- |
| `flux-schnell:q8` | Q8 | ~12 GB | Fast, 4 steps | Good |
| `flux-schnell:q6` | Q6 | ~14 GB | Fast, 4 steps | Better than Q8 |
| `flux-dev:q4` | Q4 | ~8 GB | Slow, 25 steps | Excellent |
| `flux-dev:q6` | Q6 | ~10 GB | Slow, 25 steps | Best FLUX quality/size trade |
| `flux-dev:bf16` | BF16 | ~24 GB | Slow, 25 steps | Best FLUX quality |
| `flux2-klein:q4` | Q4 | ~4 GB | Fast, 4 steps | Good for very small GPUs |
| `z-image-turbo:q8` | Q8 | ~10 GB | Fast, 9 steps | Excellent |
| `sdxl-turbo:fp16` | FP16 | ~10 GB | Very fast, 4 steps | Good |
| `sd15:fp16` | FP16 | ~6 GB | Medium, 25 steps | Good, broad ecosystem |
| `qwen-image:q4` | Q4 | ~14 GB | Slow, 50 steps | Good, stable at 1024x1024 |
| `qwen-image-2512:q4` | Q4 | ~14 GB | Slow, 50 steps | Good, stable at 1024x1024 |
| `qwen-image:q8` | Q8 | ~22 GB | Slow, 50 steps | Best GGUF, validated at 768 |
| `ltx-video-0.9.6-distilled:bf16` | BF16 | ~10 GB | Fast, 8 steps | Video, low-VRAM default |
| `ltx-video-0.9.8-2b-distilled:bf16` | BF16 | ~10-12 GB | Fast, 7+3 steps | Newer video checkpoint, multiscale refine |
| `ltx-2-19b-distilled:fp8` | FP8 | ~24 GB | Slow, 8 steps | Joint audio-video, recommended LTX-2 |
| `ltx-2.3-22b-distilled:fp8` | FP8 | ~24 GB | Slow, 8 steps | Larger joint audio-video path |

VRAM estimates include the transformer, text encoder(s), VAE, and ~2 GB activation headroom. The default column is sequential mode (drop-and-reload), which loads components one at a time. Eager mode keeps everything on GPU simultaneously for faster inference but needs more VRAM.

| Model | Variant | Default VRAM | Eager VRAM | Speed | Quality |
| --- | --- | --- | --- | --- | --- |
| `flux-schnell:q8` | Q8 | ~15 GB | ~25 GB | Fast, 4 steps | Good |
| `flux-dev:q4` | Q4 | ~10 GB | ~15 GB | Slow, 25 steps | Excellent |
| `flux-dev:q6` | Q6 | ~12 GB | ~20 GB | Slow, 25 steps | Best FLUX quality/size trade |
| `flux-dev:bf16` | BF16 | ~26 GB | ~36 GB | Slow, 25 steps | Best FLUX quality |
| `flux2-klein:q4` | Q4 | ~5 GB | ~11 GB | Fast, 4 steps | Good for very small GPUs |
| `flux2-klein:q8` | Q8 | ~6 GB | ~13 GB | Fast, 4 steps | Good |
| `z-image-turbo:q8` | Q8 | ~9 GB | ~13 GB | Fast, 9 steps | Excellent |
| `sdxl-turbo:fp16` | FP16 | ~8 GB | ~11 GB | Very fast, 4 steps | Good |
| `sd15:fp16` | FP16 | ~6 GB | ~6 GB | Medium, 25 steps | Good, broad ecosystem |
| `sd35-large:q8` | Q8 | ~12 GB | ~22 GB | Medium, 28 steps | Excellent |
| `qwen-image:q4` | Q4 | ~14 GB | ~22 GB | Slow, 50 steps | Good, validated at 1024 |
| `qwen-image-2512:q4` | Q4 | ~14 GB | ~22 GB | Slow, 50 steps | Good, validated at 1024 |
| `qwen-image:q8` | Q8 | ~22 GB | ~24+ GB | Slow, 50 steps | Best GGUF, validated at 768 |

### Sequential vs Eager

In sequential mode (the default), mold loads each component (encoder → transformer → VAE) one at a time, freeing GPU memory between phases. This reduces peak VRAM by 30-50% but adds 10-20% to generation time.

Use --eager to keep all components loaded simultaneously for faster inference on high-VRAM cards. FLUX.1 also supports --offload for block-level CPU↔GPU streaming (~4-5 GB peak, 2-4x slower).
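In practice, picking a mode is a per-run choice. A sketch of what that might look like, borrowing the `mold run <model> "<prompt>"` form from the examples elsewhere on this page (the flag placement after the prompt is an assumption, and the prompt itself is just an illustration):

```bash
# Default: sequential mode — lowest peak VRAM, ~10-20% slower
mold run flux-dev:q4 "a lighthouse at dusk"

# High-VRAM card: keep encoder, transformer, and VAE resident on GPU
mold run flux-dev:q4 "a lighthouse at dusk" --eager

# Very small GPU (FLUX.1 only): stream transformer blocks CPU<->GPU, ~4-5 GB peak
mold run flux-dev:q4 "a lighthouse at dusk" --offload
```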

## Model Management

```bash
mold pull flux2-klein:q8     # Download a model
mold list                    # See what you have
mold info                    # Installation overview
mold info flux-dev:q4        # Model details + disk usage
mold rm dreamshaper-v8       # Remove a model
mold default flux-dev:q4     # Set default model
```

### Name Resolution

Bare names auto-resolve by trying `:q8`, `:fp16`, `:bf16`, then `:fp8` in order:

```bash
mold run flux2-klein "a cat"   # resolves to flux2-klein:q8
mold run sdxl-base "a cat"     # resolves to sdxl-base:fp16
```

## HuggingFace Auth

Some model repos (marked [gated]) require a HuggingFace access token. You may need to accept the model's license on its HuggingFace page before downloading.

Option 1 — Environment variable (simplest):

```bash
export HF_TOKEN=hf_...
mold pull flux-dev:q4
```

Option 2 — HuggingFace CLI (persists the token):

```bash
# Install the HF CLI
curl -LsSf https://hf.co/cli/install.sh | bash

# Log in (saves token to ~/.cache/huggingface/)
hf auth login
```

Once logged in, `mold pull` picks up the stored token automatically — no `HF_TOKEN` export needed.
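A quick way to sanity-check the stored credentials before attempting a gated pull (`hf auth whoami` is the HF CLI's identity check; whether `mold` surfaces auth errors differently is not covered here):

```bash
# Confirm which HuggingFace account the stored token belongs to
hf auth whoami

# A gated pull should now work without exporting HF_TOKEN
mold pull flux-dev:q4
```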

See the HuggingFace CLI docs for more options.

## All Families

| Family | Native Resolution | Architecture |
| --- | --- | --- |
| FLUX.2 | 1024x1024 | Qwen3 encoder, 4B transformer |
| FLUX.1 | 1024x1024 | Flow-matching transformer |
| SDXL | 1024x1024 | Dual-CLIP, UNet |
| SD 1.5 | 512x512 | CLIP-L, UNet |
| SD 3.5 | 1024x1024 | Triple encoder, MMDiT |
| Z-Image | 1024x1024 | Qwen3 encoder, 3D RoPE |
| Wuerstchen | 1024x1024 | 3-stage cascade, 42x compress |
| Qwen-Image | 1328x1328 | Qwen2.5-VL, flow-matching, CFG |
| Qwen-Image-Edit | Derived from first edit image | Qwen2.5-VL multimodal edit, flow-matching, CFG |
| LTX-2 | 1216x704 | Gemma 3, joint audio-video transformer |
| LTX Video | 768x512 | T5-XXL, DiT, 3D causal VAE |

Each family page lists recommended dimensions for non-square aspect ratios. Using non-recommended dimensions will trigger a warning.

## Backend compatibility

All image families and LTX Video run on CUDA, Apple Metal, and CPU. LTX-2 / LTX-2.3 is CUDA-only for real generation: its CPU path exists only for correctness-oriented test coverage and can be extremely slow, and Metal is not supported for this family in this release.