
Qwen-Image

Qwen-Image combines a Qwen2.5-VL text encoder, a 3D causal VAE (run as a 2D temporal slice for still images), and flow matching with classifier-free guidance.

Example outputs:

  • Winter cabin — "A snowy mountain cabin at twilight, warm orange light pouring from the windows, aurora borealis in the sky above" (qwen-image-2512:q4, 50 steps, seed 888)
  • Overgrown greenhouse — "An abandoned greenhouse overgrown with exotic flowers and vines, cracked glass roof letting in shafts of golden light, butterflies and hummingbirds, lush and magical" (qwen-image-2512:q4, 50 steps, seed 2024)
  • Hot air balloon — "A colorful hot air balloon floating over a misty valley at sunrise, the balloon has the word MOLD written on the side" (qwen-image-2512:q4, 50 steps, seed 314)

Stable GGUF Variants

mold supports two quantized Qwen lines:

  • qwen-image:* uses the base Qwen/Qwen-Image release with GGUF transformers from city96/Qwen-Image-gguf
  • qwen-image-2512:* uses Qwen/Qwen-Image-2512 with GGUF transformers from unsloth/Qwen-Image-2512-GGUF

The Qwen-Image text encoder itself is also selectable now:

  • --qwen2-variant auto|bf16|q8|q6|q5|q4|q3|q2
  • --qwen2-text-encoder-mode auto|gpu|cpu-stage|cpu

On Apple Metal/MPS, auto prefers quantized Qwen2.5-VL GGUF text encoders (q6, then q4) to avoid the BF16 text-encoder memory spike. CUDA auto prefers BF16 when enough headroom remains after the transformer load and falls back to quantized GGUF variants when that resident encoder would be too heavy.
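As a sketch, the two flags can be combined to override the platform heuristics, here pinning the q6 GGUF encoder and staging it on CPU (flag values are from the lists above; the specific combination is illustrative):

```bash
# Pin the text encoder to the q6 GGUF and stage it on CPU,
# overriding the platform "auto" heuristics described above.
mold run qwen-image:q4 "your prompt here" \
  --qwen2-variant q6 \
  --qwen2-text-encoder-mode cpu-stage
```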

Base Qwen-Image

| Model | Steps | Size | Validated On 24 GB | Notes |
| --- | --- | --- | --- | --- |
| qwen-image:q8 | 50 | 21.8 GB | 768x768 | Highest-quality GGUF tier |
| qwen-image:q6 | 50 | 16.8 GB | 1024x1024 | Quality/size trade-off |
| qwen-image:q5 | 50 | 14.9 GB | 1024x1024 | Dynamic K_M quant |
| qwen-image:q4 | 50 | 13.1 GB | 1024x1024 | Stable 24 GB choice |
| qwen-image:q3 | 50 | 9.7 GB | 1024x1024 | Lower bitrate, still prompt-faithful |
| qwen-image:q2 | 50 | 7.1 GB | 1024x1024 | Smallest published base GGUF |

Qwen-Image-2512

| Model | Steps | Size | Validated On 24 GB | Notes |
| --- | --- | --- | --- | --- |
| qwen-image-2512:q8 | 50 | 21.8 GB | 768x768 | Highest-quality 2512 GGUF tier |
| qwen-image-2512:q6 | 50 | 16.8 GB | 1024x1024 | Quality/size trade-off |
| qwen-image-2512:q5 | 50 | 15.0 GB | 1024x1024 | Dynamic K_M quant |
| qwen-image-2512:q4 | 50 | 13.2 GB | 1024x1024 | Stable 24 GB choice |
| qwen-image-2512:q3 | 50 | 9.9 GB | 1024x1024 | Lower bitrate, still prompt-faithful |
| qwen-image-2512:q2 | 50 | 7.3 GB | 1024x1024 | Smallest published 2512 GGUF |

Qwen-Image-Edit-2511

qwen-image-edit-2511:* is the edit-family sibling of Qwen-Image. It uses repeatable --image inputs instead of img2img --strength, supports negative prompts, and targets output dimensions derived from the first input image at roughly 1024x1024 area.

| Model | Steps | Size | Notes |
| --- | --- | --- | --- |
| qwen-image-edit-2511:q8 | 50 | 21.8 GB | Highest-quality GGUF tier |
| qwen-image-edit-2511:q6 | 50 | 16.9 GB | Quality/size trade-off |
| qwen-image-edit-2511:q5 | 50 | 15.0 GB | Dynamic K_M quant |
| qwen-image-edit-2511:q4 | 50 | 13.2 GB | Stable 24 GB GGUF target |
| qwen-image-edit-2511:q3 | 50 | 9.9 GB | Lower bitrate, still relatively small |
| qwen-image-edit-2511:q2 | 50 | 7.5 GB | Smallest published edit GGUF |
| qwen-image-edit-2511:bf16 | 50 | 40.9 GB | Sharded BF16 edit transformer |

Edit Path

qwen-image-edit-2511 runs a real multimodal edit path: Qwen2.5-VL condition images are patchified through the vision tower, source-image latents are packed and concatenated with output-noise tokens, and true CFG uses norm rescaling. Quantized --qwen2-variant values are supported for the edit family through a GGUF Qwen2.5 language path plus the staged Qwen2.5-VL vision tower used for image conditioning.
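A minimal edit invocation might look like this (the repeatable --image flag is as described above; the input file name is a placeholder):

```bash
mold pull qwen-image-edit-2511:q4
# Condition on one or more input images; the output size follows the
# first image at roughly a 1024x1024 area.
mold run qwen-image-edit-2511:q4 "replace the sky with a pink sunset" \
  --image photo.png
```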

Recommended Stable Quant Paths

On a 24 GB card, qwen-image:q4 and qwen-image-2512:q4 are the safest starting points for native-quality GGUF inference. q6 and q5 also work well at 1024x1024, while q8 is currently validated at 768x768.

```bash
mold pull qwen-image:q4
mold run qwen-image:q4 "your prompt here"

mold pull qwen-image-2512:q4
mold run qwen-image-2512:q4 "your prompt here"
```

Apple Silicon

On Apple Silicon, leave --qwen2-variant unset first: the Metal auto mode already prefers the quantized Qwen2.5-VL text encoder path for Qwen-Image.

```bash
mold run qwen-image:q2 "your prompt here" --preview
```

To compare explicitly:

```bash
mold run qwen-image:q2 "your prompt here" --qwen2-variant q6
mold run qwen-image:q2 "your prompt here" --qwen2-variant q4
```

Defaults

  • Resolution: 1328x1328
  • Guidance: 4.0
  • Steps: 50

On the 24 GB validation machine used for mold development:

  • q2 through q6 were validated at 1024x1024
  • q8 was validated at 768x768
  • qwen-image-2512:q4 still ran out of memory at 1328x1328
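To stay inside the validated envelope on a 24 GB card, request 1024x1024 explicitly rather than the 1328x1328 default. A sketch, with the caveat that the --width/--height option names here are an assumption and may differ in your mold version:

```bash
# Hypothetical resolution flags -- the option names are an assumption;
# the 1024x1024 target matches the validated results above.
mold run qwen-image-2512:q4 "your prompt here" --width 1024 --height 1024
```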

Negative Prompts

Qwen-Image supports negative prompts via --negative-prompt.

For the GGUF quantized paths above, the best prompt adherence came from using no default negative prompt at all. Start without one and only add a negative prompt if you need to push the image away from a specific failure mode.

The upstream Chinese negative prompt (roughly: "low resolution, low image quality, deformed limbs, deformed fingers") is more appropriate for BF16 / FP8 paths:

```bash
mold run qwen-image:fp8 "a cat" --negative-prompt "低分辨率,低画质,肢体畸形,手指畸形"
```

WARNING

The upstream Chinese negative prompt can hurt GGUF prompt adherence. Avoid using it by default with qwen-image:q2 through qwen-image:q8 or qwen-image-2512:q2 through qwen-image-2512:q8.

Other Qwen Variants

mold also exposes higher-VRAM Qwen paths such as qwen-image:bf16, qwen-image:fp8, qwen-image-lightning:fp8, and qwen-image-lightning:fp8-8step. Those are separate from the GGUF quantized matrix above and have different memory and scheduler behavior.
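These follow the same pull/run workflow as the GGUF models; for example (noting that, per the above, their memory and scheduler behavior differ from the GGUF matrix):

```bash
# Same pull/run pattern; expect substantially higher VRAM needs
# than the GGUF quantized models above.
mold pull qwen-image-lightning:fp8-8step
mold run qwen-image-lightning:fp8-8step "your prompt here"
```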

Recommended Dimensions

| Width | Height | Aspect Ratio |
| --- | --- | --- |
| 1328 | 1328 | 1:1 (native) |
| 1024 | 1024 | 1:1 |
| 1152 | 896 | 9:7 |
| 896 | 1152 | 7:9 |
| 1216 | 832 | 19:13 |
| 832 | 1216 | 13:19 |
| 1344 | 768 | 7:4 |
| 768 | 1344 | 4:7 |
| 1664 | 928 | ~16:9 |
| 928 | 1664 | ~9:16 |
| 768 | 768 | 1:1 (small) |
| 512 | 512 | 1:1 (small) |

Using non-recommended dimensions will trigger a warning. All values must be multiples of 16.
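If you are scripting around mold with arbitrary sizes, plain shell arithmetic (independent of mold itself) can snap a dimension to the nearest multiple of 16 before passing it in:

```bash
# Round a dimension to the nearest multiple of 16.
snap16() { echo $(( (($1 + 8) / 16) * 16 )); }

snap16 1000   # prints 1008
snap16 1337   # prints 1344
```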