Flux.2 Klein

A lightweight 4B parameter FLUX variant. Fast 4-step generation with low VRAM requirements.

Variants

| Model              | Steps | Size   | Notes             |
|--------------------|-------|--------|-------------------|
| flux2-klein:q8     | 4     | 4.3 GB | Good quality      |
| flux2-klein:q6     | 4     | 3.4 GB | Better quality    |
| flux2-klein:q4     | 4     | 2.6 GB | Smallest FLUX     |
| flux2-klein:bf16   | 4     | 7.8 GB | Full precision 4B |

Defaults

  • Resolution: 1024x1024
  • Guidance: 1.0
  • Steps: 4

Flux.2 Klein-9B

A larger 9B parameter FLUX variant, distilled for fast 4-step generation with higher quality than the 4B Klein. It uses a Qwen3-8B text encoder (hidden_size=4096), versus Klein-4B's Qwen3-4B (hidden_size=2560).

Variants

| Model                 | Steps | Size   | Notes                           |
|-----------------------|-------|--------|---------------------------------|
| flux2-klein-9b:q8     | 4     | 10 GB  | Good quality                    |
| flux2-klein-9b:q6     | 4     | 7.9 GB | Better quality                  |
| flux2-klein-9b:q4     | 4     | 5.9 GB | Smallest 9B                     |
| flux2-klein-9b:bf16   | 4     | 18 GB  | Full precision, gated, 2 shards |

Defaults

  • Resolution: 1024x1024
  • Guidance: 1.0
  • Steps: 4

Note: GGUF quantized variants (Q4/Q6/Q8) use ~6-10GB VRAM. The BF16 variant requires ~18GB VRAM, is gated on HuggingFace, and requires license acceptance before download.

| Width | Height | Aspect Ratio |
|-------|--------|--------------|
| 1024  | 1024   | 1:1 (native) |
| 1024  | 768    | 4:3          |
| 768   | 1024   | 3:4          |
| 1024  | 576    | 16:9         |
| 576   | 1024   | 9:16         |
| 768   | 768    | 1:1          |

Using non-recommended dimensions will trigger a warning. All values must be multiples of 16.
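The multiple-of-16 rule can be enforced before invoking the CLI. Below is a minimal sketch of a hypothetical helper (not part of the mold CLI) that snaps dimensions to the nearest multiple of 16 and warns when the result is not one of the recommended resolutions above:

```python
# Hypothetical pre-flight check, mirroring the documented rules:
# all dimensions must be multiples of 16, and non-recommended
# width/height pairs trigger a warning.
RECOMMENDED = {(1024, 1024), (1024, 768), (768, 1024),
               (1024, 576), (576, 1024), (768, 768)}

def snap16(x: int) -> int:
    """Round x to the nearest multiple of 16 (minimum 16)."""
    return max(16, round(x / 16) * 16)

def validate(width: int, height: int) -> tuple[int, int]:
    """Snap both dimensions and warn on non-recommended pairs."""
    w, h = snap16(width), snap16(height)
    if (w, h) not in RECOMMENDED:
        print(f"warning: {w}x{h} is not a recommended resolution")
    return w, h
```

Note that `round()` in Python uses half-to-even rounding, so values exactly between two multiples of 16 may snap down.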

Example

Flux.2 Klein Q8 — 4 steps, seed 100:

```bash
mold run flux2-klein:q8 \
  "A minimalist zen garden with raked sand patterns, \
  a single cherry blossom tree, morning mist" \
  --seed 100
```

Zen garden — Flux.2 Klein

Flux.2 Klein BF16 — 4 steps:

```bash
mold run flux2-klein:bf16 \
  "a majestic owl perched on a mossy branch in a moonlit forest"
```

Owl — Flux.2 Klein BF16

Flux.2 Klein-9B Q4 — 4 steps, seed 999:

```bash
mold run flux2-klein-9b:q4 \
  "A glass bottle ship inside a stormy ocean wave, \
  dramatic lightning, hyperrealistic macro photography" \
  --seed 999
```

Bottle ship — Flux.2 Klein-9B Q4

Architecture

Flux.2 Klein uses a Qwen3 text encoder (BF16 or GGUF, layers 9/18/27), a shared modulation transformer (BF16 or GGUF), and a BN-VAE decoder. Klein-4B uses Qwen3-4B (hidden_size=2560), Klein-9B uses Qwen3-8B (hidden_size=4096). GGUF variants keep weights quantized in VRAM with on-the-fly dequantization per matmul, minimizing memory usage.
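The keep-quantized, dequantize-per-matmul strategy can be illustrated with a minimal NumPy sketch. This uses simple per-tensor symmetric int8 quantization, not the actual GGUF block formats (Q4_K, Q6_K, Q8_0, ...), so it only demonstrates the memory/compute trade, not the real codec:

```python
import numpy as np

# Sketch of dequantize-on-the-fly matmul: weights stay quantized in
# memory; a float copy exists only for the duration of each matmul.
# (Simplified per-tensor int8, NOT the real GGUF block quantization.)

def quantize_int8(w: np.ndarray):
    """Store weights as int8 plus one fp32 scale (per-tensor)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def matmul_dequant(x: np.ndarray, q: np.ndarray, scale: float):
    """Dequantize to fp32 just-in-time for the matmul."""
    return x @ (q.astype(np.float32) * scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # fp32 reference
q, s = quantize_int8(w)                               # 4x smaller resident copy
x = rng.standard_normal((1, 64)).astype(np.float32)
err = np.abs(x @ w - matmul_dequant(x, q, s)).max()   # small quantization error
```

The resident int8 copy is a quarter the size of the fp32 weights; the cost is a dequantization pass on every matmul, which is the trade the GGUF variants make to fit in low VRAM.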