Configuration

mold keeps configuration in two places by design:

  • config.toml — the hand-editable bootstrap file in ~/.mold/ (or $MOLD_HOME). Owns paths, ports, credentials, and the model-path entries that mold pull writes.
  • mold.db (SQLite) — owns user preferences: the [expand] section, global generation defaults (default_width, default_height, default_steps, embed_metadata, t5_variant, qwen3_variant, default_negative_prompt), the last-model sidecar, and per-model generation defaults (default_steps, default_guidance, default_width, default_height, scheduler, negative_prompt, lora, lora_scale). These fields moved to the DB in #265 so GUI writes and hand-curated TOML no longer fight over the same file.

Environment variables still override both surfaces at read time. Upgrading from an earlier release runs a one-shot import of the preference slices of config.toml into the DB on first launch — your existing values carry over.
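
The split and the one-shot import can be pictured with a small sketch. This is not mold's implementation: plain dicts stand in for config.toml and mold.db, the function names are hypothetical, and the `_imported` marker is an assumed way of making the import run only once. Only the key names are taken from the list above, and per-model defaults are left out for brevity.

```python
# Illustrative only: plain dicts stand in for config.toml and mold.db.
GLOBAL_PREFERENCE_KEYS = {
    "default_width", "default_height", "default_steps", "embed_metadata",
    "t5_variant", "qwen3_variant", "default_negative_prompt",
}


def is_preference(key: str) -> bool:
    """True when a top-level or dotted key belongs to the DB surface."""
    return (
        key in GLOBAL_PREFERENCE_KEYS
        or key == "expand"
        or key.startswith("expand.")
    )


def import_preferences(toml_cfg: dict, db: dict) -> int:
    """Copy preference keys from the parsed TOML into the DB, once.

    Returns the number of keys copied. The "_imported" marker is a
    hypothetical mechanism, not something the source documents.
    """
    if db.get("_imported"):
        return 0
    copied = 0
    for key, value in toml_cfg.items():
        if is_preference(key):
            db[key] = value
            copied += 1
    db["_imported"] = True
    return copied
```

With a parsed file such as `{"server_port": 7680, "default_width": 1024, "expand": {"enabled": False}}`, the first call copies `default_width` and `expand`, leaves `server_port` in the file, and a second call copies nothing.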

Managing Config from the CLI

mold config routes writes to the right surface based on the key prefix:

bash
mold config list                       # All settings tagged [db] / [file] / [env]
mold config list --json                # JSON form: { "value": …, "surface": … } per key
mold config get server_port            # Get a value
mold config set server_port 8080       # Bootstrap key → writes config.toml
mold config set expand.enabled true    # User preference → writes mold.db
mold config set default_width 1024     # Generation default → writes mold.db
mold config where expand.enabled       # Print which surface owns this key
mold config reset expand.enabled       # Drop the DB row; next read falls back to TOML/env/default
mold config reset --all --yes          # Drop every DB row under the active profile
mold config --profile portrait list    # Scope a command to an explicit profile (v6)
mold config edit                       # Open config.toml in $EDITOR

mold config list tags every row with its surface so you can see at a glance which store owns each key — [db] for mold.db, [file] for config.toml, [env] when a MOLD_* env var is currently overriding. mold config set prints the same tag in its output (for example Set expand.enabled = true [db]). mold config where <key> also reports any env override that beats both stores at runtime.

mold config reset <key> drops the DB row so the next read falls back to the TOML, env, or compiled default. This is useful for undoing a wrong setting without hand-editing mold.db. TOML-only keys are rejected with a pointer to mold config set, since those live in the hand-edited file. mold config reset --all purges every DB row under the active profile and prompts for confirmation unless --yes is passed.
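
The read order implied above (env override first, then the DB row, then config.toml, then the compiled default) can be sketched as follows. The function names, the dict stand-ins, and the key-to-variable mapping passed as `env_names` are assumptions for illustration. The `default` tag is also an assumption, since the text above only names [db], [file], and [env].

```python
def resolve(key, env, db, toml_cfg, defaults, env_names):
    """Return (value, surface) for a key, highest-priority surface first."""
    env_var = env_names.get(key)  # hypothetical key -> MOLD_* mapping
    if env_var is not None and env_var in env:
        return env[env_var], "env"
    if key in db:
        return db[key], "db"
    if key in toml_cfg:
        return toml_cfg[key], "file"
    return defaults.get(key), "default"


def reset(key, db):
    """Drop the DB row so the next resolve() falls through to other surfaces."""
    existed = key in db
    db.pop(key, None)
    return existed
```

In this model, resetting `expand.enabled` when both stores hold a value makes the next read return the config.toml value tagged `file`, while an env override wins regardless of what either store contains.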

Multi-profile (schema v6)

settings rows are keyed on (profile, key) and model_prefs rows on (profile, model), so one DB can host multiple independent preference sets (default, dev, portrait, …). The active profile resolves in priority order:

  1. MOLD_PROFILE env var,
  2. the profile.active setting row under the default profile,
  3. "default".

Every mold config subcommand accepts --profile <name> to scope a single invocation without touching the env var or the profile.active setting.
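
A minimal sketch of profile-keyed rows and the resolution order, using an in-memory SQLite database. The column names and the `open_db` / `active_profile` helpers are hypothetical; only the (profile, key) key shape, the `profile.active` row, and the `MOLD_PROFILE` variable come from the text above. Treating --profile as the highest priority is an assumption based on it scoping a single invocation.

```python
import sqlite3


def open_db():
    """Create an in-memory DB with a settings table keyed on (profile, key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE settings ("
        " profile TEXT NOT NULL,"
        " key TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (profile, key))"
    )
    return conn


def active_profile(conn, env, cli_profile=None):
    """Resolve the active profile: --profile, MOLD_PROFILE, DB row, 'default'."""
    if cli_profile:  # assumed to win for the single invocation
        return cli_profile
    if env.get("MOLD_PROFILE"):
        return env["MOLD_PROFILE"]
    row = conn.execute(
        "SELECT value FROM settings"
        " WHERE profile = 'default' AND key = 'profile.active'"
    ).fetchone()
    if row and row[0]:
        return row[0]
    return "default"


def get_setting(conn, profile, key):
    """Read one value for a profile, or None when no row exists."""
    row = conn.execute(
        "SELECT value FROM settings WHERE profile = ? AND key = ?",
        (profile, key),
    ).fetchone()
    return row[0] if row else None
```

Because the primary key includes the profile, the same key (for example default_width) can hold a different value under each profile without the rows colliding.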

See the CLI Reference for the full list of keys and options.

Config File

toml
default_model = "flux2-klein:q8"
models_dir = "~/.mold/models"
server_port = 7680
default_width = 1024
default_height = 1024

# Global default negative prompt (CFG models only)
# default_negative_prompt = "low quality, worst quality, blurry, watermark"

[models."flux-dev:bf16"]
default_steps = 25
default_guidance = 3.5
# lora = "/path/to/adapter.safetensors"
# lora_scale = 0.8

[models."sd15:fp16"]
default_steps = 25
default_guidance = 7.5
negative_prompt = "worst quality, low quality, bad anatomy"

[expand]
enabled = false
backend = "local"
model = "qwen3-expand:q8"
temperature = 0.7

# Per-family expansion tuning
# [expand.families.sd15]
# word_limit = 50
# style_notes = "Short keyword phrases for CLIP-L."

# [expand.families.flux]
# word_limit = 200
# style_notes = "Rich natural language descriptions."

[logging]
# level = "info"              # Log level (overridden by MOLD_LOG env var)
# file = false                # Enable file logging to ~/.mold/logs/
# dir = "~/.mold/logs"        # Custom log directory
# max_days = 7                # Days to retain rotated log files

[lambda]
# api_key = "..."             # Prefer LAMBDA_API_KEY for shells
# endpoint = "https://cloud.lambda.ai/api/v1"
# image_repository = "ghcr.io/utensils/mold"
# ssh_key_name = "mold-laptop"
# ssh_private_key_path = "~/.ssh/id_ed25519"
# filesystem_prefix = "mold"
# filesystem_mount_path = "/data/mold"
# confirm_hourly_usd = 5.0
# local_port = 7680

Environment Variables

Environment variables take precedence over values from both config.toml and mold.db.

Core

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_HOME | ~/.mold | Base directory for config and cache |
| MOLD_DEFAULT_MODEL | flux2-klein:q8 | Default model name |
| MOLD_HOST | http://localhost:7680 | Remote server URL |
| MOLD_MODELS_DIR | $MOLD_HOME/models | Model storage directory |
| MOLD_PORT | 7680 | Server port |
| LAMBDA_API_KEY | unset | Overrides lambda.api_key |
| MOLD_LOG | info (serve) / warn (cli, tui) | Log level |

Generation

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_EAGER |  | 1 to keep all components loaded |
| MOLD_OFFLOAD |  | 1 to force CPU↔GPU block streaming for FLUX, Flux.2, Z-Image, and SD3 BF16 paths |
| MOLD_OFFLOAD_PREFETCH | on | FLUX offload async H2D prefetch stream; set off to revert to synchronous |
| MOLD_PINNED_VRAM_MAX_GB | RAM × 0.5 | Cap on pinned host memory used by the FLUX offload path |
| MOLD_RESERVE_VRAM_MB | 400 (Linux) / 600 (Windows) / 0 (macOS) | OS / cuBLAS workspace reserve subtracted from free_vram_bytes before any budget decision. Set explicitly to override the platform default; 0 disables |
| MOLD_KEEP_TE_RAM |  | 1 to park text encoders on CPU between requests. FP16/BF16 only (GGUF falls through to drop+reload). Disabled on Metal (unified memory). |
| MOLD_LORA_BYPASS | auto | FLUX LoRA application path: auto enables bypass-mode when LoRAs are present (covers offload AND the GGUF/quantized path via quantized_transformer.rs), on always bypasses, off reverts to legacy merge-into-base / gguf_lora_var_builder |
| MOLD_VAE_TILED | auto | Tiled VAE decode for FLUX/FLUX2/SDXL/SD3: auto retries with tiling on OOM, force always tiles, off disables |
| MOLD_LONG_PROMPTS |  | 1 enables ComfyUI-style chunked CLIP encoding (75-token windows, BOS/EOS framing, pooled outputs averaged into the FLUX vector_in 768-dim conditioning). Default off, which preserves the pre-Tier-2 hard truncation at 77. |
| MOLD_ATTN | math | Attention backend: math (default) or flash (needs --features cuda,flash-attn AND RUSTFLAGS='--cfg mold_flash_attn_real'; falls back to math with a one-shot warning otherwise) |
| MOLD_ATTN_CHUNK | auto | Override math-attention query chunk size. Positive integers below the sequence length enable chunking; 0 or off disables it. The CUDA default chunks long queries at 512 to reduce peak VRAM. |
| MOLD_EMBED_METADATA | 1 | 0 to disable PNG metadata |
| MOLD_MEDIA_ROOTS |  | Platform path-list of allow roots for trusted server-local LTX-2 audio_file_path / source_video_path requests. Targets are canonicalized and must resolve to files under one configured root. |
| MOLD_PREVIEW |  | 1 to display images inline in terminal |
| MOLD_T5_VARIANT | auto | T5 encoder: auto/fp16/q8/q6/q5/q4/q3 |
| MOLD_QWEN3_VARIANT | auto | Qwen3 encoder: auto/bf16/q8/q6/iq4/q3 |
| MOLD_SCHEDULER |  | SD1.5/SDXL: ddim/euler-ancestral/uni-pc |
| MOLD_CFG_PLUS |  | 1 to enable CFG++ (manifold-projection guidance, Chung et al. 2024). Drops usable CFG to ~1.5–2.5 and removes guidance artifacts. Per-request --cfg-plus overrides. Supported on SD3, SDXL, and SD1.5 (DDIM only; Euler-A / UniPC fall back). Ignored by FLUX / Z-Image / Flux.2 (distilled). |
| MOLD_VAE_DTYPE | auto | Override VAE precision: auto, bf16, fp16, fp32. Use fp32 to fix banding artifacts on FLUX/SD3 finetuned VAEs (~2× decode VRAM; tiled VAE absorbs OOM via existing fallback). Wired into FLUX, FLUX2, SD3, SDXL, SD1.5. |
| MOLD_LTX2_GEMMA_DEVICE | auto | LTX-2 Gemma 3 12B prompt encoder placement: auto (active GPU → sibling GPU → CPU on free VRAM, threshold 24 GB), cpu (force system RAM, slower encode but no VRAM contention; required on a single 24 GB card running cv:2752735), gpu (pin to active GPU, surface OOM rather than auto-offload). The deprecated MOLD_LTX2_DEBUG_FORCE_CPU_PROMPT_ENCODER=1 is a one-shot-warn alias. Server preflight uses the same resolver so the load admits/rejects in lockstep with what the runtime will do. |
| MOLD_LTX2_GEMMA_VARIANT | auto | LTX-2 Gemma 3 12B weight format: auto (BF16 if both formats present, GGUF if only GGUF), q4 (force Q4 GGUF: google/gemma-3-12b-it-qat-q4_0-gguf, ~7 GB; fits on a 24 GB card alongside the streaming transformer so auto-placement keeps Gemma GPU-resident), bf16 (force BF16 split: google/gemma-3-12b-it-qat-q4_0-unquantized, ~23 GB; historical default). For V1, place the Q4 GGUF in your gemma_root manually; manifest auto-fetch is deferred to a follow-up. |
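
The chunked CLIP encoding that MOLD_LONG_PROMPTS describes (75-token windows, BOS/EOS framing, averaged pooled outputs) can be illustrated with a short sketch. It is not mold's code: the function names are hypothetical, padding the last window with the end-of-text id is an assumption, and the real encoder call is omitted. The ids 49406 and 49407 are the start and end tokens of the standard CLIP BPE vocabulary.

```python
BOS, EOS = 49406, 49407  # CLIP start-of-text / end-of-text token ids


def chunk_prompt(token_ids, window=75, bos=BOS, eos=EOS, pad=EOS):
    """Split un-framed token ids into equal-length BOS + window + EOS chunks."""
    chunks = []
    # max(..., 1) guarantees one (fully padded) chunk for an empty prompt.
    for start in range(0, max(len(token_ids), 1), window):
        body = list(token_ids[start:start + window])
        body += [pad] * (window - len(body))  # assumed padding policy
        chunks.append([bos] + body + [eos])
    return chunks


def average_pooled(pooled_vectors):
    """Element-wise mean of the per-chunk pooled outputs."""
    count = len(pooled_vectors)
    return [sum(column) / count for column in zip(*pooled_vectors)]
```

A 160-token prompt yields three 77-token chunks here, each of which would be encoded separately before the pooled vectors are averaged. With the variable unset, the same prompt is truncated instead.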

Prompt Expansion

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_EXPAND |  | 1 to enable expansion |
| MOLD_EXPAND_BACKEND | local | local or OpenAI-compatible URL |
| MOLD_EXPAND_MODEL | qwen3-expand:q8 | LLM model for expansion |
| MOLD_EXPAND_TEMPERATURE | 0.7 | Sampling temperature |
| MOLD_EXPAND_THINKING |  | 1 to enable thinking mode |
| MOLD_EXPAND_SYSTEM_PROMPT |  | Custom system prompt template |
| MOLD_EXPAND_BATCH_PROMPT |  | Custom batch prompt template |

Server

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_GPUS | all visible | Which GPUs the server uses: comma-separated ordinals (0,1) or all. See Multi-GPU |
| MOLD_QUEUE_SIZE | 200 | Max queued generation jobs; overflow returns HTTP 503 with Retry-After |
| MOLD_OUTPUT_DIR | ~/.mold/output | Image output directory (set empty to disable) |
| MOLD_THUMBNAIL_WARMUP |  | 1 to prebuild gallery thumbnails at server startup (default: disabled) |
| MOLD_WEB_DIR |  | Override the web gallery SPA bundle location. First resolved path among this, $XDG_DATA_HOME/mold/web, ~/.mold/web, <binary dir>/web, and ./web/dist wins |
| MOLD_DB_PATH | MOLD_HOME/mold.db | Override the SQLite gallery metadata DB location |
| MOLD_DB_DISABLE |  | 1 to disable the SQLite metadata DB entirely; server and CLI fall back to filesystem walks |
| MOLD_CORS_ORIGIN |  | Restrict CORS to specific origin |
| MOLD_API_KEY |  | API key for authentication (single key, comma-separated, or @/path/to/keys.txt) |
| MOLD_RATE_LIMIT |  | Per-IP rate limit for generation endpoints (e.g., 10/min, 5/sec, 100/hour) |
| MOLD_RATE_LIMIT_BURST |  | Burst allowance override (defaults to 2x rate, capped at 100) |
| MOLD_MAX_CACHED_MODELS | 3 | LRU model-cache capacity (range 1..=16). At most one entry stays GPU-resident; the rest are parked in CPU RAM. Out-of-range values warn and fall back to default. |
| MOLD_CACHE_IDLE_TTL_SECS | 1800 (30 min) | Idle timeout for parked cache entries (range 60..=86400). Untouched entries are evicted past this TTL. |
| MOLD_QUEUE_LOOKAHEAD_BUFFER | 8 | Server queue lookahead size (range 1..=64). The dispatcher peeks this many jobs ahead to honour locality. |
| MOLD_QUEUE_MAX_DEFERRALS | 3 | Per-job starvation budget (range 0..=32). A job can be deferred this many times before forced pickup. |
| MOLD_MALLOC_TRIM | 1 (Linux/glibc) | 0 disables the post-generation malloc_trim(0) call. Cheap (~ms) but Linux-only; reclaims arena pages after large GGUF+LoRA rebuilds. |
| MOLD_FLUX_DELTA_CACHE | 1 | 0 disables the CPU-side FLUX LoRA delta cache (~25 GB host RAM on typical FLUX LoRAs). Disabling forces a sub-second B@A·scale recompute on each rebuild. |
| MOLD_FLUX_KEEP_TRANSFORMER | 0 | 1 keeps the FLUX transformer GPU-resident across same-LoRA generations (saves a full GGUF+LoRA rebuild). Server force-drops it if VAE decode headroom is too tight at that resolution. |
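
As a worked example of the MOLD_RATE_LIMIT and MOLD_RATE_LIMIT_BURST rows, the sketch below parses the documented spellings (10/min, 5/sec, 100/hour) and derives the default burst. The parser is hypothetical and only accepts those three units. Reading "2x rate" as twice the request count, and leaving an explicit burst override uncapped, are both assumptions.

```python
import re

_PERIOD_SECONDS = {"sec": 1, "min": 60, "hour": 3600}


def parse_rate_limit(spec, burst=None):
    """Parse '<count>/<sec|min|hour>' into (count, period_seconds, burst)."""
    match = re.fullmatch(r"(\d+)/(sec|min|hour)", spec.strip())
    if match is None:
        raise ValueError(f"unrecognised rate limit: {spec!r}")
    count = int(match.group(1))
    if count == 0:
        raise ValueError("rate limit count must be positive")
    period = _PERIOD_SECONDS[match.group(2)]
    if burst is None:
        # Documented default: twice the rate, capped at 100.
        burst = min(2 * count, 100)
    return count, period, burst
```

Under these assumptions 10/min allows a burst of 20, while 100/hour hits the cap and allows a burst of 100 rather than 200.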

Upscaling

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_UPSCALE_MODEL |  | Default upscaler model for mold upscale |
| MOLD_UPSCALE_TILE_SIZE |  | Tile size for memory-efficient upscaling (0 to disable tiling) |

Auth

| Variable | Default | Description |
| --- | --- | --- |
| HF_TOKEN |  | HuggingFace token for gated models |

mold persists generation metadata in a SQLite database at MOLD_HOME/mold.db (override with MOLD_DB_PATH). Both surfaces — the CLI's local generation path and the HTTP server — write a row per saved file: prompt, negative prompt, model, seed, steps, guidance, dimensions, LoRA, scheduler, the file's mtime/size, the generation duration, and a source column (server / cli / backfill).

The DB also stores the full generation metadata JSON for rows written by current versions, so gallery clients can recreate outputs with advanced options such as LoRA stacks, ControlNet settings, CFG++, output format, and LTX-2 audio/video pipeline controls.

The DB powers /api/gallery so listings stay fast on large directories (no per-request file walk) and surface metadata for formats that don't embed it (mp4, gif, webp). PNG / JPEG outputs still get the existing embedded mold:parameters chunk in addition to the row.

On server startup the DB runs an asynchronous reconciliation pass:

  • new files in MOLD_OUTPUT_DIR get rows added (synthesizing metadata from the filename when no embedded chunk is present)
  • rows whose backing files have been removed (manual rm, file manager, etc.) get pruned
  • size/mtime changes trigger a row refresh
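
The three reconciliation rules above amount to a set comparison between the DB rows and the files on disk. A minimal sketch, assuming both sides are summarized as a mapping from path to a (size, mtime) pair; the `scan` and `reconcile` names are hypothetical and the metadata synthesis for new files is left out.

```python
from pathlib import Path


def scan(output_dir):
    """Map each file name in output_dir to its (size, mtime_ns) pair."""
    files = {}
    for path in Path(output_dir).iterdir():
        if path.is_file():
            stat = path.stat()
            files[path.name] = (stat.st_size, stat.st_mtime_ns)
    return files


def reconcile(db_rows, disk_files):
    """Return sorted (to_add, to_prune, to_refresh) lists of paths."""
    to_add = sorted(p for p in disk_files if p not in db_rows)
    to_prune = sorted(p for p in db_rows if p not in disk_files)
    to_refresh = sorted(
        p for p in disk_files
        if p in db_rows and db_rows[p] != disk_files[p]
    )
    return to_add, to_prune, to_refresh
```

A file that exists only on disk lands in the add list, a row whose file is gone lands in the prune list, and a path present on both sides with a different size or mtime lands in the refresh list.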

Set MOLD_DB_DISABLE=1 to opt out — both surfaces fall back to the filesystem walk + embedded-metadata behavior from before. The NixOS module exposes the same toggle:

nix
services.mold = {
  enable = true;
  metadataDb.enable = false;          # opt out
  # metadataDb.path = "/var/lib/mold/custom.db";   # override location
};

Advanced

Device and Path Overrides

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_DEVICE |  | Force device placement, currently cpu for debugging |
| MOLD_TRANSFORMER_PATH |  | Override transformer weights path |
| MOLD_VAE_PATH |  | Override VAE weights path |
| MOLD_SPATIAL_UPSCALER_PATH |  | Override LTX spatial upscaler path |
| MOLD_TEMPORAL_UPSCALER_PATH |  | Override LTX temporal upscaler path |
| MOLD_DISTILLED_LORA_PATH |  | Override the default LTX-2 distilled LoRA path |
| MOLD_T5_PATH |  | Override T5 encoder path |
| MOLD_CLIP_PATH |  | Override CLIP-L encoder path |
| MOLD_CLIP2_PATH |  | Override CLIP-G encoder path for SDXL |
| MOLD_T5_TOKENIZER_PATH |  | Override T5 tokenizer path |
| MOLD_CLIP_TOKENIZER_PATH |  | Override CLIP-L tokenizer path |
| MOLD_CLIP2_TOKENIZER_PATH |  | Override CLIP-G tokenizer path for SDXL |
| MOLD_TEXT_TOKENIZER_PATH |  | Override generic text tokenizer path for Qwen/Z-Image |
| MOLD_DECODER_PATH |  | Override Wuerstchen decoder weights path |
| MOLD_QWEN2_VARIANT | auto | Qwen-family Qwen2.5-VL encoder: auto, bf16, q8, q6, q5, q4, q3, q2 |
| MOLD_QWEN2_TEXT_ENCODER_MODE | auto | Qwen-family placement mode: auto, gpu, cpu-stage, cpu |

These are mainly useful for custom local model layouts, manual debugging, or testing alternative weight files without editing config.toml.

Per-component device placement

Override which device (CPU or a specific GPU) runs each part of the diffusion pipeline. All variables accept the same four forms: auto (preserve the engine's VRAM-aware default), cpu, gpu (= gpu:0), or gpu:N for a specific ordinal.

| Variable | Applies to | Notes |
| --- | --- | --- |
| MOLD_PLACE_TEXT_ENCODERS | Every model family (Tier 1) | Single knob that moves every text encoder slot as a group. Picking cpu frees the transformer's full VRAM budget without triggering block offload. |
| MOLD_PLACE_TRANSFORMER | FLUX, Flux.2, Z-Image, Qwen-Image | Per-component override. Interacts with MOLD_OFFLOAD: blocks still stream from CPU but target the chosen ordinal. |
| MOLD_PLACE_VAE | FLUX, Flux.2, Z-Image, Qwen-Image | Decode stage; CPU is fine for preview, GPU is faster. |
| MOLD_PLACE_T5 | FLUX | Per-encoder override; unset falls through to MOLD_PLACE_TEXT_ENCODERS. |
| MOLD_PLACE_CLIP_L | FLUX | Per-encoder override. |
| MOLD_PLACE_CLIP_G | SDXL and others that use CLIP-G | Per-encoder override. |
| MOLD_PLACE_QWEN | Flux.2, Z-Image, Qwen-Image | Per-encoder override for the Qwen text encoder. |
Precedence (highest wins): CLI flag (--device-text-encoders, --device-vae, …) → env var → [models."name:tag".placement] TOML block → engine auto.
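
The precedence chain can be modeled as a first-match lookup across the three layers. This is a sketch under stated assumptions: the component names used as dict keys (`text_encoders`, `transformer`, `vae`) are hypothetical, each layer is a plain dict, and an explicit auto in a higher layer is assumed to stop the search.

```python
ENV_VARS = {
    "text_encoders": "MOLD_PLACE_TEXT_ENCODERS",
    "transformer": "MOLD_PLACE_TRANSFORMER",
    "vae": "MOLD_PLACE_VAE",
}


def resolve_placement(component, cli_flags, env, toml_placement):
    """Return the first value set for a component, highest priority first."""
    candidates = (
        cli_flags.get(component),          # --device-* flag
        env.get(ENV_VARS[component]),      # MOLD_PLACE_* variable
        toml_placement.get(component),     # [models."name:tag".placement]
    )
    for value in candidates:
        if value:  # unset or empty falls through to the next layer
            return value
    return "auto"  # engine default
```

For example, a `--device-vae cpu` flag would win over `MOLD_PLACE_VAE=gpu:1`, which in turn would win over a value in the model's placement block.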

The web UI's Placement panel, the PUT /api/config/model/:name/placement route, and mold run --device-* flags all write/read the same shape, so any surface can drive it.

Tier 2 per-component controls are intentionally gated: families other than FLUX, Flux.2, Z-Image, and Qwen-Image only honor Tier 1 (MOLD_PLACE_TEXT_ENCODERS) — their engines don't yet split encoder/transformer/VAE across devices. Setting the advanced variables on a Tier 1-only family is a no-op (the web UI hides the Advanced disclosure for those families so it isn't misleading).

For Qwen-Image and Qwen-Image-Edit:

  • CUDA auto prefers BF16 when enough text-encoder headroom remains, and falls back to quantized GGUF variants for local sequential, resident, and edit-conditioning paths when BF16 would be too heavy.
  • Metal/MPS auto prefers the quantized Qwen2.5-VL GGUF encoder path to reduce memory pressure during prompt encoding.
  • qwen-image-edit still loads the Qwen2.5-VL vision tower for image conditioning, but quantized MOLD_QWEN2_VARIANT values keep the language side smaller and stage the vision weights only when needed.

Debug and Family-Specific Knobs

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_SD3_DEBUG |  | Enable verbose SD3.5 pipeline logging |
| MOLD_QWEN_DEBUG |  | Enable verbose Qwen-Image pipeline logging |
| MOLD_ZIMAGE_DEBUG |  | Enable verbose Z-Image pipeline logging |
| MOLD_LTX_DEBUG |  | Enable verbose LTX Video / LTX-2 pipeline logging |
| MOLD_LTX_DEBUG_FILE | /tmp/mold-ltx2-debug.log | Append LTX Video / LTX-2 debug output to a file |
| MOLD_LTX_DEBUG_COMPARE_UNCOND |  | Log conditional vs unconditional LTX-2 prompt-context comparisons |
| MOLD_LTX_DEBUG_ALT_PROMPT |  | Use an alternate prompt string for LTX-2 prompt-sensitivity debugging |
| MOLD_LTX_DEBUG_DISABLE_AUDIO_BRANCH |  | Debug-only LTX-2 switch to disable the audio branch during native runs |
| MOLD_LTX_DEBUG_DISABLE_CROSS_ATTENTION_ADALN |  | Debug-only LTX-2 switch to bypass cross-attention AdaLN modulation |
| MOLD_LTX2_DEBUG_DISABLE_TRANSFORMER_GATED_ATTENTION |  | Debug-only LTX-2 switch to bypass transformer gated attention |
| MOLD_LTX2_DEBUG_FORCE_CPU_PROMPT_ENCODER |  | Deprecated alias for MOLD_LTX2_GEMMA_DEVICE=cpu. Emits a one-shot warn at runtime; remove in favor of the new knob. |
| MOLD_LTX2_DEBUG_TIMINGS |  | Emit native LTX-2 pipeline, phase, and denoise timing summaries for optimization work |
| MOLD_LTX2_DEBUG_STAGE_PREFIX |  | Write decoded native LTX-2 stage artifacts using this filename prefix |
| MOLD_LTX2_DEBUG_BLOCKS |  | Emit per-block native LTX-2 transformer debug logs |
| MOLD_LTX2_DEBUG_BLOCK_DETAIL |  | Restrict detailed native LTX-2 block logging to a specific transformer block index |
| MOLD_LTX2_DEBUG_LOAD_BLOCKS |  | Log native LTX-2 transformer block loading details |
| MOLD_LTX2_FORCE_EAGER |  | Force eager native LTX-2 transformer loading instead of layer streaming |
| MOLD_LTX2_FORCE_STREAMING |  | Force native LTX-2 transformer layer streaming |
| MOLD_LTX2_FP8_INPUT_SCALE_MODE | skip | Debug override for native LTX-2 FP8 input-scale handling (skip, emulate, divide, multiply) |
| MOLD_LTX2_FP8_WEIGHT_SCALE_MODE | apply | Debug override for native LTX-2 FP8 checkpoint weight-scale handling (apply, skip, scaled-mm) |
| MOLD_WUERSTCHEN_DEBUG |  | Enable verbose Wuerstchen pipeline logging |
| MOLD_WUERSTCHEN_DECODER_GUIDANCE | 0.0 | Override decoder-stage CFG guidance for Wuerstchen |

These are intended for troubleshooting and development rather than normal use.

Build-Time Metadata

| Variable | Default | Description |
| --- | --- | --- |
| MOLD_FULL_VERSION |  | Internal build-time version string embedded into CLI output |

This variable is set during the build and is not normally configured by users at runtime.