Configuration
mold keeps configuration in two places by design:
- config.toml: the hand-editable bootstrap file in ~/.mold/ (or $MOLD_HOME). Owns paths, ports, credentials, and the model-path entries that mold pull writes.
- mold.db (SQLite): owns user preferences, namely the [expand] section, global generation defaults (default_width, default_height, default_steps, embed_metadata, t5_variant, qwen3_variant, default_negative_prompt), the last-model sidecar, and per-model generation defaults (default_steps, default_guidance, default_width, default_height, scheduler, negative_prompt, lora, lora_scale).
These fields moved to the DB in #265 so GUI writes and hand-curated TOML no longer fight over the same file.
Environment variables still override both surfaces at read time. Upgrading from an earlier release runs a one-shot import of the preference slices of config.toml into the DB on first launch — your existing values carry over.
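The lookup order described above (env var first, then the DB row, then config.toml, then the compiled default) can be sketched as a small layered resolver. This is an illustrative model only, not mold's implementation: the resolve function, the ENV_FOR_KEY table, and the pairing of each key with an env var are assumptions made for the example.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical key -> env-var pairing, chosen for illustration only.
ENV_FOR_KEY = {
    "server_port": "MOLD_PORT",
    "default_model": "MOLD_DEFAULT_MODEL",
    "expand.enabled": "MOLD_EXPAND",
}


def resolve(
    key: str,
    env: Mapping[str, str],
    db: Mapping[str, str],
    toml: Mapping[str, str],
    defaults: Mapping[str, str],
) -> Tuple[Optional[str], str]:
    """Return (value, surface), checking env, then DB, then TOML, then defaults."""
    env_name = ENV_FOR_KEY.get(key)
    if env_name is not None and env_name in env:
        return env[env_name], "env"
    if key in db:
        return db[key], "db"
    if key in toml:
        return toml[key], "file"
    return defaults.get(key), "default"


# An env var beats the TOML value.
print(resolve("server_port", {"MOLD_PORT": "9000"}, {}, {"server_port": "7680"}, {}))
# ('9000', 'env')

# With no env var set, the DB row beats the TOML value.
print(resolve("expand.enabled", {}, {"expand.enabled": "true"}, {"expand.enabled": "false"}, {}))
# ('true', 'db')
```

The surface names returned here mirror the [env] / [db] / [file] tags that mold config list prints.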
Managing Config from the CLI
mold config routes writes to the right surface based on the key prefix:
mold config list # All settings tagged [db] / [file] / [env]
mold config list --json # JSON form: { "value": …, "surface": … } per key
mold config get server_port # Get a value
mold config set server_port 8080 # Bootstrap key → writes config.toml
mold config set expand.enabled true # User preference → writes mold.db
mold config set default_width 1024 # Generation default → writes mold.db
mold config where expand.enabled # Print which surface owns this key
mold config reset expand.enabled # Drop the DB row; next read falls back to TOML/env/default
mold config reset --all --yes # Drop every DB row under the active profile
mold config --profile portrait list # Scope a command to an explicit profile (v6)
mold config edit # Open config.toml in $EDITOR
mold config list tags every row with its surface so you can see at a glance which store owns each key: [db] for mold.db, [file] for config.toml, [env] when a MOLD_* env var is currently overriding. mold config set prints the same tag in its output (for example Set expand.enabled = true [db]). mold config where <key> also reports any env override that beats both stores at runtime.
mold config reset <key> drops the DB row so the next read falls back to the TOML, env, or compiled default, which is useful for undoing a wrong setting without hand-editing mold.db. TOML-only keys are rejected with a pointer to mold config set, since those live in the hand-edited file. mold config reset --all purges every DB row under the active profile (it prompts for confirmation unless --yes is passed).
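As a rough model of the routing and reset rules above, the sketch below classifies a key by surface and refuses to reset file-owned keys. The key sets are taken from the preference list earlier on this page; the function names, the error text, and the handling of per-model keys (omitted here) are assumptions for illustration, not mold's actual code.

```python
# Global preference keys that this page says live in mold.db.
DB_KEYS = frozenset({
    "default_width", "default_height", "default_steps", "embed_metadata",
    "t5_variant", "qwen3_variant", "default_negative_prompt",
})
# Key prefixes owned by mold.db (the [expand] section).
DB_PREFIXES = ("expand.",)


def surface_for(key: str) -> str:
    """Return "db" for user preferences, "file" for bootstrap keys."""
    if key in DB_KEYS or key.startswith(DB_PREFIXES):
        return "db"
    return "file"


def reset(db: dict, key: str) -> bool:
    """Drop the DB row for key; return True if a row was actually removed."""
    if surface_for(key) == "file":
        # Hypothetical message; the real CLI wording may differ.
        raise ValueError(f"{key} lives in config.toml; use `mold config set` instead")
    return db.pop(key, None) is not None


print(surface_for("expand.enabled"))  # db
print(surface_for("server_port"))     # file
```

Keeping the classification in one function means set, where, and reset all agree on which store owns a key.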
Multi-profile (schema v6)
settings and model_prefs rows are keyed on (profile, key) / (profile, model) — one DB can host multiple independent preference sets (default, dev, portrait, …). Active profile resolves in priority order:
- the MOLD_PROFILE env var,
- the profile.active setting row under the default profile,
- "default".
Every mold config subcommand accepts --profile <name> to scope for a single invocation without touching the env or the meta setting.
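The profile lookup can be pictured as a short fallback chain over a table keyed on (profile, key). The sketch below is a hedged illustration: the active_profile function and the in-memory settings dict are hypothetical stand-ins for the real DB, and treating --profile as the highest-priority source is an assumption drawn from the sentence above.

```python
from typing import Dict, Mapping, Optional, Tuple


def active_profile(
    env: Mapping[str, str],
    settings: Dict[Tuple[str, str], str],
    cli_profile: Optional[str] = None,
) -> str:
    """Pick the profile name: --profile, MOLD_PROFILE, profile.active row, "default"."""
    if cli_profile:
        return cli_profile  # assumed to win for this one invocation
    from_env = env.get("MOLD_PROFILE")
    if from_env:
        return from_env
    # The meta setting is always read from the "default" profile.
    from_db = settings.get(("default", "profile.active"))
    if from_db:
        return from_db
    return "default"


settings = {
    ("default", "profile.active"): "dev",
    ("dev", "default_width"): "768",
    ("portrait", "default_width"): "832",
}
profile = active_profile({}, settings)
print(profile, settings[(profile, "default_width")])  # dev 768
```

Because rows are keyed on the pair, the same key can hold a different value in each profile without any collision.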
See the CLI Reference for the full list of keys and options.
Config File
default_model = "flux2-klein:q8"
models_dir = "~/.mold/models"
server_port = 7680
default_width = 1024
default_height = 1024
# Global default negative prompt (CFG models only)
# default_negative_prompt = "low quality, worst quality, blurry, watermark"
[models."flux-dev:bf16"]
default_steps = 25
default_guidance = 3.5
# lora = "/path/to/adapter.safetensors"
# lora_scale = 0.8
[models."sd15:fp16"]
default_steps = 25
default_guidance = 7.5
negative_prompt = "worst quality, low quality, bad anatomy"
[expand]
enabled = false
backend = "local"
model = "qwen3-expand:q8"
temperature = 0.7
# Per-family expansion tuning
# [expand.families.sd15]
# word_limit = 50
# style_notes = "Short keyword phrases for CLIP-L."
# [expand.families.flux]
# word_limit = 200
# style_notes = "Rich natural language descriptions."
[logging]
# level = "info" # Log level (overridden by MOLD_LOG env var)
# file = false # Enable file logging to ~/.mold/logs/
# dir = "~/.mold/logs" # Custom log directory
# max_days = 7 # Days to retain rotated log files
[lambda]
# api_key = "..." # Prefer LAMBDA_API_KEY for shells
# endpoint = "https://cloud.lambda.ai/api/v1"
# image_repository = "ghcr.io/utensils/mold"
# ssh_key_name = "mold-laptop"
# ssh_private_key_path = "~/.ssh/id_ed25519"
# filesystem_prefix = "mold"
# filesystem_mount_path = "/data/mold"
# confirm_hourly_usd = 5.0
# local_port = 7680
Environment Variables
Environment variables take precedence over config file values.
Core
| Variable | Default | Description |
|---|---|---|
MOLD_HOME | ~/.mold | Base directory for config and cache |
MOLD_DEFAULT_MODEL | flux2-klein:q8 | Default model name |
MOLD_HOST | http://localhost:7680 | Remote server URL |
MOLD_MODELS_DIR | $MOLD_HOME/models | Model storage directory |
MOLD_PORT | 7680 | Server port |
LAMBDA_API_KEY | unset | Overrides lambda.api_key |
MOLD_LOG | info (serve) / warn (cli, tui) | Log level |
Generation
| Variable | Default | Description |
|---|---|---|
MOLD_EAGER | — | 1 to keep all components loaded |
MOLD_OFFLOAD | — | 1 to force CPU↔GPU block streaming for FLUX, Flux.2, Z-Image, and SD3 BF16 paths |
MOLD_OFFLOAD_PREFETCH | on | FLUX offload async H2D prefetch stream — set off to revert to synchronous |
MOLD_PINNED_VRAM_MAX_GB | RAM × 0.5 | Cap on pinned host memory used by the FLUX offload path |
MOLD_RESERVE_VRAM_MB | 400 (Linux) / 600 (Windows) / 0 (macOS) | OS / cuBLAS workspace reserve subtracted from free_vram_bytes before any budget decision. Set explicitly to override the platform default; 0 disables |
MOLD_KEEP_TE_RAM | — | 1 to park text encoders on CPU between requests. FP16/BF16 only (GGUF falls through to drop+reload). Disabled on Metal (unified memory). |
MOLD_LORA_BYPASS | auto | FLUX LoRA application path: auto enables bypass-mode when LoRAs are present (covers offload AND the GGUF/quantized path via quantized_transformer.rs), on always bypasses, off reverts to legacy merge-into-base / gguf_lora_var_builder |
MOLD_VAE_TILED | auto | Tiled VAE decode for FLUX/FLUX2/SDXL/SD3: auto retries with tiling on OOM, force always tiles, off disables |
MOLD_LONG_PROMPTS | — | 1 enables ComfyUI-style chunked CLIP encoding (75-token windows, BOS/EOS framing, pooled outputs averaged into the FLUX vector_in 768-dim conditioning). Default off — pre-Tier-2 hard truncation at 77 preserved. |
MOLD_ATTN | math | Attention backend: math (default) or flash (needs --features cuda,flash-attn AND RUSTFLAGS='--cfg mold_flash_attn_real'; falls back to math with a one-shot warning otherwise) |
MOLD_ATTN_CHUNK | auto | Override math-attention query chunk size. Positive integers below the sequence length enable chunking; 0 or off disables it. The CUDA default chunks long queries at 512 to reduce peak VRAM. |
MOLD_EMBED_METADATA | 1 | 0 to disable PNG metadata |
MOLD_MEDIA_ROOTS | — | Platform path-list of allow roots for trusted server-local LTX-2 audio_file_path / source_video_path requests. Targets are canonicalized and must resolve to files under one configured root. |
MOLD_PREVIEW | — | 1 to display images inline in terminal |
MOLD_T5_VARIANT | auto | T5 encoder: auto/fp16/q8/q6/q5/q4/q3 |
MOLD_QWEN3_VARIANT | auto | Qwen3 encoder: auto/bf16/q8/q6/iq4/q3 |
MOLD_SCHEDULER | — | SD1.5/SDXL: ddim/euler-ancestral/uni-pc |
MOLD_CFG_PLUS | — | 1 to enable CFG++ (manifold-projection guidance, Chung et al. 2024). Drops usable CFG to ~1.5–2.5 and removes guidance artifacts. Per-request --cfg-plus overrides. Supported on SD3, SDXL, and SD1.5 (DDIM only — Euler-A / UniPC fall back). Ignored by FLUX / Z-Image / Flux.2 (distilled). |
MOLD_VAE_DTYPE | auto | Override VAE precision: auto, bf16, fp16, fp32. Use fp32 to fix banding artifacts on FLUX/SD3 finetuned VAEs (~2× decode VRAM; tiled VAE absorbs OOM via existing fallback). Wired into FLUX, FLUX2, SD3, SDXL, SD1.5. |
MOLD_LTX2_GEMMA_DEVICE | auto | LTX-2 Gemma 3 12B prompt encoder placement: auto (active GPU → sibling GPU → CPU on free VRAM, threshold 24 GB), cpu (force system RAM, slower encode but no VRAM contention — required on a single 24 GB card running cv:2752735), gpu (pin to active GPU, surface OOM rather than auto-offload). The deprecated MOLD_LTX2_DEBUG_FORCE_CPU_PROMPT_ENCODER=1 is a one-shot-warn alias. Server preflight uses the same resolver so the load admits/rejects in lockstep with what the runtime will do. |
MOLD_LTX2_GEMMA_VARIANT | auto | LTX-2 Gemma 3 12B weight format: auto (BF16 if both formats present, GGUF if only GGUF), q4 (force Q4 GGUF — google/gemma-3-12b-it-qat-q4_0-gguf, ~7 GB; fits on a 24 GB card alongside the streaming transformer so auto-placement keeps Gemma GPU-resident), bf16 (force BF16 split — google/gemma-3-12b-it-qat-q4_0-unquantized, ~23 GB; historical default). For V1, place the Q4 GGUF in your gemma_root manually — manifest auto-fetch is deferred to a follow-up. |
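To make the MOLD_ATTN_CHUNK row concrete, here is a minimal sketch of how such a value might be interpreted. It follows the rules stated in the table (0 or off disables chunking, a positive integer below the sequence length enables it, and the CUDA default chunks long queries at 512). The attn_chunk function, its signature, and the choice to leave non-CUDA backends unchunked under auto are assumptions, not mold's code.

```python
from typing import Optional

CUDA_DEFAULT_CHUNK = 512  # default chunk size quoted in the table above


def attn_chunk(raw: Optional[str], seq_len: int, on_cuda: bool) -> Optional[int]:
    """Return the query chunk size to use, or None for unchunked attention."""
    value = (raw or "auto").strip().lower()
    if value in ("0", "off"):
        return None
    if value == "auto":
        # Assumption: only CUDA has a default, and only for long queries.
        if on_cuda and seq_len > CUDA_DEFAULT_CHUNK:
            return CUDA_DEFAULT_CHUNK
        return None
    size = int(value)  # raises ValueError for non-numeric input
    # Only a positive size below the sequence length actually chunks.
    return size if 0 < size < seq_len else None


print(attn_chunk("256", seq_len=1024, on_cuda=True))   # 256
print(attn_chunk("2048", seq_len=1024, on_cuda=True))  # None
print(attn_chunk(None, seq_len=4096, on_cuda=True))    # 512
```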
Prompt Expansion
| Variable | Default | Description |
|---|---|---|
MOLD_EXPAND | — | 1 to enable expansion |
MOLD_EXPAND_BACKEND | local | local or OpenAI-compatible URL |
MOLD_EXPAND_MODEL | qwen3-expand:q8 | LLM model for expansion |
MOLD_EXPAND_TEMPERATURE | 0.7 | Sampling temperature |
MOLD_EXPAND_THINKING | — | 1 to enable thinking mode |
MOLD_EXPAND_SYSTEM_PROMPT | — | Custom system prompt template |
MOLD_EXPAND_BATCH_PROMPT | — | Custom batch prompt template |
Server
| Variable | Default | Description |
|---|---|---|
MOLD_GPUS | all visible | Which GPUs the server uses: comma-separated ordinals (0,1) or all. See Multi-GPU |
MOLD_QUEUE_SIZE | 200 | Max queued generation jobs; overflow returns HTTP 503 with Retry-After |
MOLD_OUTPUT_DIR | ~/.mold/output | Image output directory (set empty to disable) |
MOLD_THUMBNAIL_WARMUP | — | 1 to prebuild gallery thumbnails at server startup (default: disabled) |
MOLD_WEB_DIR | — | Override the web gallery SPA bundle location. First resolved path among this, $XDG_DATA_HOME/mold/web, ~/.mold/web, <binary dir>/web, and ./web/dist wins |
MOLD_DB_PATH | $MOLD_HOME/mold.db | Override the SQLite gallery metadata DB location |
MOLD_DB_DISABLE | — | 1 to disable the SQLite metadata DB entirely — server and CLI fall back to filesystem walks |
MOLD_CORS_ORIGIN | — | Restrict CORS to specific origin |
MOLD_API_KEY | — | API key for authentication (single key, comma-separated, or @/path/to/keys.txt) |
MOLD_RATE_LIMIT | — | Per-IP rate limit for generation endpoints (e.g., 10/min, 5/sec, 100/hour) |
MOLD_RATE_LIMIT_BURST | — | Burst allowance override (defaults to 2x rate, capped at 100) |
MOLD_MAX_CACHED_MODELS | 3 | LRU model-cache capacity (range 1..=16). At most one entry stays GPU-resident; the rest are parked in CPU RAM. Out-of-range values warn and fall back to default. |
MOLD_CACHE_IDLE_TTL_SECS | 1800 (30 min) | Idle timeout for parked cache entries (range 60..=86400). Untouched entries are evicted past this TTL. |
MOLD_QUEUE_LOOKAHEAD_BUFFER | 8 | Server queue lookahead size (range 1..=64). The dispatcher peeks this many jobs ahead to honour locality. |
MOLD_QUEUE_MAX_DEFERRALS | 3 | Per-job starvation budget (range 0..=32). A job can be deferred this many times before forced pickup. |
MOLD_MALLOC_TRIM | 1 (Linux/glibc) | 0 disables the post-generation malloc_trim(0) call. Cheap (~ms) but Linux-only; reclaims arena pages after large GGUF+LoRA rebuilds. |
MOLD_FLUX_DELTA_CACHE | 1 | 0 disables the CPU-side FLUX LoRA delta cache (~25 GB host RAM on typical FLUX LoRAs). Disabling forces a sub-second B@A·scale recompute on each rebuild. |
MOLD_FLUX_KEEP_TRANSFORMER | 0 | 1 keeps the FLUX transformer GPU-resident across same-LoRA generations (saves a full GGUF+LoRA rebuild). Server force-drops it if VAE decode headroom is too tight at that resolution. |
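Two value shapes from the table above lend themselves to a short example: the MOLD_RATE_LIMIT spec (such as 10/min) with its derived burst, and ranged integers such as MOLD_MAX_CACHED_MODELS that fall back to the default when out of range. The sketch is illustrative only. The function names are hypothetical, the parser accepts only the three unit spellings shown in the table, and applying the fall-back-to-default rule to every ranged variable is an assumption.

```python
import re
from typing import Optional, Tuple

_RATE = re.compile(r"^(\d+)/(sec|min|hour)$")
_WINDOW_SECONDS = {"sec": 1, "min": 60, "hour": 3600}


def parse_rate_limit(spec: str) -> Tuple[int, int]:
    """Parse "10/min" into (requests, window_seconds)."""
    match = _RATE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized rate limit: {spec!r}")
    return int(match.group(1)), _WINDOW_SECONDS[match.group(2)]


def default_burst(requests: int) -> int:
    """Burst defaults to twice the rate, capped at 100."""
    return min(2 * requests, 100)


def bounded_int(raw: Optional[str], default: int, lo: int, hi: int) -> int:
    """Parse an integer env value; unset, junk, or out-of-range gives the default."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return value if lo <= value <= hi else default


print(parse_rate_limit("10/min"))     # (10, 60)
print(default_burst(80))              # 100
print(bounded_int("99", 3, 1, 16))    # 3
```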
Upscaling
| Variable | Default | Description |
|---|---|---|
MOLD_UPSCALE_MODEL | — | Default upscaler model for mold upscale |
MOLD_UPSCALE_TILE_SIZE | — | Tile size for memory-efficient upscaling (0 to disable tiling) |
Auth
| Variable | Default | Description |
|---|---|---|
HF_TOKEN | — | HuggingFace token for gated models |
Gallery Metadata Database
mold persists generation metadata in a SQLite database at $MOLD_HOME/mold.db (override with MOLD_DB_PATH). Both surfaces (the CLI's local generation path and the HTTP server) write a row per saved file: prompt, negative prompt, model, seed, steps, guidance, dimensions, LoRA, scheduler, the file's mtime/size, the generation duration, and a source column (server / cli / backfill).
The DB also stores the full generation metadata JSON for rows written by current versions, so gallery clients can recreate outputs with advanced options such as LoRA stacks, ControlNet settings, CFG++, output format, and LTX-2 audio/video pipeline controls.
The DB powers /api/gallery so listings stay fast on large directories (no per-request file walk) and surface metadata for formats that don't embed it (mp4, gif, webp). PNG / JPEG outputs still get the existing embedded mold:parameters chunk in addition to the row.
On server startup the DB runs an asynchronous reconciliation pass:
- new files in MOLD_OUTPUT_DIR get rows added (synthesizing metadata from the filename when no embedded chunk is present)
- rows whose backing files have been removed (manual rm, file manager, etc.) get pruned
- size/mtime changes trigger a row refresh
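The three reconciliation rules above amount to a diff between the output directory and the DB. A minimal sketch, assuming each side can be summarized as a mapping from path to (size, mtime): the plan_reconcile function and its return shape are hypothetical, and the real pass also has to read or synthesize metadata, which is left out here.

```python
from typing import Dict, List, Tuple

Stat = Tuple[int, float]  # (size in bytes, mtime)


def plan_reconcile(
    on_disk: Dict[str, Stat], rows: Dict[str, Stat]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (paths to add, paths to prune, paths to refresh)."""
    add = sorted(p for p in on_disk if p not in rows)
    prune = sorted(p for p in rows if p not in on_disk)
    refresh = sorted(p for p in on_disk if p in rows and on_disk[p] != rows[p])
    return add, prune, refresh


disk = {"a.png": (100, 1.0), "b.png": (250, 5.0), "c.mp4": (900, 2.0)}
db = {"a.png": (100, 1.0), "b.png": (200, 4.0), "gone.png": (50, 0.5)}
print(plan_reconcile(disk, db))
# (['c.mp4'], ['gone.png'], ['b.png'])
```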
Set MOLD_DB_DISABLE=1 to opt out — both surfaces fall back to the filesystem walk + embedded-metadata behavior from before. The NixOS module exposes the same toggle:
services.mold = {
enable = true;
metadataDb.enable = false; # opt out
# metadataDb.path = "/var/lib/mold/custom.db"; # override location
};
Advanced
Device and Path Overrides
| Variable | Default | Description |
|---|---|---|
MOLD_DEVICE | — | Force device placement, currently cpu for debugging |
MOLD_TRANSFORMER_PATH | — | Override transformer weights path |
MOLD_VAE_PATH | — | Override VAE weights path |
MOLD_SPATIAL_UPSCALER_PATH | — | Override LTX spatial upscaler path |
MOLD_TEMPORAL_UPSCALER_PATH | — | Override LTX temporal upscaler path |
MOLD_DISTILLED_LORA_PATH | — | Override the default LTX-2 distilled LoRA path |
MOLD_T5_PATH | — | Override T5 encoder path |
MOLD_CLIP_PATH | — | Override CLIP-L encoder path |
MOLD_CLIP2_PATH | — | Override CLIP-G encoder path for SDXL |
MOLD_T5_TOKENIZER_PATH | — | Override T5 tokenizer path |
MOLD_CLIP_TOKENIZER_PATH | — | Override CLIP-L tokenizer path |
MOLD_CLIP2_TOKENIZER_PATH | — | Override CLIP-G tokenizer path for SDXL |
MOLD_TEXT_TOKENIZER_PATH | — | Override generic text tokenizer path for Qwen/Z-Image |
MOLD_DECODER_PATH | — | Override Wuerstchen decoder weights path |
MOLD_QWEN2_VARIANT | auto | Qwen-family Qwen2.5-VL encoder: auto, bf16, q8, q6, q5, q4, q3, q2 |
MOLD_QWEN2_TEXT_ENCODER_MODE | auto | Qwen-family placement mode: auto, gpu, cpu-stage, cpu |
These are mainly useful for custom local model layouts, manual debugging, or testing alternative weight files without editing config.toml.
Per-component device placement
Override which device (CPU or a specific GPU) runs each part of the diffusion pipeline. All variables accept the same four forms: auto (preserve the engine's VRAM-aware default), cpu, gpu (= gpu:0), or gpu:N for a specific ordinal.
| Variable | Applies to | Notes |
|---|---|---|
MOLD_PLACE_TEXT_ENCODERS | Every model family (Tier 1) | Single knob that moves every text encoder slot as a group. Picking cpu frees the transformer's full VRAM budget without triggering block offload. |
MOLD_PLACE_TRANSFORMER | FLUX, Flux.2, Z-Image, Qwen-Image | Per-component override. Interacts with MOLD_OFFLOAD — blocks still stream from CPU but target the chosen ordinal. |
MOLD_PLACE_VAE | FLUX, Flux.2, Z-Image, Qwen-Image | Decode stage; CPU is fine for preview, GPU is faster. |
MOLD_PLACE_T5 | FLUX | Per-encoder override; unset falls through to MOLD_PLACE_TEXT_ENCODERS. |
MOLD_PLACE_CLIP_L | FLUX | Per-encoder override. |
MOLD_PLACE_CLIP_G | SDXL and others that use CLIP-G | Per-encoder override. |
MOLD_PLACE_QWEN | Flux.2, Z-Image, Qwen-Image | Per-encoder override for the Qwen text encoder. |
Precedence (highest wins): CLI flag (--device-text-encoders, --device-vae, …) → env var → [models."name:tag".placement] TOML block → engine auto.
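The four accepted forms and the precedence chain can be sketched together. This is an illustrative model, not mold's implementation: parse_placement, resolve_placement, and t5_env_value are hypothetical names, and normalizing gpu to gpu:0 in the return value is a presentation choice for the example.

```python
import re
from typing import Mapping, Optional


def parse_placement(value: str) -> str:
    """Normalize auto / cpu / gpu / gpu:N into auto, cpu, or gpu:N."""
    v = value.strip().lower()
    if v in ("auto", "cpu"):
        return v
    if v == "gpu":
        return "gpu:0"  # bare gpu means ordinal 0
    match = re.fullmatch(r"gpu:(\d+)", v, re.ASCII)
    if match:
        return f"gpu:{int(match.group(1))}"
    raise ValueError(f"invalid placement: {value!r}")


def resolve_placement(
    cli: Optional[str] = None, env: Optional[str] = None, toml: Optional[str] = None
) -> str:
    """First source that is set wins: CLI flag, env var, TOML block, engine auto."""
    for candidate in (cli, env, toml):
        if candidate is not None:
            return parse_placement(candidate)
    return "auto"


def t5_env_value(env: Mapping[str, str]) -> Optional[str]:
    """Per-encoder variable, falling through to the group knob when unset."""
    return env.get("MOLD_PLACE_T5") or env.get("MOLD_PLACE_TEXT_ENCODERS")


env = {"MOLD_PLACE_TEXT_ENCODERS": "cpu"}
print(resolve_placement(env=t5_env_value(env), toml="gpu:1"))  # cpu
print(resolve_placement(cli="gpu", env="cpu"))                 # gpu:0
```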
The web UI's Placement panel, the PUT /api/config/model/:name/placement route, and mold run --device-* flags all write/read the same shape, so any surface can drive it.
Tier 2 per-component controls are intentionally gated: families other than FLUX, Flux.2, Z-Image, and Qwen-Image only honor Tier 1 (MOLD_PLACE_TEXT_ENCODERS) — their engines don't yet split encoder/transformer/VAE across devices. Setting the advanced variables on a Tier 1-only family is a no-op (the web UI hides the Advanced disclosure for those families so it isn't misleading).
For Qwen-Image and Qwen-Image-Edit:
- CUDA auto prefers BF16 when enough text-encoder headroom remains, and falls back to quantized GGUF variants for local sequential, resident, and edit-conditioning paths when BF16 would be too heavy.
- Metal/MPS auto prefers the quantized Qwen2.5-VL GGUF encoder path to reduce memory pressure during prompt encoding.
- qwen-image-edit still loads the Qwen2.5-VL vision tower for image conditioning, but quantized MOLD_QWEN2_VARIANT values keep the language side smaller and stage the vision weights only when needed.
Debug and Family-Specific Knobs
| Variable | Default | Description |
|---|---|---|
MOLD_SD3_DEBUG | — | Enable verbose SD3.5 pipeline logging |
MOLD_QWEN_DEBUG | — | Enable verbose Qwen-Image pipeline logging |
MOLD_ZIMAGE_DEBUG | — | Enable verbose Z-Image pipeline logging |
MOLD_LTX_DEBUG | — | Enable verbose LTX Video / LTX-2 pipeline logging |
MOLD_LTX_DEBUG_FILE | /tmp/mold-ltx2-debug.log | Append LTX Video / LTX-2 debug output to a file |
MOLD_LTX_DEBUG_COMPARE_UNCOND | — | Log conditional vs unconditional LTX-2 prompt-context comparisons |
MOLD_LTX_DEBUG_ALT_PROMPT | — | Use an alternate prompt string for LTX-2 prompt-sensitivity debugging |
MOLD_LTX_DEBUG_DISABLE_AUDIO_BRANCH | — | Debug-only LTX-2 switch to disable the audio branch during native runs |
MOLD_LTX_DEBUG_DISABLE_CROSS_ATTENTION_ADALN | — | Debug-only LTX-2 switch to bypass cross-attention AdaLN modulation |
MOLD_LTX2_DEBUG_DISABLE_TRANSFORMER_GATED_ATTENTION | — | Debug-only LTX-2 switch to bypass transformer gated attention |
MOLD_LTX2_DEBUG_FORCE_CPU_PROMPT_ENCODER | — | Deprecated alias for MOLD_LTX2_GEMMA_DEVICE=cpu. Emits a one-shot warn at runtime; remove in favor of the new knob. |
MOLD_LTX2_DEBUG_TIMINGS | — | Emit native LTX-2 pipeline, phase, and denoise timing summaries for optimization work |
MOLD_LTX2_DEBUG_STAGE_PREFIX | — | Write decoded native LTX-2 stage artifacts using this filename prefix |
MOLD_LTX2_DEBUG_BLOCKS | — | Emit per-block native LTX-2 transformer debug logs |
MOLD_LTX2_DEBUG_BLOCK_DETAIL | — | Restrict detailed native LTX-2 block logging to a specific transformer block index |
MOLD_LTX2_DEBUG_LOAD_BLOCKS | — | Log native LTX-2 transformer block loading details |
MOLD_LTX2_FORCE_EAGER | — | Force eager native LTX-2 transformer loading instead of layer streaming |
MOLD_LTX2_FORCE_STREAMING | — | Force native LTX-2 transformer layer streaming |
MOLD_LTX2_FP8_INPUT_SCALE_MODE | skip | Debug override for native LTX-2 FP8 input-scale handling (skip, emulate, divide, multiply) |
MOLD_LTX2_FP8_WEIGHT_SCALE_MODE | apply | Debug override for native LTX-2 FP8 checkpoint weight-scale handling (apply, skip, scaled-mm) |
MOLD_WUERSTCHEN_DEBUG | — | Enable verbose Wuerstchen pipeline logging |
MOLD_WUERSTCHEN_DECODER_GUIDANCE | 0.0 | Override decoder-stage CFG guidance for Wuerstchen |
These are intended for troubleshooting and development rather than normal use.
Build-Time Metadata
| Variable | Default | Description |
|---|---|---|
MOLD_FULL_VERSION | — | Internal build-time version string embedded into CLI output |
This variable is set during the build and is not normally configured by users at runtime.
