Server API

When running mold serve, you get a REST API for remote image generation.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/generate | Generate images from a prompt |
| POST | /api/generate/stream | Generate with SSE progress streaming |
| POST | /api/generate/chain | Chained video generation (LTX-2) |
| POST | /api/generate/chain/stream | Chained video with SSE progress |
| POST | /api/expand | Expand a prompt using an LLM |
| GET | /api/models | List available models |
| POST | /api/models/load | Load/swap the active model |
| POST | /api/models/pull | Pull/download a model |
| DELETE | /api/models/unload | Unload the model to free GPU memory |
| GET | /api/gallery | List saved images |
| GET | /api/gallery/image/:name | Fetch a saved image |
| DELETE | /api/gallery/image/:name | Delete a saved image |
| GET | /api/gallery/thumbnail/:name | Fetch a cached thumbnail |
| POST | /api/upscale | Upscale an image with Real-ESRGAN |
| POST | /api/upscale/stream | Upscale with SSE tile progress |
| POST | /api/shutdown | Trigger graceful server shutdown |
| GET | /api/status | Server health and status |
| GET | /health | Simple 200 OK health check |
| GET | /api/openapi.json | OpenAPI spec |
| GET | /api/docs | Interactive API docs (Scalar) |
| GET | /metrics | Prometheus metrics (feature-gated) |

Authentication

When MOLD_API_KEY is set, all API requests (except /health, /api/docs, /api/openapi.json, and /metrics) must include an X-Api-Key header:

bash
curl -H "X-Api-Key: your-secret-key" http://localhost:7680/api/status

Without the header (or with an invalid key), the server returns 401 Unauthorized:

json
{ "error": "missing X-Api-Key header", "code": "UNAUTHORIZED" }

The MOLD_API_KEY variable supports multiple formats:

  • Single key: MOLD_API_KEY=my-secret
  • Multiple keys: MOLD_API_KEY=key1,key2,key3
  • File reference: MOLD_API_KEY=@/path/to/keys.txt (one key per line, # comments supported)
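The three formats above can be sketched as a small parser. This is illustrative Python, not the server's actual implementation; it just mirrors the documented semantics (single key, comma-separated list, or @file reference with one key per line and # comments):

```python
def parse_api_keys(raw: str) -> set[str]:
    """Illustrative parser for the documented MOLD_API_KEY formats.

    Not the server's actual code -- a sketch of the semantics:
    single key, comma-separated keys, or '@/path' file reference
    with one key per line and '#' comment lines.
    """
    if raw.startswith("@"):
        keys = set()
        with open(raw[1:]) as f:
            for line in f:
                line = line.strip()
                # blank lines and '#' comments are skipped
                if line and not line.startswith("#"):
                    keys.add(line)
        return keys
    return {k.strip() for k in raw.split(",") if k.strip()}
```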

When MOLD_API_KEY is unset, no authentication is required (backward compatible).

The mold CLI reads MOLD_API_KEY from the environment and sends the header automatically.

Rate Limiting

When MOLD_RATE_LIMIT is set, per-IP rate limiting is enforced with two tiers:

  • Generation tier (configured rate): /api/generate, /api/generate/stream, /api/expand, /api/upscale, /api/upscale/stream, /api/models/load, /api/models/pull, /api/models/unload
  • Read tier (10x the configured rate): /api/models, /api/status, /api/gallery/*

Health, docs, and /metrics endpoints are exempt from rate limiting.

Example: MOLD_RATE_LIMIT=10/min allows 10 generation requests per minute per IP, and 100 read requests per minute per IP.

Supported period formats: sec (or s), min (or m), hour (or h).

Override burst size with MOLD_RATE_LIMIT_BURST (defaults to 2x the rate, capped at 100).
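The spec parsing and burst default described above can be expressed as a short sketch (illustrative Python mirroring the documented rules, not the server's code):

```python
def parse_rate_limit(spec: str) -> tuple[int, str]:
    """Parse a MOLD_RATE_LIMIT spec like '10/min' into (count, period).

    Illustrative only; period aliases follow the documented forms:
    sec (or s), min (or m), hour (or h).
    """
    count_str, period = spec.split("/")
    aliases = {"s": "sec", "sec": "sec", "m": "min", "min": "min",
               "h": "hour", "hour": "hour"}
    return int(count_str), aliases[period]

def default_burst(rate: int) -> int:
    """Documented default burst: 2x the configured rate, capped at 100."""
    return min(2 * rate, 100)
```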

When rate limited, the server returns 429 Too Many Requests with a Retry-After header:

json
{ "error": "rate limit exceeded", "code": "RATE_LIMITED" }

Request IDs

Every response includes an X-Request-ID header for correlation. If the client sends one, it is preserved; otherwise the server generates a UUID v4.
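The preserve-or-generate behaviour can be sketched as a hypothetical helper (not the server's actual middleware):

```python
import uuid

def ensure_request_id(headers: dict[str, str]) -> str:
    """Sketch of the documented X-Request-ID behaviour: preserve a
    client-supplied ID, otherwise generate a UUID v4. Hypothetical
    helper, not the server's middleware."""
    rid = headers.get("X-Request-ID")
    if rid:
        return rid
    return str(uuid.uuid4())
```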

Quick Examples

bash
# Generate an image
curl -X POST http://localhost:7680/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a glowing robot"}' \
  -o robot.png

# Generate with API key authentication
curl -X POST http://localhost:7680/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: your-secret-key" \
  -d '{"prompt": "a glowing robot"}' \
  -o robot.png

# Check status
curl http://localhost:7680/api/status

# List models
curl http://localhost:7680/api/models

# Load a specific model
curl -X POST http://localhost:7680/api/models/load \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-dev:q4"}'

# Upscale an image (base64 input, raw image output)
# Note: on GNU/Linux use `base64 -w0` so the encoded payload has no line wraps
curl -X POST http://localhost:7680/api/upscale \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"real-esrgan-x4plus:fp16\",\"image\":\"$(base64 < photo.png)\"}" \
  -o photo_4x.png

# Interactive docs
open http://localhost:7680/api/docs

/api/generate

POST /api/generate returns raw image bytes, not a JSON envelope. The response Content-Type matches the requested format, and the server includes an x-mold-seed-used header with the effective seed.

bash
curl -i -X POST http://localhost:7680/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a glowing robot in a rainy alley",
    "model": "flux-schnell:q8",
    "width": 1024,
    "height": 1024,
    "steps": 4,
    "guidance": 0.0,
    "output_format": "png"
  }' \
  -o robot.png

Representative headers:

http
HTTP/1.1 200 OK
content-type: image/png
x-mold-seed-used: 42
x-mold-dimension-warning: dimensions adjusted from 1000x1000 to 1024x1024

The x-mold-dimension-warning header is present when the requested dimensions were adjusted to fit model constraints (e.g. multiples of 16, pixel cap).
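The idea behind the adjustment can be sketched as follows. This is NOT the server's algorithm (the exact rounding direction and pixel cap are model-specific); it only illustrates snapping to a multiple and enforcing a cap:

```python
def snap_dimensions(w: int, h: int, multiple: int = 16,
                    max_pixels: int = 2_097_152) -> tuple[int, int]:
    """Illustrative sketch of dimension adjustment: snap each side to
    a multiple of `multiple`, then shrink until under a pixel cap.
    The real policy lives in the model constraints and may differ;
    the cap value here is an assumption for demonstration."""
    def snap(v: int) -> int:
        return max(multiple, round(v / multiple) * multiple)
    w2, h2 = snap(w), snap(h)
    while w2 * h2 > max_pixels:
        # shrink the longer side by one multiple at a time
        if w2 >= h2:
            w2 -= multiple
        else:
            h2 -= multiple
    return w2, h2
```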

Generate Request Shape

json
{
  "prompt": "a cat on a skateboard",
  "model": "flux-schnell:q8",
  "width": 1024,
  "height": 1024,
  "steps": 4,
  "seed": 42,
  "guidance": 0.0,
  "batch_size": 1,
  "negative_prompt": "",
  "source_image": "<base64>",
  "strength": 0.75,
  "mask_image": "<base64>",
  "lora": {
    "path": "/path/to/adapter.safetensors",
    "scale": 1.0
  },
  "expand": false
}

Only prompt is required. All other fields have defaults.
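Because only prompt is required, a minimal client helper can build the body from a prompt plus optional overrides (a sketch, not an official SDK):

```python
import json

def generate_request(prompt: str, **overrides) -> str:
    """Build a JSON body for POST /api/generate. Only 'prompt' is
    required; omitted fields fall back to server defaults, so the
    minimal body is just {"prompt": ...}."""
    body = {"prompt": prompt}
    body.update(overrides)
    return json.dumps(body)
```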

/api/generate/stream

The /api/generate/stream endpoint sends Server-Sent Events for progress:

text
event: progress
data: {"type":"queued","position":1}

event: progress
data: {"type":"stage_start","name":"Loading model weights"}

event: progress
data: {"type":"denoise_step","step":1,"total":25,"elapsed_ms":640}

event: complete
data: {"images":[{"data":[137,80,78,71],"format":"png","width":1024,"height":1024,"index":0}],"generation_time_ms":12345,"model":"flux-dev:q4","seed_used":42}

Typical terminal usage:

bash
curl -N http://localhost:7680/api/generate/stream \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a glowing robot",
    "model": "flux-dev:q4",
    "steps": 25,
    "width": 1024,
    "height": 1024
  }'

The final complete event matches the GenerateResponse JSON shape used by the server internally.
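A client can consume the frames above with a minimal SSE parser. This sketch assumes single-line data: fields, which matches the examples; a production client should also handle multi-line data and comment lines:

```python
import json

def parse_sse(stream: str):
    """Yield (event, data) pairs from an SSE text stream like the
    progress frames above. Minimal sketch: assumes one 'data:' line
    per frame, with frames separated by blank lines."""
    event, data = None, None
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
        elif line == "" and event is not None:
            yield event, data
            event, data = None, None
    if event is not None:  # final frame without trailing blank line
        yield event, data
```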

RunPod Note

RunPod's proxy has a 100-second timeout. Use the SSE streaming endpoint for long generations to keep the connection alive.

/api/generate/chain

Chained video generation for LTX-2 distilled models. Splits a long video into N per-clip renders, threads a motion-tail of latents across each clip boundary, and returns a single stitched MP4. See the LTX-2 chained video output guide for the user-facing story; this section documents the wire format.

The request body maps to mold_core::chain::ChainRequest; the response body maps to mold_core::chain::ChainResponse. The canonical schema lives in the interactive docs at /api/docs (served by the running mold server) and in the OpenAPI JSON at /api/openapi.json.

The server accepts either a pre-authored stages[] body or the auto-expand form (single prompt + total_frames + clip_frames). Auto-expand is the shape mold run sends; the canonical stages[] shape is reserved for the forthcoming movie-maker UI that will author per-stage prompts/keyframes. Both normalise to the same internal Vec<ChainStage> before any engine work kicks off.
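The fan-out from the auto-expand form to a stages list can be sketched as follows. This is illustrative only: the real normalisation in mold_core::chain also threads the motion tail and validates frame counts, which this sketch omits:

```python
import math

def auto_expand_stages(prompt: str, total_frames: int,
                       clip_frames: int) -> list[dict]:
    """Illustrative expansion of (prompt, total_frames, clip_frames)
    into a stages[] list: one stage per clip, each reusing the single
    prompt. Motion-tail threading and validation are engine-side and
    not modelled here."""
    n = math.ceil(total_frames / clip_frames)
    return [{"prompt": prompt, "frames": clip_frames} for _ in range(n)]
```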

Auto-expand body (what mold run --frames N emits):

json
{
  "model": "ltx-2-19b-distilled:fp8",
  "prompt": "a cat walking through autumn leaves",
  "total_frames": 400,
  "clip_frames": 97,
  "source_image": "<base64 PNG>",
  "motion_tail_frames": 4,
  "width": 1216,
  "height": 704,
  "fps": 24,
  "seed": 42,
  "steps": 8,
  "guidance": 3.0,
  "strength": 1.0,
  "output_format": "mp4"
}

Canonical body (what the v2 movie-maker UI will author):

json
{
  "model": "ltx-2-19b-distilled:fp8",
  "stages": [
    { "prompt": "a cat walking", "frames": 97, "source_image": "<base64 PNG>" },
    { "prompt": "a cat walking", "frames": 97 },
    { "prompt": "a cat walking", "frames": 97 },
    { "prompt": "a cat walking", "frames": 97 }
  ],
  "motion_tail_frames": 4,
  "width": 1216,
  "height": 704,
  "fps": 24,
  "seed": 42,
  "steps": 8,
  "guidance": 3.0,
  "strength": 1.0,
  "output_format": "mp4"
}

Response:

json
{
  "video": {
    "data": "<base64 mp4>",
    "format": "mp4",
    "width": 1216,
    "height": 704,
    "frames": 400,
    "fps": 24,
    "thumbnail": "<base64 png>",
    "gif_preview": "<base64 gif>",
    "has_audio": false,
    "duration_ms": 16666
  },
  "stage_count": 5,
  "gpu": 0
}

Error cases:

  • 422 Unprocessable Entity — validation failure (missing prompt + total_frames in the auto-expand form, a stage whose frame count is not of the form 8k + 1, motion_tail_frames >= clip_frames, more than 16 stages, etc.).
  • 422 Unprocessable Entity — unsupported model family. Only LTX-2 distilled engines expose a chain renderer; other families are rejected with an error that names the constraint.
  • 502 Bad Gateway — a stage errored mid-chain. The whole chain is discarded and nothing is written to the gallery; v1 is fail-closed and partial resume is a v2 feature.

Queue behaviour

The chain handler deliberately bypasses the single-job queue. A chain is a multi-minute compound operation that would stall the FIFO queue for every other request, so the handler takes the engine out of ModelCache for the full chain duration and restores it on completion (or error). Chains therefore run one-at-a-time on a given GPU; submit chains to separate GPUs via MOLD_GPUS / --gpus if you need parallelism.

/api/generate/chain/stream

Same request body as /api/generate/chain, with the response delivered as Server-Sent Events. Progress frames stream as event: progress and the terminal frame is either event: complete (success) or event: error (failure; the connection closes after the error frame).

Progress event payloads map to mold_core::chain::ChainProgressEvent variants:

text
event: progress
data: {"type":"chain_start","stage_count":5,"estimated_total_frames":485}

event: progress
data: {"type":"stage_start","stage_idx":0}

event: progress
data: {"type":"denoise_step","stage_idx":0,"step":1,"total":8}

event: progress
data: {"type":"stage_done","stage_idx":0,"frames_emitted":97}

event: progress
data: {"type":"stitching","total_frames":385}

event: complete
data: {"video":"<base64 mp4>","format":"mp4","width":1216,"height":704,"frames":400,"fps":24,"thumbnail":"<base64 png>","gif_preview":"<base64 gif>","has_audio":false,"duration_ms":16666,"stage_count":5,"gpu":0,"generation_time_ms":226812}

The complete event payload maps to mold_core::chain::SseChainCompleteEvent. Non-denoise engine events (weight loads, cache hits, etc.) are intentionally not forwarded in v1 — the UX goal is per-stage progress, not per-component telemetry.
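Since the goal is per-stage progress, a client can fold the denoise_step events above into one overall fraction for a progress bar. This sketch assumes every stage runs the same number of steps (true for the 8-step distilled examples) and ignores the stitching phase:

```python
def chain_progress(stage_idx: int, step: int,
                   steps_per_stage: int, stage_count: int) -> float:
    """Overall completion fraction from a chain denoise_step event.

    Assumes equal steps per stage; stitching time is not modelled.
    """
    return (stage_idx * steps_per_stage + step) / (stage_count * steps_per_stage)
```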

bash
curl -N -X POST http://localhost:7680/api/generate/chain/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ltx-2-19b-distilled:fp8",
    "prompt": "a cat walking through autumn leaves",
    "total_frames": 400,
    "clip_frames": 97,
    "motion_tail_frames": 4,
    "width": 1216, "height": 704, "fps": 24,
    "steps": 8, "guidance": 3.0,
    "output_format": "mp4"
  }'

/api/status

Example response:

json
{
  "version": "0.3.1",
  "git_sha": "da039e1",
  "build_date": "2026-03-25",
  "models_loaded": ["flux-schnell:q8"],
  "busy": false,
  "gpu_info": {
    "name": "NVIDIA GeForce RTX 4090",
    "vram_total_mb": 24564,
    "vram_used_mb": 8192
  },
  "uptime_secs": 3600
}

/api/models/pull

Plain blocking response:

bash
curl -X POST http://localhost:7680/api/models/pull \
  -H "Content-Type: application/json" \
  -d '{"model":"flux-schnell:q8"}'

Example text response:

text
model 'flux-schnell:q8' pulled successfully

SSE streaming response:

bash
curl -N http://localhost:7680/api/models/pull \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"model":"flux-schnell:q8"}'

Representative events:

text
event: progress
data: {"type":"download_progress","filename":"flux1-schnell-Q8_0.gguf","file_index":1,"total_files":6,"bytes_downloaded":1048576,"bytes_total":12714452256}

event: progress
data: {"type":"pull_complete","model":"flux-schnell:q8"}

/api/upscale

Upscale an image using Real-ESRGAN super-resolution models.

bash
curl -X POST http://localhost:7680/api/upscale \
  -H "Content-Type: application/json" \
  -d '{
    "model": "real-esrgan-x4plus:fp16",
    "image": "<base64-encoded PNG or JPEG>",
    "output_format": "png",
    "tile_size": 512
  }' \
  --output upscaled.png

Request fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Upscaler model name (e.g. real-esrgan-x4plus:fp16) |
| image | string | yes | Base64-encoded input image (PNG or JPEG) |
| output_format | string | no | png (default) or jpeg |
| tile_size | number | no | Tile size for memory-efficient processing (0 = no tiling) |

Response: Raw image bytes (PNG or JPEG) with Content-Type header.
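Building the request body in code comes down to base64-encoding the input image, as the curl example above does with command substitution. A sketch of a client helper (not an official SDK):

```python
import base64
import json

def upscale_request(image_bytes: bytes,
                    model: str = "real-esrgan-x4plus:fp16") -> str:
    """Build the JSON body for POST /api/upscale: the input image is
    sent base64-encoded, matching the documented request fields."""
    return json.dumps({
        "model": model,
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "output_format": "png",
    })
```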

/api/upscale/stream

Same request format as /api/upscale, but returns SSE events for tile-by-tile progress:

bash
curl -N -X POST http://localhost:7680/api/upscale/stream \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "real-esrgan-x4plus:fp16",
    "image": "<base64-encoded PNG or JPEG>"
  }'

Representative events (tile progress reuses the denoise_step event type):

text
event: progress
data: {"type":"denoise_step","step":1,"total":9,"elapsed_ms":1200}

event: complete
data: {"image":"<base64>","model":"real-esrgan-x4plus:fp16","scale_factor":4,"width":2048,"height":2048}

The server caches the upscaler engine between requests — repeated upscales with the same model skip weight loading.
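To size a progress bar for the tile events, a client can estimate the tile count from the input dimensions. This sketch assumes a simple non-overlapping grid, which the real tiler may extend with overlap:

```python
import math

def tile_count(width: int, height: int, tile_size: int) -> int:
    """Estimated number of tile-progress events: a grid of
    tile_size x tile_size tiles over the input image. Assumes no
    tile overlap; tile_size=0 means the image is processed whole."""
    if tile_size == 0:
        return 1
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)
```

For example, a 1536x1536 input with 512-pixel tiles gives a 3x3 grid, matching a total of 9 as in the sample event stream above.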

Image Output

Generated images are saved to ~/.mold/output/ by default. Override with a custom path:

bash
MOLD_OUTPUT_DIR=/srv/mold/output mold serve

To disable image persistence (TUI gallery will not function):

bash
MOLD_OUTPUT_DIR="" mold serve