Server API
Running `mold serve` exposes a REST API for remote image generation.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/generate | Generate images from prompt |
| POST | /api/generate/stream | Generate with SSE progress streaming |
| POST | /api/generate/chain | Chained video generation (LTX-2) |
| POST | /api/generate/chain/stream | Chained video with SSE progress |
| POST | /api/expand | Expand a prompt using LLM |
| GET | /api/models | List available models |
| POST | /api/models/load | Load/swap the active model |
| POST | /api/models/pull | Pull/download a model |
| DELETE | /api/models/unload | Unload model to free GPU memory |
| GET | /api/gallery | List saved images |
| GET | /api/gallery/image/:name | Fetch a saved image |
| DELETE | /api/gallery/image/:name | Delete a saved image |
| GET | /api/gallery/thumbnail/:name | Fetch a cached thumbnail |
| POST | /api/upscale | Upscale image with Real-ESRGAN |
| POST | /api/upscale/stream | Upscale with SSE tile progress |
| POST | /api/shutdown | Trigger graceful server shutdown |
| GET | /api/status | Server health + status |
| GET | /health | Simple 200 OK health check |
| GET | /api/openapi.json | OpenAPI spec |
| GET | /api/docs | Interactive API docs (Scalar) |
| GET | /metrics | Prometheus metrics (feature-gated) |
Authentication
When MOLD_API_KEY is set, all API requests (except /health, /api/docs, /api/openapi.json, and /metrics) must include an X-Api-Key header:
```sh
curl -H "X-Api-Key: your-secret-key" http://localhost:7680/api/status
```

Without the header (or with an invalid key), the server returns 401 Unauthorized:

```json
{ "error": "missing X-Api-Key header", "code": "UNAUTHORIZED" }
```

The MOLD_API_KEY variable supports multiple formats:

- Single key: `MOLD_API_KEY=my-secret`
- Multiple keys: `MOLD_API_KEY=key1,key2,key3`
- File reference: `MOLD_API_KEY=@/path/to/keys.txt` (one key per line, `#` comments supported)
When MOLD_API_KEY is unset, no authentication is required (backward compatible).
The mold CLI reads MOLD_API_KEY from the environment and sends the header automatically.
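The three accepted key formats are simple enough to sketch. A hypothetical parser (our illustration, not mold's actual implementation) that resolves a `MOLD_API_KEY` value into the set of accepted keys:

```python
def resolve_api_keys(raw: str) -> set[str]:
    """Resolve a MOLD_API_KEY value into a set of accepted keys.

    Handles a single key, a comma-separated list, or a file
    reference (@/path) with one key per line and '#' comments.
    Hypothetical helper, not mold's code.
    """
    if raw.startswith("@"):
        with open(raw[1:]) as f:
            stripped = (line.strip() for line in f)
            return {k for k in stripped if k and not k.startswith("#")}
    return {k.strip() for k in raw.split(",") if k.strip()}
```

A request is then authorized when its `X-Api-Key` header value is a member of this set.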
Rate Limiting
When MOLD_RATE_LIMIT is set, per-IP rate limiting is enforced with two tiers:
- Generation tier (configured rate): `/api/generate`, `/api/generate/stream`, `/api/expand`, `/api/upscale`, `/api/upscale/stream`, `/api/models/load`, `/api/models/pull`, `/api/models/unload`
- Read tier (10x the configured rate): `/api/models`, `/api/status`, `/api/gallery/*`
Health, docs, and /metrics endpoints are exempt from rate limiting.
Example: MOLD_RATE_LIMIT=10/min allows 10 generation requests per minute per IP, and 100 read requests per minute per IP.
Supported period formats: sec (or s), min (or m), hour (or h).
Override burst size with MOLD_RATE_LIMIT_BURST (defaults to 2x the rate, capped at 100).
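The spec and burst rules above reduce to a few lines. A sketch of the documented behaviour (hypothetical helpers, not the server's code):

```python
def parse_rate_limit(spec: str) -> tuple[int, float]:
    """Parse a MOLD_RATE_LIMIT spec like '10/min' into
    (requests, period_seconds). Accepts sec/s, min/m, hour/h."""
    count, period = spec.split("/")
    seconds = {"sec": 1, "s": 1, "min": 60, "m": 60, "hour": 3600, "h": 3600}
    return int(count), float(seconds[period])

def default_burst(rate: int) -> int:
    """Default burst size: 2x the configured rate, capped at 100."""
    return min(2 * rate, 100)
```

For `MOLD_RATE_LIMIT=10/min` this gives a generation tier of 10/minute, a read tier of 100/minute, and a default burst of 20.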
When rate limited, the server returns 429 Too Many Requests with a Retry-After header:

```json
{ "error": "rate limit exceeded", "code": "RATE_LIMITED" }
```

Request IDs
Every response includes an X-Request-ID header for correlation. If the client sends one, it is preserved; otherwise the server generates a UUID v4.
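A client that wants stable correlation IDs can attach its own up front and let the server echo it back; a minimal sketch (the helper name is ours):

```python
import uuid

def ensure_request_id(headers: dict[str, str]) -> dict[str, str]:
    """Attach an X-Request-ID if the caller didn't supply one,
    mirroring the server's preserve-or-generate behaviour."""
    if "X-Request-ID" not in headers:
        headers = {**headers, "X-Request-ID": str(uuid.uuid4())}
    return headers
```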
Quick Examples
```sh
# Generate an image
curl -X POST http://localhost:7680/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a glowing robot"}' \
  -o robot.png

# Generate with API key authentication
curl -X POST http://localhost:7680/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: your-secret-key" \
  -d '{"prompt": "a glowing robot"}' \
  -o robot.png

# Check status
curl http://localhost:7680/api/status

# List models
curl http://localhost:7680/api/models

# Load a specific model
curl -X POST http://localhost:7680/api/models/load \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-dev:q4"}'

# Upscale an image (base64 input, raw image output)
curl -X POST http://localhost:7680/api/upscale \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"real-esrgan-x4plus:fp16\",\"image\":\"$(base64 < photo.png)\"}" \
  -o photo_4x.png

# Interactive docs
open http://localhost:7680/api/docs
```

/api/generate
POST /api/generate returns raw image bytes, not a JSON envelope. The response Content-Type matches the requested format, and the server includes an x-mold-seed-used header with the effective seed.
```sh
curl -i -X POST http://localhost:7680/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a glowing robot in a rainy alley",
    "model": "flux-schnell:q8",
    "width": 1024,
    "height": 1024,
    "steps": 4,
    "guidance": 0.0,
    "output_format": "png"
  }' \
  -o robot.png
```

Representative headers:

```
HTTP/1.1 200 OK
content-type: image/png
x-mold-seed-used: 42
x-mold-dimension-warning: dimensions adjusted from 1000x1000 to 1024x1024
```

The x-mold-dimension-warning header is present when the requested dimensions were adjusted to fit model constraints (e.g. multiples of 16, pixel cap).
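The exact snapping rule is model-specific and not specified here; as an illustration, a round-up-to-multiple helper reproduces the 1000 → 1024 adjustment in the header above when the multiple is 64 (the multiple is our assumption, not a documented mold constant):

```python
def snap_up(value: int, multiple: int) -> int:
    """Round a dimension up to the next multiple. The multiple is
    model-specific; 64 is an assumption that matches the example."""
    return -(-value // multiple) * multiple  # ceiling division
```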
Generate Request Shape
```json
{
  "prompt": "a cat on a skateboard",
  "model": "flux-schnell:q8",
  "width": 1024,
  "height": 1024,
  "steps": 4,
  "seed": 42,
  "guidance": 0.0,
  "batch_size": 1,
  "negative_prompt": "",
  "source_image": "<base64>",
  "strength": 0.75,
  "mask_image": "<base64>",
  "lora": {
    "path": "/path/to/adapter.safetensors",
    "scale": 1.0
  },
  "expand": false
}
```

Only `prompt` is required; all other fields have defaults.
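Because /api/generate returns raw image bytes on success and a JSON error envelope (as in the 401 example earlier) on failure, clients should branch on the status code. A hypothetical helper:

```python
import json

def handle_generate_response(status: int, content_type: str, body: bytes) -> bytes:
    """Interpret a /api/generate response: 200 carries raw image
    bytes; other statuses carry a JSON envelope with `error` and
    `code` fields. Hypothetical client helper, not part of mold."""
    if status == 200:
        if not content_type.startswith("image/"):
            raise ValueError(f"unexpected content type: {content_type}")
        return body
    err = json.loads(body)
    raise RuntimeError(f"{err['code']}: {err['error']}")
```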
/api/generate/stream
The /api/generate/stream endpoint sends Server-Sent Events for progress:
```
event: progress
data: {"type":"queued","position":1}

event: progress
data: {"type":"stage_start","name":"Loading model weights"}

event: progress
data: {"type":"denoise_step","step":1,"total":25,"elapsed_ms":640}

event: complete
data: {"images":[{"data":[137,80,78,71],"format":"png","width":1024,"height":1024,"index":0}],"generation_time_ms":12345,"model":"flux-dev:q4","seed_used":42}
```

Typical terminal usage:

```sh
curl -N http://localhost:7680/api/generate/stream \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a glowing robot",
    "model": "flux-dev:q4",
    "steps": 25,
    "width": 1024,
    "height": 1024
  }'
```

The final complete event matches the GenerateResponse JSON shape used by the server internally.
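Consuming the stream only needs a small line-oriented parser. A minimal sketch that assumes one `data:` line per event, which is how these progress frames arrive:

```python
def parse_sse(raw: str):
    """Yield (event, data) pairs from a text/event-stream body.
    Minimal sketch: assumes one data: line per event."""
    event = None
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield event, line[len("data:"):].strip()
```

Each `data` payload is then JSON-decoded and dispatched on its `type` field.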
RunPod Note
RunPod's proxy has a 100-second timeout. Use the SSE streaming endpoint for long generations to keep the connection alive.
/api/generate/chain
Chained video generation for LTX-2 distilled models. Splits a long video into N per-clip renders, threads a motion-tail of latents across each clip boundary, and returns a single stitched MP4. See the LTX-2 chained video output guide for the user-facing story; this section documents the wire format.
The request body maps to mold_core::chain::ChainRequest; the response body maps to mold_core::chain::ChainResponse. The canonical schema lives in the interactive docs at /api/docs (served by the running mold server) and in the OpenAPI JSON at /api/openapi.json.
The server accepts either a pre-authored stages[] body or the auto-expand form (single prompt + total_frames + clip_frames). Auto-expand is the shape mold run sends; the canonical stages[] shape is reserved for the forthcoming movie-maker UI that will author per-stage prompts/keyframes. Both normalise to the same internal Vec<ChainStage> before any engine work kicks off.
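The auto-expand arithmetic can be reconstructed from the example numbers in this section: each stage after the first re-renders motion_tail_frames of overlap, so only clip_frames - motion_tail_frames frames are new per subsequent clip. A sketch (our reconstruction from the documented examples, not the server's code):

```python
import math

def expand_stages(total_frames: int, clip_frames: int, tail: int) -> tuple[int, int]:
    """Estimate (stage_count, rendered_frames) for the auto-expand
    form. Each stage past the first contributes clip_frames - tail
    new frames; rendered frames (before trimming to total_frames)
    is stage_count * clip_frames."""
    if total_frames <= clip_frames:
        return 1, clip_frames
    extra = total_frames - clip_frames
    stages = 1 + math.ceil(extra / (clip_frames - tail))
    return stages, stages * clip_frames
```

With the example request in this section (total_frames 400, clip_frames 97, tail 4) this yields 5 stages and 485 rendered frames, matching the stage_count and estimated_total_frames values in the chain_start progress event.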
Auto-expand body (what mold run --frames N emits):
```json
{
  "model": "ltx-2-19b-distilled:fp8",
  "prompt": "a cat walking through autumn leaves",
  "total_frames": 400,
  "clip_frames": 97,
  "source_image": "<base64 PNG>",
  "motion_tail_frames": 4,
  "width": 1216,
  "height": 704,
  "fps": 24,
  "seed": 42,
  "steps": 8,
  "guidance": 3.0,
  "strength": 1.0,
  "output_format": "mp4"
}
```

Canonical body (what the v2 movie-maker UI will author):
```json
{
  "model": "ltx-2-19b-distilled:fp8",
  "stages": [
    { "prompt": "a cat walking", "frames": 97, "source_image": "<base64 PNG>" },
    { "prompt": "a cat walking", "frames": 97 },
    { "prompt": "a cat walking", "frames": 97 },
    { "prompt": "a cat walking", "frames": 97 }
  ],
  "motion_tail_frames": 4,
  "width": 1216,
  "height": 704,
  "fps": 24,
  "seed": 42,
  "steps": 8,
  "guidance": 3.0,
  "strength": 1.0,
  "output_format": "mp4"
}
```

Response:
```json
{
  "video": {
    "data": "<base64 mp4>",
    "format": "mp4",
    "width": 1216,
    "height": 704,
    "frames": 400,
    "fps": 24,
    "thumbnail": "<base64 png>",
    "gif_preview": "<base64 gif>",
    "has_audio": false,
    "duration_ms": 16666
  },
  "stage_count": 5,
  "gpu": 0
}
```

Error cases:
- `422 Unprocessable Entity` — validation failure (missing `prompt` + `total_frames` in the auto-expand form, a stage with non-`8k+1` frames, `motion_tail_frames >= clip_frames`, more than 16 stages, etc.).
- `422 Unprocessable Entity` — unsupported model family. Only LTX-2 distilled engines expose a chain renderer; other families are rejected with an error that names the constraint.
- `502 Bad Gateway` — a stage errored mid-chain. The whole chain is discarded and nothing is written to the gallery; v1 is fail-closed and partial resume is a v2 feature.
Queue behaviour
The chain handler deliberately bypasses the single-job queue. A chain is a multi-minute compound operation that would stall the FIFO queue for every other request, so the handler takes the engine out of ModelCache for the full chain duration and restores it on completion (or error). Chains therefore run one-at-a-time on a given GPU; submit chains to separate GPUs via MOLD_GPUS / --gpus if you need parallelism.
/api/generate/chain/stream
Same request body as /api/generate/chain, with the response delivered as Server-Sent Events. Progress frames stream as event: progress and the terminal frame is either event: complete (success) or event: error (failure; the connection closes after the error frame).
Progress event payloads map to mold_core::chain::ChainProgressEvent variants:
```
event: progress
data: {"type":"chain_start","stage_count":5,"estimated_total_frames":485}

event: progress
data: {"type":"stage_start","stage_idx":0}

event: progress
data: {"type":"denoise_step","stage_idx":0,"step":1,"total":8}

event: progress
data: {"type":"stage_done","stage_idx":0,"frames_emitted":97}

event: progress
data: {"type":"stitching","total_frames":385}

event: complete
data: {"video":"<base64 mp4>","format":"mp4","width":1216,"height":704,"frames":400,"fps":24,"thumbnail":"<base64 png>","gif_preview":"<base64 gif>","has_audio":false,"duration_ms":16666,"stage_count":5,"gpu":0,"generation_time_ms":226812}
```

The complete event payload maps to mold_core::chain::SseChainCompleteEvent. Non-denoise engine events (weight loads, cache hits, etc.) are intentionally not forwarded in v1 — the UX goal is per-stage progress, not per-component telemetry.
```sh
curl -N -X POST http://localhost:7680/api/generate/chain/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ltx-2-19b-distilled:fp8",
    "prompt": "a cat walking through autumn leaves",
    "total_frames": 400,
    "clip_frames": 97,
    "motion_tail_frames": 4,
    "width": 1216, "height": 704, "fps": 24,
    "steps": 8, "guidance": 3.0,
    "output_format": "mp4"
  }'
```

/api/status
Example response:
```json
{
  "version": "0.3.1",
  "git_sha": "da039e1",
  "build_date": "2026-03-25",
  "models_loaded": ["flux-schnell:q8"],
  "busy": false,
  "gpu_info": {
    "name": "NVIDIA GeForce RTX 4090",
    "vram_total_mb": 24564,
    "vram_used_mb": 8192
  },
  "uptime_secs": 3600
}
```

/api/models/pull
Plain blocking response:
```sh
curl -X POST http://localhost:7680/api/models/pull \
  -H "Content-Type: application/json" \
  -d '{"model":"flux-schnell:q8"}'
```

Example text response:

```
model 'flux-schnell:q8' pulled successfully
```

SSE streaming response:

```sh
curl -N http://localhost:7680/api/models/pull \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"model":"flux-schnell:q8"}'
```

Representative events:

```
event: progress
data: {"type":"download_progress","filename":"flux1-schnell-Q8_0.gguf","file_index":1,"total_files":6,"bytes_downloaded":1048576,"bytes_total":12714452256}

event: progress
data: {"type":"pull_complete","model":"flux-schnell:q8"}
```

/api/upscale
Upscale an image using Real-ESRGAN super-resolution models.
```sh
curl -X POST http://localhost:7680/api/upscale \
  -H "Content-Type: application/json" \
  -d '{
    "model": "real-esrgan-x4plus:fp16",
    "image": "<base64-encoded PNG or JPEG>",
    "output_format": "png",
    "tile_size": 512
  }' \
  --output upscaled.png
```

Request fields:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Upscaler model name (e.g. real-esrgan-x4plus:fp16) |
| image | string | yes | Base64-encoded input image (PNG or JPEG) |
| output_format | string | no | png (default) or jpeg |
| tile_size | number | no | Tile size for memory-efficient processing (0 = no tiling) |
Response: Raw image bytes (PNG or JPEG) with Content-Type header.
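Building the request body mostly means base64-encoding the image. A hypothetical helper that takes raw image bytes (names and defaults drawn from the table above):

```python
import base64
import json

def upscale_body(image_bytes: bytes,
                 model: str = "real-esrgan-x4plus:fp16",
                 output_format: str = "png",
                 tile_size: int = 512) -> bytes:
    """Build a /api/upscale JSON body from raw image bytes.
    Sketch of a client helper, not part of mold itself."""
    return json.dumps({
        "model": model,
        "image": base64.b64encode(image_bytes).decode(),
        "output_format": output_format,
        "tile_size": tile_size,
    }).encode()
```

The response is then the upscaled image bytes, written straight to disk.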
/api/upscale/stream
Same request format as /api/upscale, but returns SSE events for tile-by-tile progress:
```sh
curl -N -X POST http://localhost:7680/api/upscale/stream \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "real-esrgan-x4plus:fp16",
    "image": "<base64-encoded PNG or JPEG>"
  }'
```

Representative events (tile progress reuses the denoise_step event type):

```
event: progress
data: {"type":"denoise_step","step":1,"total":9,"elapsed_ms":1200}

event: complete
data: {"image":"<base64>","model":"real-esrgan-x4plus:fp16","scale_factor":4,"width":2048,"height":2048}
```

The server caches the upscaler engine between requests — repeated upscales with the same model skip weight loading.
Image Output
Generated images are saved to `~/.mold/output/` by default. Override with a custom path:

```sh
MOLD_OUTPUT_DIR=/srv/mold/output mold serve
```

To disable image persistence (the TUI gallery will not function):

```sh
MOLD_OUTPUT_DIR="" mold serve
```