Building Indexes
This guide covers building your own nxv index from nixpkgs channel-release snapshots. This is an advanced topic for users who want to:
- Build indexes with different date ranges or channels
- Self-host their own nxv infrastructure
- Contribute to the official index
Most Users
If you just want to use nxv, run nxv update to download the pre-built index. A full rebuild from scratch streams ~15 GB of snapshots and takes a few hours.
How It Works
The indexer does not walk nixpkgs git history and does not need a nixpkgs checkout. It ingests channel-release snapshots from releases.nixos.org:
- 2020-06 onward: every channel release ships a
packages.json.br(~380 MB decompressed) that enumerates all ~144k attributes — including nested package sets (python3Packages.*,haskellPackages.*,nodePackages.*, ...) — with versions and metadata. No Nix evaluation is needed for this era. - 2016-09 → 2020-06: releases predate
packages.json. Opt in with--backfill-evalsto evaluate each release'snixexprs.tar.xzwithnix-env(the only path that requiresnix).
Every stored commit is a real Hydra-built channel commit: the (attribute, version) pair was verifiably present at both ends of its range.
Prerequisites
Software Requirements
- nxv with indexer feature - The indexer is feature-gated to keep the main binary small
- Nix - Only for
--backfill-evals(pre-2020 era) and--head-eval; the packages.json era needs no Nix at all - Network access to releases.nixos.org
No git, no nixpkgs clone.
Hardware Requirements
| Resource | Full rebuild |
|---|---|
| RAM | ~2 GB (default 4 parallel workers) |
| Disk | ~5 GB free (the raw database is ~2 GB) |
| Network | ~15 GB streamed (snapshots are parsed in flight, never hit disk) |
| Time | A few hours for 2020→present; +1.5-3 h with --backfill-evals |
Incremental runs (the normal case) take seconds to minutes — a typical 6-hour window sees 0-2 new releases.
Getting the Indexer
# Using Nix flakes (recommended)
nix run github:utensils/nxv#nxv-indexer -- --help
# From source
cargo build --release --features indexer
./target/release/nxv index --helpIndexing Workflow
Full Index (First Time)
nxv indexThis ingests every release of the default channels: nixpkgs-unstable (the historical spine, back to 2016) and nixos-unstable-small (the currency channel, typically hours behind master).
To also cover the pre-2020 era (requires nix):
nxv index --backfill-evalsInterrupting with Ctrl+C is safe: each release commits atomically together with its row in the releases ledger, so unfinished releases simply stay pending.
Resuming and Incremental Updates
Just run the same command again — the releases ledger tracks what has already been ingested, and only new (or still-pending) releases are processed:
nxv indexTo re-queue every known release:
nxv index --fullTo retry releases that previously failed:
nxv index --retry-failedIndexing a Specific Date Range
To ingest only releases in a date window:
# Only 2024 onwards
nxv index --since 2024-01-01
# Specific range
nxv index --since 2023-01-01 --until 2024-01-01Data-Quality Gates
Monitors check each snapshot before anything is written: attribute count floors, rolling baselines, sentinel packages (firefox, thunderbird, nh, python3Packages.requests), birth/death anomalies, and head lag. With --strict, warnings become fatal (this is what CI uses). --report report.json writes the end-of-run coverage report.
Staying Current During Channel Stalls
When the channels lag behind nixpkgs master, --head-eval evaluates master HEAD directly from a GitHub tarball (requires nix):
nxv index --strict --head-eval --report report.jsonPublishing
Generate compressed artifacts for distribution:
# Basic publish
nxv publish --output ./publish
# With signing (recommended)
nxv keygen # Creates nxv.key and nxv.pub
nxv publish --output ./publish --sign --secret-key ./nxv.keyGenerated Artifacts
| File | Size | Description |
|---|---|---|
index.db.zst | ~190 MB | Zstd-compressed SQLite database |
bloom.bin | ~330 KB | Bloom filter for fast lookups |
manifest.json | ~1 KB | Metadata with checksums |
manifest.json.minisig | ~1 KB | minisign signature (when --sign) |
Hosting
Upload artifacts to any HTTP server. Update your manifest's url_prefix:
nxv publish --output ./publish \
--url-prefix "https://example.com/nxv" \
--sign --secret-key ./nxv.keyUsers can then configure their nxv to use your index:
export NXV_MANIFEST_URL="https://example.com/nxv/manifest.json"
export NXV_PUBLIC_KEY="RWTxxxxxxxx..."
nxv updateArchitecture Deep Dive
Pipeline
Each run is: plan → parallel fetch/parse → in-order gating → aggregated upserts → finish.
- Plan: list the channel's release directories on the releases.nixos.org S3 bucket and diff against the
releasesledger; new directories becomependingrows. - Fetch/parse: workers stream each
packages.json.br, decompressing brotli and parsing JSON in flight — the ~380 MB document is never materialized in memory or on disk. - Gate: monitors validate each snapshot in release order before any write.
- Upsert: widen-only — an existing
(attribute_path, version)row only ever has itsfirst_commit_*moved earlier or itslast_commit_*moved later. Each release commits atomically with its ledger row. - Finish: rebuild the bloom filter, rebuild FTS (bulk runs drop the triggers and rebuild from scratch), and update watermarks.
Range Semantics
A row means "this (attribute, version) pair was observed at first_commit and at last_commit" — both real Hydra-built channel commits. Interior presence is interpolated, not guaranteed: a version that lives shorter than one channel advance can be missed entirely.
Interruption Safety
There is no checkpoint file. Ctrl+C triggers a graceful shutdown; releases that didn't finish stay pending in the ledger and the next run picks them up. Failed releases are parked with retry backoff instead of aborting the run.
Database Schema
The index uses SQLite with this schema (version 4):
CREATE TABLE package_versions (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL, -- Short package name, e.g. "python3"
version TEXT NOT NULL, -- e.g., "3.11.4"
first_commit_hash TEXT NOT NULL, -- Earliest channel commit observed with this version
first_commit_date INTEGER NOT NULL, -- Unix timestamp (seconds)
last_commit_hash TEXT NOT NULL, -- Latest channel commit observed with this version
last_commit_date INTEGER NOT NULL, -- Unix timestamp (seconds)
attribute_path TEXT NOT NULL, -- e.g., "python3Packages.requests"
description TEXT,
license TEXT, -- JSON array
homepage TEXT,
maintainers TEXT, -- JSON array
platforms TEXT, -- JSON array
source_path TEXT, -- e.g., "pkgs/tools/foo/default.nix"
known_vulnerabilities TEXT, -- JSON array of CVEs
UNIQUE(attribute_path, version),
CHECK(first_commit_date <= last_commit_date)
);
-- Channel-release ingestion ledger: which snapshots have been ingested,
-- which are pending/failed, and their retry/backoff state
CREATE TABLE releases (
id INTEGER PRIMARY KEY,
channel TEXT NOT NULL,
release_name TEXT NOT NULL,
commit_hash TEXT NOT NULL,
commit_count INTEGER,
release_date INTEGER NOT NULL,
source TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending', -- pending/ingested/failed/skipped
attempts INTEGER NOT NULL DEFAULT 0,
last_attempt_at INTEGER,
attr_count INTEGER,
error TEXT,
ingested_at INTEGER,
UNIQUE(channel, release_name)
);
-- Full-text search index (auto-synced via triggers)
CREATE VIRTUAL TABLE package_versions_fts
USING fts5(name, description, content=package_versions, content_rowid=id);
-- Metadata
CREATE TABLE meta (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);Key Design: One Row Per Version Range
A given (attribute_path, version) pair is exactly one row, enforced by the UNIQUE(attribute_path, version) constraint. Re-observing a pair only widens its range. nxv dedupe exists to repair databases built by pre-0.1.5 indexer versions that left duplicate rows behind.
Insecure Packages
is_insecure is not stored as a column — it's derived at query time from known_vulnerabilities (a non-empty JSON array means the package is flagged insecure). The HTTP API's version-history endpoint surfaces this as a boolean.
Troubleshooting
A Release Fails to Ingest
Failed releases don't abort the run — they're parked in the releases ledger with retry backoff. Retry them explicitly:
nxv index --retry-failedA handful of historical releases are genuinely defective upstream (broken 2018 evaluations, corrupt 16.09/17.03 re-uploads) and will stay parked as skipped — that's expected.
Database Corruption
If the database becomes corrupted after a crash:
# Reset to fresh state
rm ~/.local/share/nxv/index.db # Linux
rm ~/Library/Application\ Support/nxv/index.db # macOS
# Re-run indexing (or `nxv update` to re-download the pre-built index)
nxv indexPre-2020 Versions Missing
The pre-2020 era is opt-in because it requires nix and takes 1.5-3 hours:
nxv index --backfill-evalsUsing a Mirror
Set NXV_RELEASES_URL to point the indexer at an alternate releases.nixos.org S3 endpoint (tests, mirrors).
CLI Reference
For complete command documentation, see the Indexer CLI Reference.