Building Indexes
This guide covers building your own nxv index from a local nixpkgs checkout. This is an advanced topic for users who want to:
- Create indexes for custom nixpkgs forks
- Build indexes with different date ranges
- Self-host their own nxv infrastructure
- Contribute to the official index
Most users: if you just want to use nxv, run nxv update to download the pre-built index. Building your own index takes 24+ hours and significant resources.
Prerequisites
Software Requirements
- nxv with indexer feature - The indexer is feature-gated to keep the main binary small
- Nix - With flakes enabled (for evaluation)
- Git - For cloning and traversing nixpkgs history
Hardware Requirements
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 32 GB |
| Disk | 50 GB free | 100 GB free |
| CPU | 4 cores | 8+ cores |
| Time | 24 hours | 12 hours (parallel) |
Getting the Indexer
# Using Nix flakes (recommended)
nix run github:utensils/nxv#nxv-indexer -- --help
# From source
cargo build --release --features indexer
./target/release/nxv index --help
Cloning nixpkgs
# Full clone (~3 GB)
git clone https://github.com/NixOS/nixpkgs.git
# Or shallow clone for faster download (limits --since range)
git clone --depth 10000 https://github.com/NixOS/nixpkgs.git
Indexing Workflow
Full Index (First Time)
A full index processes all nixpkgs commits since 2017-01-01:
nxv index --nixpkgs-path ./nixpkgs
This takes 24-48 hours depending on hardware. Progress is checkpointed every 100 commits, so you can safely interrupt with Ctrl+C and resume later.
Resuming Interrupted Indexing
Just run the same command again:
# Picks up from the last checkpoint automatically
nxv index --nixpkgs-path ./nixpkgs
To force a fresh start (ignoring checkpoints):
nxv index --nixpkgs-path ./nixpkgs --full
Indexing a Specific Date Range
To index only recent commits:
# Only 2024 onwards
nxv index --nixpkgs-path ./nixpkgs --since 2024-01-01
# Specific range
nxv index --nixpkgs-path ./nixpkgs --since 2023-01-01 --until 2024-01-01
Backfilling Metadata
After indexing, some metadata may be missing (source paths, homepages, vulnerability info). Use backfill to update:
HEAD Mode (Default)
Extracts from the current nixpkgs checkout. Fast but may miss renamed/removed packages:
nxv backfill --nixpkgs-path ./nixpkgs
Historical Mode
Traverses git history to find each package's original commit. Slower but complete:
nxv backfill --nixpkgs-path ./nixpkgs --history
Selective Backfill
Update only specific fields:
# Only source paths
nxv backfill --nixpkgs-path ./nixpkgs --fields source-path
# Multiple fields
nxv backfill --nixpkgs-path ./nixpkgs --fields source-path,homepage
Publishing
Generate compressed artifacts for distribution:
# Basic publish
nxv publish --output ./publish
# With signing (recommended)
nxv keygen # Creates nxv.key and nxv.pub
nxv publish --output ./publish --sign --secret-key ./nxv.key
Generated Artifacts
| File | Size | Description |
|---|---|---|
| index.db.zst | ~28 MB | Zstd-compressed SQLite database |
| bloom.bin | ~96 KB | Bloom filter for fast lookups |
| manifest.json | ~1 KB | Metadata with checksums and signature |
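To illustrate why a Bloom filter suits "fast lookups", here is a minimal Python sketch of the data structure. It is not the on-disk format of bloom.bin, which this guide does not describe; the class name, bit count, hash count, and the SHA-256 based double hashing are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter; parameters are hypothetical, not nxv's."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A filter like this never reports a false negative, so a negative answer can skip the database query entirely, while a positive answer still has to be confirmed against the database.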
Hosting
Upload the artifacts to any HTTP server, and pass --url-prefix when publishing so the manifest's url_prefix points at that location:
nxv publish --output ./publish \
  --url-prefix "https://example.com/nxv" \
  --sign --secret-key ./nxv.key
Users can then configure their nxv to use your index:
export NXV_MANIFEST_URL="https://example.com/nxv/manifest.json"
export NXV_PUBLIC_KEY="RWTxxxxxxxx..."
nxv update
Architecture Deep Dive
Version Extraction Fallback Chain
Not all packages expose versions the same way. The indexer tries multiple sources:
| Priority | Source | Example |
|---|---|---|
| 1 | pkg.version | Most packages |
| 2 | pkg.unwrapped.version | Wrapper packages (neovim) |
| 3 | pkg.passthru.unwrapped.version | Passthru metadata |
| 4 | Parse from pkg.name | "hello-2.12" → "2.12" |
The version_source field in the database tracks which method was used, enabling debugging without re-indexing.
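The fallback chain can be sketched in Python as follows. This is an illustration, not the indexer's actual code: packages are modeled as nested dicts, the helper names are hypothetical, and the name-parsing regex (version starts at the first "-" followed by a digit) is an assumed heuristic. The source labels follow the version_source values listed in the schema (direct/unwrapped/passthru/name).

```python
import re
from typing import Any, Optional, Tuple


def _get(pkg: dict, *path: str) -> Optional[Any]:
    """Walk nested dict keys, returning None if any level is missing."""
    cur: Any = pkg
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


# Assumed heuristic: the version begins at the first "-" followed by a digit.
_NAME_VERSION = re.compile(r"-(\d.*)$")


def extract_version(pkg: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (version, version_source), trying sources in priority order."""
    candidates = [
        ("direct", ("version",)),
        ("unwrapped", ("unwrapped", "version")),
        ("passthru", ("passthru", "unwrapped", "version")),
    ]
    for source, path in candidates:
        value = _get(pkg, *path)
        if isinstance(value, str) and value:
            return value, source
    # Last resort: parse the version out of the derivation name.
    name = pkg.get("name")
    if isinstance(name, str):
        match = _NAME_VERSION.search(name)
        if match:
            return match.group(1), "name"
    return None, None
```

Returning the source label alongside the version is what makes it possible to audit later which rule produced a given row.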
all-packages.nix Optimization
The file pkgs/top-level/all-packages.nix changes frequently, but each change usually affects only a few packages. Instead of extracting all ~18,000 packages on every commit, the indexer:
- Parses the git diff for changed lines
- Extracts affected attribute names (assignment patterns, inherit statements)
- Evaluates only those specific packages
On average this means evaluating ~7 packages per commit instead of ~18,000, which provides a 100x+ speedup for incremental indexing.
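The diff-parsing step can be sketched in Python like this. It is a simplified illustration under stated assumptions: the two regular expressions are guesses at what "assignment patterns" and "inherit statements" look like in a unified diff, and the function name and sample diff are hypothetical. The real indexer may handle more cases (multi-line expressions, nested attribute sets).

```python
import re
from typing import Set

# Assumed pattern: `attrName = ...` at the start of a changed line (not `==`).
_ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_'-]*)\s*=(?!=)")
# Assumed pattern: `inherit a b;` or `inherit (scope) a b;` on one line.
_INHERIT = re.compile(r"^\s*inherit\b\s*(?:\([^)]*\))?\s*([^;]*);")

SAMPLE_DIFF = """\
--- a/pkgs/top-level/all-packages.nix
+++ b/pkgs/top-level/all-packages.nix
@@ -1,3 +1,4 @@
   unchanged = callPackage ../a { };
-  hello = callPackage ../hello/old.nix { };
+  hello = callPackage ../hello { };
+  inherit (python3Packages) requests flask;
"""


def changed_attributes(diff_text: str) -> Set[str]:
    """Collect attribute names touched by added or removed diff lines."""
    attrs: Set[str] = set()
    for line in diff_text.splitlines():
        # Skip the file headers, hunk markers, and unchanged context lines.
        if line.startswith(("+++", "---")):
            continue
        if not line.startswith(("+", "-")):
            continue
        body = line[1:]
        match = _ASSIGN.match(body)
        if match:
            attrs.add(match.group(1))
            continue
        match = _INHERIT.match(body)
        if match:
            attrs.update(match.group(1).split())
    return attrs
```

Only the names returned by a function like this would then be passed to the evaluator, which is where the savings come from.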
Checkpointing and Ctrl+C Safety
Progress is saved every 100 commits (configurable via --checkpoint-interval):
- Checkpoint data: Last indexed commit hash, date, statistics
- Atomic writes: Database commits are transactional
- Signal handling: Ctrl+C triggers graceful shutdown with checkpoint save
- Resume: Next run reads checkpoint and continues from last position
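The checkpoint-and-resume behavior described above can be sketched in Python as follows. This is a minimal illustration, not nxv's implementation: the Indexer class, the JSON checkpoint file, and its field names are all hypothetical, and the real indexer stores its progress alongside the database.

```python
import json
import os
import signal
import tempfile
from typing import Callable, List, Optional


class Indexer:
    """Hypothetical commit processor with periodic, atomic checkpoints."""

    def __init__(self, checkpoint_path: str, interval: int = 100) -> None:
        self.checkpoint_path = checkpoint_path
        self.interval = interval
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None) -> None:
        # Signal handler: only set a flag; the loop exits at a safe point.
        self.stop_requested = True

    def load_checkpoint(self) -> Optional[dict]:
        try:
            with open(self.checkpoint_path) as fh:
                return json.load(fh)
        except FileNotFoundError:
            return None

    def save_checkpoint(self, commit: str, processed: int) -> None:
        # Write to a temp file and rename it, so a crash cannot leave
        # a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.checkpoint_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as fh:
            json.dump({"last_commit": commit, "processed": processed}, fh)
        os.replace(tmp, self.checkpoint_path)

    def run(self, commits: List[str], process: Callable[[str], None]) -> int:
        checkpoint = self.load_checkpoint()
        start, processed = 0, 0
        if checkpoint and checkpoint["last_commit"] in commits:
            # Resume right after the last commit recorded in the checkpoint.
            start = commits.index(checkpoint["last_commit"]) + 1
            processed = checkpoint["processed"]
        last = None
        for commit in commits[start:]:
            process(commit)
            processed += 1
            last = commit
            if processed % self.interval == 0:
                self.save_checkpoint(commit, processed)
            if self.stop_requested:
                break
        if last is not None:
            self.save_checkpoint(last, processed)  # final or graceful save
        return processed


def install_sigint_handler(indexer: Indexer) -> None:
    """Route Ctrl+C to a graceful stop instead of an immediate exit."""
    signal.signal(signal.SIGINT, indexer.request_stop)
```

The handler only sets a flag, so the commit being processed always finishes before the checkpoint is written; that is the property that makes interruption safe in this sketch.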
Database Schema
The index uses SQLite with this schema (version 4):
CREATE TABLE package_versions (
    attribute_path TEXT,         -- e.g., "python311"
    version TEXT,                -- e.g., "3.11.4"
    version_source TEXT,         -- direct/unwrapped/passthru/name
    first_commit_hash TEXT,      -- Earliest commit with this version
    first_commit_date TEXT,      -- RFC3339 timestamp
    last_commit_hash TEXT,       -- Latest commit with this version
    last_commit_date TEXT,       -- RFC3339 timestamp
    description TEXT,
    license TEXT,                -- JSON array
    homepage TEXT,
    maintainers TEXT,            -- JSON array
    platforms TEXT,              -- JSON array
    source_path TEXT,            -- e.g., "pkgs/tools/foo/default.nix"
    known_vulnerabilities TEXT,  -- JSON array of CVEs
    store_path TEXT,             -- Only for commits >= 2020-01-01
    is_insecure BOOLEAN,
    UNIQUE(attribute_path, version)
);
-- Full-text search index (auto-synced via triggers)
CREATE VIRTUAL TABLE package_versions_fts USING fts5(description);
-- Metadata
CREATE TABLE meta (
    key TEXT PRIMARY KEY,
    value TEXT
);
Key Design: One Row Per Version
Each (attribute_path, version) pair has exactly one row. When the same version appears in multiple commits:
- first_commit_* tracks the earliest appearance
- last_commit_* tracks the latest appearance
This provides version timeline information without row explosion.
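One way to maintain a single row per (attribute_path, version) is an SQLite upsert that widens the first/last commit window. The Python sketch below is an assumption about how this could be done, not the indexer's actual SQL: it uses a trimmed-down copy of the table, a hypothetical record helper, and relies on timestamps sharing one format (for example UTC with a trailing Z) so that string comparison matches chronological order. ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.

```python
import sqlite3

# Trimmed-down copy of the table, keeping only the columns used here.
SCHEMA = """
CREATE TABLE package_versions (
    attribute_path TEXT,
    version TEXT,
    first_commit_hash TEXT,
    first_commit_date TEXT,
    last_commit_hash TEXT,
    last_commit_date TEXT,
    UNIQUE(attribute_path, version)
);
"""

# On conflict, keep the earliest "first" and the latest "last" commit.
# Unqualified columns refer to the existing row; excluded.* to the new one.
UPSERT = """
INSERT INTO package_versions
    (attribute_path, version, first_commit_hash, first_commit_date,
     last_commit_hash, last_commit_date)
VALUES (:attr, :version, :hash, :date, :hash, :date)
ON CONFLICT(attribute_path, version) DO UPDATE SET
    first_commit_hash = CASE
        WHEN excluded.first_commit_date < first_commit_date
        THEN excluded.first_commit_hash ELSE first_commit_hash END,
    first_commit_date = MIN(first_commit_date, excluded.first_commit_date),
    last_commit_hash = CASE
        WHEN excluded.last_commit_date > last_commit_date
        THEN excluded.last_commit_hash ELSE last_commit_hash END,
    last_commit_date = MAX(last_commit_date, excluded.last_commit_date)
"""


def record(conn: sqlite3.Connection, attr: str, version: str,
           commit_hash: str, commit_date: str) -> None:
    """Record one sighting of a package version at a given commit."""
    conn.execute(UPSERT, {"attr": attr, "version": version,
                          "hash": commit_hash, "date": commit_date})
```

Because the update only ever widens the window, sightings can arrive in any commit order and the row still converges to the same first/last pair.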
Store Path Extraction
Store paths are only extracted for commits dated 2020-01-01 or later, when cache.nixos.org availability became reliable. Earlier commits may contain packages that aren't in the binary cache.
Troubleshooting
Stuck on a Commit
Some commits may have evaluation issues. Use --max-commits to limit processing and isolate the problematic commit, or use --since/--until to skip a date range:
# Limit to next 100 commits to isolate the issue
nxv index --nixpkgs-path ./nixpkgs --max-commits 100
# Skip problematic date range
nxv index --nixpkgs-path ./nixpkgs --since 2023-06-01
Database Corruption
If the database becomes corrupted after a crash:
# Reset to fresh state
rm ~/.local/share/nxv/index.db # Linux
rm ~/Library/Application\ Support/nxv/index.db # macOS
# Re-run indexing
nxv index --nixpkgs-path ./nixpkgs
nixpkgs Repository Issues
If the nixpkgs clone is in an inconsistent state:
# Reset to known good state
nxv reset --nixpkgs-path ./nixpkgs --fetch
CLI Reference
For complete command documentation, see the Indexer CLI Reference.