CLI Reference

The zippy command-line tool lets you create, inspect, and manage ZDS datasets without writing code. It’s perfect for shell scripts, data pipelines, and quick exploration.

Table of Contents
  1. Installation
    1. From Pre-built Binaries
    2. From Source
  2. Quick Start
  3. Commands
    1. init
    2. put
    3. get
    4. delete
    5. scan
    6. list
    7. stats
    8. validate
    9. reindex
    10. pack
    11. unpack
  4. Recipes
    1. Import JSONL File
    2. Export to JSONL
    3. Data Exploration with jq
    4. Backup and Restore
    5. Cross-Platform Sharing
    6. Pipeline Integration
  5. Environment Variables
  6. Exit Codes
  7. Next Steps

Installation

From Pre-built Binaries

Download from the releases page:

# macOS (Apple Silicon)
curl -L https://github.com/zippydata/zippy/releases/latest/download/zippy-aarch64-apple-darwin.tar.gz | tar xz
sudo mv zippy /usr/local/bin/

# macOS (Intel)
curl -L https://github.com/zippydata/zippy/releases/latest/download/zippy-x86_64-apple-darwin.tar.gz | tar xz
sudo mv zippy /usr/local/bin/

# Linux (x64)
curl -L https://github.com/zippydata/zippy/releases/latest/download/zippy-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv zippy /usr/local/bin/
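
# Verify the install (prints usage if the binary is on your PATH)
zippy --help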

From Source

# With Cargo
cargo install --path cli

# Or build locally
cargo build --release -p zippy-cli
./target/release/zippy --help

Quick Start

# Create a new dataset
zippy init ./my_dataset -c train

# Add some documents
zippy put ./my_dataset -c train doc_001 --data '{"text": "Hello world", "label": 1}'
zippy put ./my_dataset -c train doc_002 --data '{"text": "Goodbye", "label": 0}'

# View a document
zippy get ./my_dataset -c train doc_001 --pretty

# List all documents
zippy scan ./my_dataset -c train

# Check statistics
zippy stats ./my_dataset

# Package for sharing
zippy pack ./my_dataset dataset.zds

Commands

init

Create a new ZDS store with an optional initial collection.

zippy init <path> [options]

Option                     Description
-c, --collection <name>    Initial collection name (default: default)
--strict                   Enable strict schema mode

Examples:

# Basic initialization
zippy init ./my_dataset

# With named collection
zippy init ./my_dataset -c train

# Strict mode (all documents must match first document's schema)
zippy init ./my_dataset -c products --strict
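
In strict mode, a document whose shape differs from the first one written should be rejected with a nonzero exit code (the exact error text may vary by version):

zippy put ./my_dataset -c products p_001 --data '{"sku": "WIDGET", "price": 9.99}'
zippy put ./my_dataset -c products p_002 --data '{"name": "mismatched"}' || echo "rejected: schema mismatch"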

put

Add or update a document in a collection.

zippy put <path> <doc_id> [options]

Option                     Description
-c, --collection <name>    Collection name (default: default)
--data <json>              JSON document inline

If --data is not provided, reads JSON from stdin.

Examples:

# Inline JSON
zippy put ./data -c users user_001 --data '{"name": "Alice", "role": "admin"}'

# From stdin
echo '{"name": "Bob", "role": "user"}' | zippy put ./data -c users user_002

# From file
zippy put ./data -c users user_003 < user.json

# Complex document
zippy put ./data -c orders order_001 --data '{
  "customer": "cust_123",
  "items": [{"sku": "WIDGET", "qty": 2}],
  "total": 59.98
}'
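
Because put both adds and updates, re-using an ID replaces the stored document:

# Overwrite an existing document
zippy put ./data -c users user_001 --data '{"name": "Alice", "role": "owner"}'
zippy get ./data -c users user_001
# {"name":"Alice","role":"owner"}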

get

Retrieve a document by ID.

zippy get <path> <doc_id> [options]

Option                     Description
-c, --collection <name>    Collection name (default: default)
--pretty                   Pretty-print JSON output

Examples:

# Compact output
zippy get ./data -c users user_001
# {"name":"Alice","role":"admin"}

# Pretty-printed
zippy get ./data -c users user_001 --pretty
# {
#   "name": "Alice",
#   "role": "admin"
# }

# Pipe to jq for processing
zippy get ./data -c users user_001 | jq '.name'
# "Alice"

delete

Remove a document from a collection.

zippy delete <path> <doc_id> [options]

Option                     Description
-c, --collection <name>    Collection name (default: default)

Example:

zippy delete ./data -c users user_001
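
A fetch after a delete should fail with a nonzero exit code (see Exit Codes below), which makes removals easy to confirm in scripts:

zippy delete ./data -c users user_001
zippy get ./data -c users user_001 || echo "user_001 removed"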

scan

List and output documents from a collection.

zippy scan <path> [options]

Option                     Description
-c, --collection <name>    Collection name (default: default)
-l, --limit <n>            Maximum documents to output
--fields <list>            Comma-separated fields to project
--jsonl                    Output as JSON Lines (one per line)

Examples:

# All documents (pretty array)
zippy scan ./data -c users

# First 10 documents
zippy scan ./data -c users -l 10

# Only specific fields
zippy scan ./data -c users --fields name,email

# JSONL format (best for piping)
zippy scan ./data -c users --jsonl

# Combine with jq
zippy scan ./data -c users --jsonl | jq 'select(.role == "admin")'
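
Because --jsonl emits exactly one document per line, ordinary line-oriented tools work too:

# Count documents in a collection
zippy scan ./data -c users --jsonl | wc -l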

list

Show all collections in a store.

zippy list <path>

Example:

zippy list ./data
# Collections:
#   users     (150 documents)
#   products  (89 documents)
#   orders    (1,247 documents)

stats

Display statistics about a store or collection.

zippy stats <path> [options]

Option                     Description
-c, --collection <name>    Specific collection (shows all if omitted)
--json                     Output as JSON

Examples:

# Overview of all collections
zippy stats ./data
# Store: ./data
# Total collections: 3
# Total documents: 1,486
#
# Collection    Documents    Size
# users         150          45 KB
# products      89           23 KB
# orders        1,247        892 KB

# Specific collection
zippy stats ./data -c users

# Machine-readable JSON
zippy stats ./data --json | jq '.collections[].count'
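
Building on the JSON shape used above (a collections array with a count field), you can aggregate across collections:

# Total documents across all collections
zippy stats ./data --json | jq '[.collections[].count] | add'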

validate

Check store integrity and optionally repair indexes.

zippy validate <path> [options]

Option                     Description
-c, --collection <name>    Collection to validate (all if omitted)
--fix                      Rebuild indexes if invalid

Examples:

# Check all collections
zippy validate ./data
# ✓ users: valid (150 documents, index OK)
# ✓ products: valid (89 documents, index OK)
# ✓ orders: valid (1,247 documents, index OK)

# Fix corrupted indexes
zippy validate ./data --fix
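
In automation, validate can gate a pipeline; this sketch assumes it exits nonzero when a problem is found:

if ! zippy validate ./data; then
    echo "store invalid, attempting repair" >&2
    zippy validate ./data --fix
fi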

reindex

Rebuild the binary index from JSONL data.

zippy reindex <path> [options]

Option                     Description
-c, --collection <name>    Collection name (default: default)

Example:

zippy reindex ./data -c users
# Reindexed 150 documents in 12ms

pack

Create a portable .zds archive from a store.

zippy pack <source> <dest>

The archive is a standard ZIP file that anyone can extract without ZDS tools.

Examples:

# Create archive
zippy pack ./my_dataset ./my_dataset.zds

# With timestamp
zippy pack ./data "backup_$(date +%Y%m%d_%H%M%S).zds"
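
Because the archive is a standard ZIP, you can inspect it with stock tools before shipping it:

# List archive contents without extracting
unzip -l my_dataset.zds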

unpack

Extract a .zds archive to a directory.

zippy unpack <source> <dest>

Examples:

# Extract with zippy
zippy unpack ./dataset.zds ./extracted

# Or use standard unzip (it's just a ZIP file!)
unzip ./dataset.zds -d ./extracted

Recipes

Import JSONL File

# Fast: use the bulk import (if available)
zippy import ./data -c train < data.jsonl

# Manual: loop through lines
i=0
while IFS= read -r line; do
    echo "$line" | zippy put ./data -c import "doc_$(printf '%06d' $i)"
    ((i++))
done < data.jsonl
echo "Imported $i documents"

Export to JSONL

# Export entire collection
zippy scan ./data -c train --jsonl > train.jsonl

# Export with filtering
zippy scan ./data -c train --jsonl | jq 'select(.score > 0.8)' > high_score.jsonl

Data Exploration with jq

# Count documents by label
zippy scan ./data -c train --jsonl | jq -s 'group_by(.label) | map({label: .[0].label, count: length})'

# Find unique values
zippy scan ./data -c train --jsonl | jq -s '[.[].category] | unique'

# Sample random documents (shuf is GNU coreutils; on macOS use gshuf)
zippy scan ./data -c train --jsonl | shuf | head -5 | jq .

# Calculate statistics
zippy scan ./data -c train --jsonl | jq -s '{
  count: length,
  avg_score: (map(.score) | add / length),
  max_score: (map(.score) | max)
}'

Backup and Restore

# Daily backup
zippy pack ./production "backups/prod_$(date +%Y%m%d).zds"

# Restore from backup
zippy unpack backups/prod_20241201.zds ./restored

# Verify restoration
zippy stats ./restored
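
To confirm nothing was lost, compare the machine-readable stats of the source and the restored copy (assumes the same collections shape as the stats example above; uses bash process substitution):

diff <(zippy stats ./production --json | jq '.collections') \
     <(zippy stats ./restored --json | jq '.collections') && echo "backup verified"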

Cross-Platform Sharing

# Create portable archive
zippy pack ./my_dataset dataset.zds

# Share via any method (email, S3, etc.)
aws s3 cp dataset.zds s3://my-bucket/datasets/

# Recipients can use without installing ZDS
unzip dataset.zds -d extracted/
head -5 extracted/collections/train/meta/data.jsonl | jq .

Pipeline Integration

# Generate data with Python, store with CLI
python generate_data.py | while IFS= read -r line; do
    id=$(echo "$line" | jq -r '._id')
    echo "$line" | zippy put ./output -c generated "$id"
done

# Process with CLI, analyze with Python
zippy scan ./data -c results --jsonl | python analyze.py

Environment Variables

Variable        Description                             Default
ZDS_CACHE_DIR   Cache directory for remote datasets     ~/.cache/zds
RUST_LOG        Log level (debug, info, warn, error)    warn

Example:

# Enable debug logging
RUST_LOG=debug zippy scan ./data -c train

# Custom cache directory
ZDS_CACHE_DIR=/tmp/zds-cache zippy get remote://...

Exit Codes

Code   Meaning
0      Success
1      General error (file not found, invalid data, etc.)
2      Invalid arguments or usage
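
These codes make it straightforward to branch in shell scripts:

zippy get ./data -c users user_042 > /dev/null 2>&1
case $? in
    0) echo "found" ;;
    1) echo "not found or invalid data" ;;
    2) echo "usage error, check the arguments" ;;
esac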

Next Steps