Documentation
Everything you need to build with ZDS—from quick starts to deep dives.
Choose Your Path
Quick Installation
| Language | Command |
|---|---|
| Python | pip install zippy-data |
| Node.js | npm install @zippydata/core |
| Rust | cargo add zippy_data |
| CLI | Download from releases |
Core Concepts
Stores and Collections
A store is a directory (or ZIP archive) containing one or more collections. Each collection holds documents.
from zippy import ZDSStore, ZDataset
# Single collection (classic helper)
store = ZDSStore.open("./my_dataset", collection="train")
store.put("doc_001", {"text": "Hello world", "label": 1})
store.put("doc_002", {"text": "Goodbye", "label": 0, "extra": [1, 2, 3]})
# Multi-collection: omit the collection argument for a root-capable handle
store = ZDSStore.open("./my_dataset", native=True)
train = store.collection("train")
test = store.collection("test")
# Iterate like HuggingFace
dataset = ZDataset(train)
for doc in dataset.shuffle(seed=42):
    print(doc["text"])
print(store.list_collections()) # ['test', 'train']
# Advanced: inspect lock/mode state via the exposed root
native_root = store.root # NativeRoot / ZDSRoot
# ⚠️ Closing the root tears down every reader/writer for this path.
# Do this only during shutdown/cleanup.
native_root.close()
💡 ZDSRoot now lives under store.root. Only reach for it when you need explicit read/write modes, manual locking, or to share the memoized root with another runtime. ⚠️ Closing the root invalidates every handle into that store; call it once you are done writing and reading, never mid-workload.
Documents
Documents are JSON objects with a unique _id:
{"_id": "doc_001", "text": "Hello world", "label": 1}
{"_id": "doc_002", "text": "Goodbye", "nested": {"deep": "value"}}
Schema is per-document—each document can have different fields.
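Reading a document back returns the same JSON structure. A minimal round-trip sketch, reusing the store and documents written in the example above:

doc = store.get("doc_002")
print(doc["text"])    # "Goodbye"
print(doc["extra"])   # [1, 2, 3]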
Storage Modes
| Mode | Files | Best For |
|---|---|---|
| JSONL | meta/data.jsonl | Performance, streaming |
| File-per-doc | docs/*.json | Git diffs, manual editing |
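As a rough sketch, the two layouts look like this on disk (only the paths named in the table and in the Indexes section are documented; the rest of the tree is an assumption):

JSONL mode:
my_dataset/
├── meta/
│   └── data.jsonl      # one document per line
└── index.bin           # optional, see Indexes below

File-per-doc mode:
my_dataset/
├── docs/
│   ├── doc_001.json    # one file per document
│   └── doc_002.json
└── index.bin           # optional, see Indexes below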
Indexes
ZDS uses a binary index (index.bin) for O(1) lookups by document ID. The index is optional—without it, operations fall back to sequential scan.
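You normally never touch the index directly, but the idea behind it is easy to picture. A toy Python sketch (not the real index.bin format, which is binary and maintained by ZDS): record each document's byte offset once, then seek straight to it instead of re-reading every line.

import json

def build_offset_index(jsonl_path):
    """Scan the JSONL file once, recording the byte offset of each _id."""
    index = {}
    with open(jsonl_path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            index[json.loads(line)["_id"]] = offset
    return index

def lookup(jsonl_path, index, doc_id):
    """O(1) lookup: seek to the recorded offset instead of scanning."""
    with open(jsonl_path, "rb") as f:
        f.seek(index[doc_id])
        return json.loads(f.readline())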
API Overview
| Operation | Python | Node.js | Rust | CLI |
|---|---|---|---|---|
| Open/create | ZDSStore.open() | ZdsStore.open() | FastStore::open() | zippy init |
| Put | store.put(id, doc) | store.put(id, doc) | store.put(id, doc) | zippy put |
| Get | store.get(id) | store.get(id) | store.get(id) | zippy get |
| Delete | store.delete(id) | store.delete(id) | store.delete(id) | zippy delete |
| Scan | store.scan() | store.scan() | store.scan_all() | zippy scan |
| Count | len(store) | store.count | store.len() | zippy stats |
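Continuing with the single-collection store from the earlier example, the remaining Python operations look roughly like this; whether scan() yields plain documents or (id, document) pairs is not specified above, so treat the loop body as a sketch.

store.delete("doc_002")       # remove a document by id

for doc in store.scan():      # iterate the collection (yield shape assumed)
    print(doc)

print(len(store))             # number of documents in the collection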
HuggingFace Compatibility
ZDS provides a ZDataset class that mirrors the HuggingFace Dataset API:
from zippy import ZDataset
dataset = ZDataset.from_store("./data", collection="train")
# HuggingFace-style operations
shuffled = dataset.shuffle(seed=42)
filtered = dataset.filter(lambda x: x["label"] == 1)
batches = dataset.batch(32)
# Convert to/from HuggingFace
from zippy import from_hf, to_hf
zds = from_hf(hf_dataset, "./output")
hf = to_hf(zds)
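A short usage sketch tying these together; it assumes filter() returns another ZDataset and that each batch is a plain list of documents, neither of which is spelled out above.

filtered = dataset.filter(lambda x: x["label"] == 1)
for batch in filtered.batch(32):
    texts = [doc["text"] for doc in batch]   # assumed: a batch is a list of docs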
DuckDB Integration
Query ZDS collections with SQL:
from zippy import query_zds
results = query_zds(
    "./data",
    "SELECT label, COUNT(*) FROM train GROUP BY label"
)
print(results)
Need Help?
- GitHub Issues — Bug reports and feature requests
- Examples — Working code samples for all languages
- Paper — Design rationale and benchmarks