Documentation

Everything you need to build with ZDS—from quick starts to deep dives.


Choose Your Path

  • 🚀 Getting Started: Install ZDS and create your first dataset in under 5 minutes.
  • 🐍 Python Guide: Full API reference with HuggingFace, Pandas, and DuckDB integrations.
  • 📦 Node.js Guide: Native bindings for backend services, ETL pipelines, and serverless.
  • ⚙️ CLI Reference: Command-line tools for shell scripts and data pipelines.
  • 🦀 Rust Guide: Embed the core library directly in your Rust applications.
  • 📐 Format Spec: Technical details of the on-disk format and index structure.

Quick Installation

Language   Command
Python     pip install zippy-data
Node.js    npm install @zippydata/core
Rust       cargo add zippy_data
CLI        Download from releases

Core Concepts

Stores and Collections

A store is a directory (or ZIP archive) containing one or more collections. Each collection holds documents.

from zippy import ZDSStore, ZDataset

# Single collection (classic helper)
store = ZDSStore.open("./my_dataset", collection="train")
store.put("doc_001", {"text": "Hello world", "label": 1})
store.put("doc_002", {"text": "Goodbye", "label": 0, "extra": [1, 2, 3]})

# Multi-collection: omit the collection argument for a root-capable handle
store = ZDSStore.open("./my_dataset", native=True)
train = store.collection("train")
test = store.collection("test")

# Iterate like HuggingFace
dataset = ZDataset(train)
for doc in dataset.shuffle(seed=42):
    print(doc["text"])

print(store.list_collections())  # ['test', 'train']

# Advanced: inspect lock/mode state via the exposed root
native_root = store.root  # NativeRoot / ZDSRoot
# ⚠️ Closing the root tears down every reader/writer for this path.
# Do this only during shutdown/cleanup.
native_root.close()

💡 ZDSRoot now lives under store.root. Reach for it only when you need explicit read/write modes, manual locking, or to share the memoized root with another runtime.

⚠️ Closing the root invalidates every handle into that store. Call it once you’re done writing/reading, never mid-workload.

Documents

Documents are JSON objects with a unique _id:

{"_id": "doc_001", "text": "Hello world", "label": 1}
{"_id": "doc_002", "text": "Goodbye", "nested": {"deep": "value"}}

Schema is per-document—each document can have different fields.
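Since schema is per-document, a collection is effectively a sequence of independent JSON objects keyed by _id. A minimal illustration with the standard json module (plain Python, not the ZDS API):

```python
import json

# Two documents in the same collection with different field sets.
lines = [
    '{"_id": "doc_001", "text": "Hello world", "label": 1}',
    '{"_id": "doc_002", "text": "Goodbye", "nested": {"deep": "value"}}',
]

docs = [json.loads(line) for line in lines]

# Symmetric difference of the key sets: fields present in only one document.
print(set(docs[0]) ^ set(docs[1]))
```

No shared schema is declared anywhere; each document simply carries whatever fields it needs.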

Storage Modes

Mode          Files            Best For
JSONL         meta/data.jsonl  Performance, streaming
File-per-doc  docs/*.json      Git diffs, manual editing
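The two modes differ only in how documents land on disk: one append-only JSONL file versus one pretty-printed file per document. A rough sketch of what each layout looks like (illustrative only, not the library's actual writer):

```python
import json
import os
import tempfile

docs = {
    "doc_001": {"text": "Hello world", "label": 1},
    "doc_002": {"text": "Goodbye", "label": 0},
}

root = tempfile.mkdtemp()

# JSONL mode: a single append-friendly file, one JSON object per line.
os.makedirs(os.path.join(root, "meta"), exist_ok=True)
with open(os.path.join(root, "meta", "data.jsonl"), "w") as f:
    for _id, doc in docs.items():
        f.write(json.dumps({"_id": _id, **doc}) + "\n")

# File-per-doc mode: one indented file per document, friendly to git diffs.
os.makedirs(os.path.join(root, "docs"), exist_ok=True)
for _id, doc in docs.items():
    with open(os.path.join(root, "docs", f"{_id}.json"), "w") as f:
        json.dump({"_id": _id, **doc}, f, indent=2)

print(sorted(os.listdir(os.path.join(root, "docs"))))
# → ['doc_001.json', 'doc_002.json']
```

The JSONL layout favors sequential throughput; the per-document layout trades that for human-readable diffs and hand editing.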

Indexes

ZDS uses a binary index (index.bin) for O(1) lookups by document ID. The index is optional—without it, operations fall back to sequential scan.
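Conceptually, the index maps each document ID to the byte offset of its record, so a lookup is a single seek rather than a scan. A toy version of the idea in plain Python (the real index.bin is a compact binary format, not a dict):

```python
import io
import json

data = io.BytesIO()
index = {}  # doc ID -> byte offset of its JSONL line

for doc in [
    {"_id": "doc_001", "text": "Hello world"},
    {"_id": "doc_002", "text": "Goodbye"},
]:
    index[doc["_id"]] = data.tell()
    data.write((json.dumps(doc) + "\n").encode())

def get(doc_id):
    if doc_id in index:      # indexed: O(1) seek straight to the record
        data.seek(index[doc_id])
        return json.loads(data.readline())
    data.seek(0)             # no index: sequential scan fallback
    for line in data:
        doc = json.loads(line)
        if doc["_id"] == doc_id:
            return doc

print(get("doc_002")["text"])  # → Goodbye
```

This is why the index is optional: the fallback path still finds every document, just in O(n) instead of O(1).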

API Overview

Operation    Python              Node.js             Rust                CLI
Open/create  ZDSStore.open()     ZdsStore.open()     FastStore::open()   zippy init
Put          store.put(id, doc)  store.put(id, doc)  store.put(id, doc)  zippy put
Get          store.get(id)       store.get(id)       store.get(id)       zippy get
Delete       store.delete(id)    store.delete(id)    store.delete(id)    zippy delete
Scan         store.scan()        store.scan()        store.scan_all()    zippy scan
Count        len(store)          store.count         store.len()         zippy stats

HuggingFace Compatibility

ZDS provides a ZDataset class that mirrors the HuggingFace Dataset API:

from zippy import ZDataset

dataset = ZDataset.from_store("./data", collection="train")

# HuggingFace-style operations
shuffled = dataset.shuffle(seed=42)
filtered = dataset.filter(lambda x: x["label"] == 1)
batches = dataset.batch(32)

# Convert to/from HuggingFace
from zippy import from_hf, to_hf
zds = from_hf(hf_dataset, "./output")
hf = to_hf(zds)

DuckDB Integration

Query ZDS collections with SQL:

from zippy import query_zds

results = query_zds(
    "./data",
    "SELECT label, COUNT(*) FROM train GROUP BY label"
)
print(results)

Need Help?

Resources:
  • GitHub Issues — Bug reports and feature requests
  • Examples — Working code samples for all languages
  • Paper — Design rationale and benchmarks