Benchmarks
ZDS is designed for high-performance document storage. Here’s how it compares to alternatives.
Summary
| Operation (records/s) | ZDS | SQLite | Pandas CSV | HF Datasets |
|---|---|---|---|---|
| Write | 4.66M | 237k | 205k | 633k |
| Read All (warm) | 510k | 263k | 8.18M* | 40k |
| Random Access (warm) | 308k | 88k | 227k | 30k |
*Pandas warm = in-memory DataFrame (different semantics)
Test conditions: Apple M3 Max, 100,000 records (~200 bytes each), macOS 15
Key Findings
Write Performance
ZDS achieves 20x faster writes than SQLite:
SQLite ████████░░░░░░░░░░░░░░░░░░░ 237k rec/s
LevelDB ██████████░░░░░░░░░░░░░░░░░ 422k rec/s
HF Dataset ███████████████░░░░░░░░░░░░ 633k rec/s
ZDS Native ████████████████████████████████████████ 4.66M rec/s ★
Why? ZDS uses append-only JSONL writes with buffered I/O. No transaction overhead, no WAL, no index updates during writes. The binary index is built lazily.
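The sketch below illustrates this write path in plain Python. It is a simplified model, not the actual ZDS implementation (the core is Rust and uses a binary index file): records are appended as JSON lines through a buffered file handle, and index construction is deferred to a separate pass.

```python
import json

def write_records(path, records, buffer_size=1 << 20):
    """Append-only JSONL writes through a large write buffer; no per-record index work."""
    with open(path, "a", buffering=buffer_size, encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec))
            f.write("\n")
    # Nothing is sorted, hashed, or fsync'd per record during the loop.

def build_index(path):
    """Lazy index pass: map record id -> byte offset of its line."""
    index, offset = {}, 0
    with open(path, "rb") as f:
        for line in f:
            index[json.loads(line)["id"]] = offset
            offset += len(line)
    return index
```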
Random Access Performance
ZDS achieves 3.5x faster random lookups than SQLite (warm):
HF Dataset ███░░░░░░░░░░░░░░░░░░░░░░░░ 30k rec/s
SQLite ██████░░░░░░░░░░░░░░░░░░░░░ 88k rec/s
Pandas ████████████████░░░░░░░░░░░ 227k rec/s
ZDS Native ███████████████████████████ 308k rec/s ★
Why? O(1) FxHashMap lookup + mmap seek. No query parsing, no B-tree traversal. Direct offset-based access.
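A simplified read-side counterpart, again in plain Python: the id resolves to a byte offset through one hash lookup, and the record is decoded directly from the memory-mapped file. The real store keeps this index in a Rust FxHashMap backed by a binary index file; the dict below is only an analogy.

```python
import json
import mmap

class OffsetStore:
    """Random-access sketch: id -> offset dict over a memory-mapped JSONL file."""

    def __init__(self, path, index):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._index = index  # e.g. the dict produced by build_index() above

    def get(self, record_id):
        offset = self._index[record_id]           # O(1) hash lookup, no query parsing
        end = self._mm.find(b"\n", offset)        # record ends at the next newline
        return json.loads(self._mm[offset:end])   # decode straight from the mapping
```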
Read All Performance
For sequential reads, Pandas CSV wins on cold starts thanks to its optimized C parser. Among the on-disk stores, ZDS leads in warm scenarios (see the sketch after the table):
| Approach | Cold (rec/s) | Warm (rec/s) |
|---|---|---|
| Pandas CSV | 957k | 8.18M* |
| ZDS Native | 292k | 510k |
| SQLite | 267k | 263k |
| HF Datasets | 40k | 40k |
*In-memory DataFrame (not comparable)
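Against the same hypothetical JSONL layout used in the sketches above, a read-all pass is just a sequential line-by-line decode; the cold/warm gap in the table comes from whether those pages are already in the OS page cache, not from a different code path.

```python
import json

def read_all(path):
    """Sequential scan: decode every line; warm runs mostly hit the OS page cache."""
    with open(path, "rb") as f:
        return [json.loads(line) for line in f]
```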
Python Benchmarks
Full Results (100k records)
┌─────────────────────────────────────────────────────────────────────────┐
│ Python Benchmark (100k records) │
├──────────────┬─────────┬──────────────┬──────────────┬──────────────────┤
│ Approach │ Write │ Read (cold) │ Read (warm) │ Random (warm) │
├──────────────┼─────────┼──────────────┼──────────────┼──────────────────┤
│ ZDS Native │ 4.66M ★ │ 292k │ 510k │ 308k ★ │
│ SQLite │ 237k │ 267k │ 263k │ 88k │
│ Pandas CSV │ 205k │ 957k ★ │ 8.18M † │ 227k │
│ HF Datasets │ 633k │ 40k │ 40k │ 30k │
└──────────────┴─────────┴──────────────┴──────────────┴──────────────────┘
† in-memory DataFrame
Running Python Benchmarks
cd benchmarks/python
pip install pandas datasets orjson
# Default: 100k records
python benchmark_io.py
# Custom size
python benchmark_io.py -n 500000 -r 5000
# Save results to JSON
python benchmark_io.py -o results.json
Node.js Benchmarks
Full Results (100k records)
┌─────────────────────────────────────────────────────────────────────────┐
│ Node.js Benchmark (100k records) │
├──────────────┬─────────┬──────────────┬──────────────┬──────────────────┤
│ Approach │ Write │ Read (cold) │ Read (warm) │ Random (warm) │
├──────────────┼─────────┼──────────────┼──────────────┼──────────────────┤
│ ZDS Native │ 4.26M ★ │ 385k │ 828k ★ │ 201k │
│ SQLite │ 344k │ 735k ★ │ 650k │ 263k ★ │
│ LevelDB │ 422k │ 291k │ 443k │ 69k │
└──────────────┴─────────┴──────────────┴──────────────┴──────────────────┘
Running Node.js Benchmarks
cd benchmarks/nodejs
npm install
# Default: 100k records
node benchmark_io.js
# Custom size
node benchmark_io.js -n=500000
Rust Core Benchmarks
The Rust core library, which offers the highest performance, includes comparative benchmarks against SQLite and Sled:
cd crates/zippy_data
cargo bench --bench comparison
Write Performance (Apple M3 Max)
| Records | ZDS | SQLite | Sled |
|---|---|---|---|
| 1,000 | 8.5 ms | 20 ms | 60 ms |
| 10,000 | 59 ms | 163 ms | 114 ms |
Read Performance (Warm)
| Records | ZDS | SQLite | Sled |
|---|---|---|---|
| 10,000 | 9.8 ms | 1.9 ms | 1.9 ms |
| 100,000 | 95 ms | 20 ms | 22 ms |
Random Access (1000 lookups on 10k docs)
| Store | Time | Throughput |
|---|---|---|
| ZDS | 2.0 ms | 505 K/s |
| SQLite | 2.2 ms | 453 K/s |
| Sled | 0.27 ms | 3.6 M/s |
Sample Results
ingestion_buffered/buffered_write/10000
time: [2.8 ms 2.9 ms 3.0 ms]
thrpt: [3.3M 3.4M 3.5M elem/s]
random_access/get_by_id/10000
time: [1.2 µs 1.3 µs 1.4 µs]
scan/full_scan/10000
time: [8.2 ms 8.4 ms 8.6 ms]
thrpt: [1.16M 1.19M 1.22M elem/s]
Methodology
Cold vs Warm
- Cold: Fresh process, includes file open and index loading
- Warm: Store already open, measures only the operation (see the timing sketch below)
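One way to realize that distinction in a harness, shown as an illustrative sketch rather than the actual benchmark script: cold timings include opening the store, warm timings reuse a handle that is already open.

```python
import time

def time_cold(open_store, read_all):
    """Cold: opening the store (file open + index load) is part of the measurement."""
    start = time.perf_counter()
    store = open_store()
    read_all(store)
    return time.perf_counter() - start

def time_warm(store, read_all, repeats=3):
    """Warm: the store is already open; time only the operation and keep the best run."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_all(store)
        best = min(best, time.perf_counter() - start)
    return best
```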
Test Data
Each record is ~200 bytes with mixed types:
{
"id": "record_00000001",
"name": "User 1",
"email": "user1@example.com",
"age": 42,
"score": 87.5,
"active": true,
"tags": ["a", "b"],
"metadata": {"created": "2025-01-15", "source": "web"}
}
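Records of this shape can be produced with a few lines of Python; this is only an approximation of the benchmark's generator (the exact code lives in benchmark_io.py):

```python
def make_record(i):
    """~200-byte record mixing strings, numbers, booleans, a list, and a nested object."""
    return {
        "id": f"record_{i:08d}",
        "name": f"User {i}",
        "email": f"user{i}@example.com",
        "age": 20 + i % 60,
        "score": round((i % 1000) / 10.0, 1),
        "active": i % 2 == 0,
        "tags": ["a", "b"],
        "metadata": {"created": "2025-01-15", "source": "web"},
    }

records = [make_record(i) for i in range(100_000)]
```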
What We Measure
- End-to-end latency: Time from API call to data available (see the throughput sketch after this list)
- Real-world patterns: Cold start, warm cache, mixed workloads
- Apples-to-apples: Same data, same operations, same machine
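Every throughput figure in this document reduces to records divided by elapsed wall-clock time around the full call; a minimal version of that calculation, assuming one callable per operation rather than the harness's real structure:

```python
import time

def records_per_second(op, n_records):
    """End-to-end throughput: time from API call until the data is available."""
    start = time.perf_counter()
    op()  # e.g. a full write or read-all pass
    elapsed = time.perf_counter() - start
    return n_records / elapsed
```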
Fairness Notes
- Pandas warm read is an in-memory DataFrame—not a fair storage comparison
- SQLite uses WAL mode and synchronous=NORMAL (see the configuration snippet after this list)
- ZDS benefits from the OS page cache for mmap reads
- HuggingFace Datasets uses Arrow, optimized for sequential iteration
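For reference, the SQLite baseline settings mentioned above correspond to two pragmas; here is a sketch of that configuration in Python's sqlite3 (the benchmark's schema and insert code are omitted):

```python
import sqlite3

conn = sqlite3.connect("bench.db")
conn.execute("PRAGMA journal_mode=WAL")     # write-ahead log instead of a rollback journal
conn.execute("PRAGMA synchronous=NORMAL")   # fewer fsyncs than the FULL default
conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, body TEXT)")
```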
See BENCHMARK.md for complete methodology.
When to Use What
| Scenario | Recommendation |
|---|---|
| Bulk data ingestion | ZDS |
| Random key-value lookups | ZDS |
| Schema-flexible documents | ZDS |
| Complex SQL queries | SQLite |
| Columnar analytics | Parquet/Arrow |
| Pure sequential iteration | HuggingFace Datasets |
| Maximum compression | Parquet |
Reproduce Results
All benchmarks are in the repository:
# Clone
git clone https://github.com/zippydata/zippy
cd zippy
# Python
(cd benchmarks/python && python benchmark_io.py)
# Node.js
(cd benchmarks/nodejs && npm install && node benchmark_io.js)
# Rust
(cd crates/zippy_data && cargo bench)