https://github.com/indexzero/_all_docs
- Host: GitHub
- URL: https://github.com/indexzero/_all_docs
- Owner: indexzero
- License: apache-2.0
- Created: 2025-02-27T03:29:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-22T16:13:17.000Z (about 1 year ago)
- Last Synced: 2025-04-22T17:32:30.009Z (about 1 year ago)
- Language: JavaScript
- Size: 113 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# @_all_docs
> Stability: NaN – `Array(16).join("wat" - 1) + " Batman!"`
Fetch & cache `:origin/_all_docs` using a set of lexicographically sorted keys. A high-performance, partition-tolerant system for fetching and caching npm registry data at scale.
**[Quick Start](#quick-start)**
·
**[Features](#features)**
·
**[Documentation](#-more-documentation)**
·
**[Architecture](#architecture)**
·
**[Contributing](#contributing)**
## Quick Start
```bash
# Install the CLI globally
npm install -g @_all_docs/cli
# Fetch npm registry partitions
npx _all_docs partition refresh --pivots ./pivots.js
# Fetch package documents
npx _all_docs packument fetch express
```
## Features
* 🛋️ Relax! Use the CouchDB `start_key` and `end_key` query parameters to harness the partition tolerance of the B-tree
* 🔑 Accepts a set of lexicographically sorted pivots to use as B-tree partitions
* 🦿 Run map-reduce operations on `_all_docs` and `packument` entries by key range or cache partition
* 🏁 Checkpoint system tracks processing progress across partition sets
* ☁️ Parallel processing across multiple edge runtimes
* 🔜 ~🕸️⚡️🐢🦎🦀 Lightning fast partition-tolerant edge read-replica for `cache-control: immutable` "Pouch-like" `[{ _id, _rev, ...doc }*]` JSON documents out of the box!~
## Usage
### Create Partition Pivots
```javascript
// pivots.js
module.exports = [
  '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
  'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
  'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
  'u', 'v', 'w', 'x', 'y', 'z'
];
```
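To make the role of pivots concrete, here is a minimal sketch of how a sorted pivot list could be turned into adjacent key ranges. `pivotsToRanges` is a hypothetical helper written for illustration, not part of the `@_all_docs` packages, and it assumes plain JavaScript string ordering plus open-ended ranges at both ends; the real partitioning logic may differ.

```javascript
// Hypothetical helper (not the @_all_docs API): turn pivots into adjacent
// key ranges. A null startKey or endKey means "open-ended" on that side.
function pivotsToRanges(pivots) {
  // De-duplicate, then sort with the default string comparison.
  const sorted = [...new Set(pivots)].sort();
  if (sorted.length === 0) {
    return [{ startKey: null, endKey: null }];
  }

  const ranges = [];
  // Everything that sorts before the first pivot.
  ranges.push({ startKey: null, endKey: sorted[0] });
  // One range between each pair of neighbouring pivots.
  for (let i = 0; i < sorted.length - 1; i++) {
    ranges.push({ startKey: sorted[i], endKey: sorted[i + 1] });
  }
  // Everything from the last pivot onward.
  ranges.push({ startKey: sorted[sorted.length - 1], endKey: null });
  return ranges;
}

const ranges = pivotsToRanges(['0', '1', 'a', 'b']);
console.log(ranges);
```

Under these assumptions, N distinct pivots yield N + 1 ranges, so the 36 pivots above would produce 37 partitions.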
### Fetch Registry Data (from CLI)
```bash
# Refresh all partitions
npx _all_docs partition refresh --pivots ./pivots.js
# Fetch specific packages
npx _all_docs packument fetch express react vue
# Fetch packages from a file (with automatic checkpoint/resume)
npx _all_docs packument fetch-list ./packages.json
# Create cache index
npx _all_docs cache create-index > index.txt
```
### Bulk Fetch with Checkpoints
The `packument fetch-list` command fetches packuments from a JSON array or newline-delimited text file. Checkpoints are enabled by default, making large fetches resumable:
```bash
# Fetch from JSON array (checkpoint enabled by default)
npx _all_docs packument fetch-list ./packages.json
# Check progress
npx _all_docs packument fetch-list ./packages.json --status
# List any failed packages
npx _all_docs packument fetch-list ./packages.json --list-failed
# Start fresh (delete existing checkpoint)
npx _all_docs packument fetch-list ./packages.json --fresh
# Disable checkpoint for one-off fetches
npx _all_docs packument fetch-list ./packages.json --no-checkpoint
```
Input file formats:
- **JSON array**: `["lodash", "express", "@babel/core"]`
- **Text file**: One package name per line, `#` comments supported
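As an illustration of the two input formats, the sketch below parses either one into a list of package names. `parsePackageList` is a hypothetical function, not the CLI's actual parser, and it assumes that `#` comments occupy a whole line and that blank lines are ignored.

```javascript
// Hypothetical parser (not the CLI's implementation): accepts the contents
// of a JSON array file or a newline-delimited text file.
function parsePackageList(text) {
  const trimmed = text.trim();

  // A leading "[" is treated as a JSON array of names.
  if (trimmed.startsWith('[')) {
    const names = JSON.parse(trimmed);
    if (!Array.isArray(names)) {
      throw new TypeError('expected a JSON array of package names');
    }
    return names.map(String);
  }

  // Otherwise: one name per line, skipping blank lines and lines that
  // start with "#" (assumed to be whole-line comments).
  return trimmed
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'));
}

console.log(parsePackageList('["lodash", "express", "@babel/core"]'));
console.log(parsePackageList('# web frameworks\nexpress\n\n@babel/core\n'));
```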
Checkpoints track per-package progress and automatically resume on re-run. Failed packages retry up to 3 times. Progress saves every 100 packages and on Ctrl+C.
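The bookkeeping described above can be sketched as a small state object. Everything here is hypothetical and for illustration only: the function names, the checkpoint shape, and the reading of "retry up to 3 times" as three attempts in total are assumptions, not the CLI's actual checkpoint format.

```javascript
// Hypothetical checkpoint bookkeeping (not the CLI's on-disk format).
const MAX_ATTEMPTS = 3; // assumption: three attempts in total per package
const SAVE_EVERY = 100; // persist after every 100 recorded results

function createCheckpoint(names) {
  const entries = {};
  for (const name of names) {
    entries[name] = { status: 'pending', attempts: 0 };
  }
  return { entries, processed: 0 };
}

// Names still worth fetching: not done, and with attempts left.
function remaining(checkpoint) {
  return Object.keys(checkpoint.entries).filter((name) => {
    const entry = checkpoint.entries[name];
    return entry.status !== 'done' && entry.attempts < MAX_ATTEMPTS;
  });
}

// Record one fetch outcome. Returns true when the caller should save.
function record(checkpoint, name, ok) {
  const entry = checkpoint.entries[name];
  entry.attempts += 1;
  entry.status = ok ? 'done' : 'failed';
  checkpoint.processed += 1;
  return checkpoint.processed % SAVE_EVERY === 0;
}

// Names whose most recent attempt failed.
function listFailed(checkpoint) {
  return Object.keys(checkpoint.entries).filter(
    (name) => checkpoint.entries[name].status === 'failed'
  );
}

const checkpoint = createCheckpoint(['lodash', 'express']);
record(checkpoint, 'lodash', true);
record(checkpoint, 'express', false);
console.log(remaining(checkpoint), listFailed(checkpoint));
```

A resumed run under this model would load the saved object and iterate over `remaining(checkpoint)` instead of the full input list.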
### Fetch Registry Data (from code)
```javascript
import { PartitionClient } from '@_all_docs/partition';
import { PackumentClient } from '@_all_docs/packument';

// Fetch partition data
const partitionClient = new PartitionClient({
  env: { RUNTIME: 'node', CACHE_DIR: './cache' }
});
const partition = await partitionClient.request({
  startKey: 'express',
  endKey: 'express-z'
});

// Fetch package document
const packumentClient = new PackumentClient({
  env: { RUNTIME: 'node', CACHE_DIR: './cache' }
});
const packument = await packumentClient.request('express');
```
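For orientation, a key-range request like the one above plausibly maps onto a CouchDB-style `_all_docs` URL. The sketch below shows that mapping under stated assumptions: `allDocsUrl` is a hypothetical helper, `https://registry.example.com` is a placeholder origin, and nothing here is confirmed to be how `PartitionClient` builds its requests. The one firm detail is that CouchDB expects key parameters to be JSON-encoded.

```javascript
// Hypothetical helper (not the PartitionClient internals): build a
// CouchDB-style _all_docs URL for a key range.
function allDocsUrl(origin, { startKey, endKey }) {
  const base = origin.endsWith('/') ? origin : origin + '/';
  const url = new URL('_all_docs', base);

  // CouchDB key parameters are JSON values, so strings keep their quotes.
  if (startKey != null) {
    url.searchParams.set('start_key', JSON.stringify(startKey));
  }
  if (endKey != null) {
    url.searchParams.set('end_key', JSON.stringify(endKey));
  }
  return url.toString();
}

// Placeholder origin, used only for illustration.
console.log(
  allDocsUrl('https://registry.example.com', {
    startKey: 'express',
    endKey: 'express-z'
  })
);
```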
## 📚 More Documentation
- [Getting Started Guide](./doc/getting-started.md) - Quick tutorial and common use cases
- [Architecture Overview](./doc/architecture.md) - System design and technical details
- [CLI Reference](./doc/cli-reference.md) - Complete command documentation
- [API Reference](./doc/api.md) - Programmatic usage and package APIs
## Development Setup
```bash
# Clone and install
git clone https://github.com/indexzero/_all_docs.git
cd _all_docs
pnpm install
# Run tests
pnpm test
# Start development worker
pnpm dev
```
## License
Apache-2.0 © 2024 Charlie Robbins
## Thanks
Many thanks to [bmeck], [guybedford], [mylesborins], [mikeal], [jhs], [jchris], [darcyclarke], [isaacs], & [mcollina] for all the code, docs, & past conversations that contributed to this technique working so well, 10 years later ❤️
[bmeck]: https://github.com/bmeck
[guybedford]: https://github.com/guybedford
[mylesborins]: https://github.com/mylesborins
[mikeal]: https://github.com/mikeal
[jhs]: https://github.com/jhs
[jchris]: https://github.com/jchris
[darcyclarke]: https://github.com/darcyclarke
[isaacs]: https://github.com/isaacs
[mcollina]: https://github.com/mcollina