https://github.com/rogerchappel/crawldeck
Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.
https://github.com/rogerchappel/crawldeck
agent-tools cli crawler local-first queue typescript
Last synced: 28 days ago
JSON representation
Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.
- Host: GitHub
- URL: https://github.com/rogerchappel/crawldeck
- Owner: rogerchappel
- License: mit
- Created: 2026-05-07T08:41:53.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-17T01:16:57.000Z (about 1 month ago)
- Last Synced: 2026-05-17T02:47:29.582Z (about 1 month ago)
- Topics: agent-tools, cli, crawler, local-first, queue, typescript
- Language: TypeScript
- Size: 43 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Roadmap: ROADMAP.md
- Agents: AGENTS.md
Awesome Lists containing this project
README
# crawldeck
A local-first crawl job deck for agents and developers who want to queue, pause, inspect, and report on crawl work without surprise network calls.
crawldeck is inspired by the useful control-plane shape of [CrawlBar](https://github.com/vincentkoc/CrawlBar), but it is a fresh TypeScript CLI implementation with different branding, scope, and code. V1 deliberately starts as a fixture-backed CLI rather than a macOS menu bar app, so it is testable, deterministic, and safe for agent workflows.
## Why this exists
Crawlers tend to become invisible background magic. crawldeck makes the boring control surface explicit:
- profiles describe what can be crawled
- jobs live in local JSON queue files
- health shows queue depth and failures
- reports summarize what happened
- adapter seams leave room for real crawlers later
No telemetry. No credentials. No external crawl network calls by default.
## Install
```bash
npm install
npm run build
npm link
```
Or run directly from a checkout:
```bash
node dist/cli.js --help
```
## Quickstart
```bash
npm install
npm run build
crawldeck init
crawldeck profile add sample --fixture ./fixtures/sample-site
crawldeck job enqueue sample
crawldeck job list
crawldeck inspect sample
crawldeck job pause
crawldeck job resume
crawldeck job start
crawldeck health
crawldeck report
```
The sample fixture includes a 404 on purpose, so the started job demonstrates failure reporting.
## Commands
```text
crawldeck init
crawldeck adapters
crawldeck profile add --fixture [--out ]
crawldeck profile list
crawldeck inspect
crawldeck job enqueue
crawldeck job list
crawldeck job next
crawldeck job status
crawldeck job start
crawldeck job pause
crawldeck job resume
crawldeck job complete
crawldeck health
crawldeck report [--json]
```
## Local state
By default crawldeck writes only under:
- `.crawldeck/queue.json`
- `.crawldeck/out//...`
Use `--deck-dir ` to put the queue somewhere else.
## Adapter seam
The built-in adapter is `fixture`. Future real crawler adapters can register through the library seam:
```js
import { adapterSeam } from 'crawldeck';
adapterSeam('my-crawler', () => ({
name: 'my-crawler',
async inspect(profile) { return []; },
async run(profile, job) { return { totalItems: 0, processedItems: 0, errors: [], reportPath: '' }; }
}));
```
Real adapters should be explicit about network access, robots.txt behavior, rate limits, and credential use.
## Verification
```bash
npm test
npm run check
npm run build
npm run smoke
bash scripts/validate.sh
```
## Safety and privacy
- Local-first queue and reports.
- Fixture-backed by default.
- No hidden telemetry or analytics.
- No credential scraping or secret storage.
- No publishing or external crawling unless a future adapter explicitly implements it.
## License
MIT