https://github.com/openclaw/crawlkit

Host: GitHub
URL: https://github.com/openclaw/crawlkit
Owner: openclaw
License: mit
Created: 2026-05-01T15:47:28.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-05-06T02:17:34.000Z (2 months ago)
Last Synced: 2026-05-06T02:34:26.694Z (2 months ago)
Language: Go
Size: 346 KB
Stars: 18
Watchers: 0
Forks: 3
Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Agents: AGENTS.md

README

# crawlkit

Shared Go infrastructure for local-first crawler archives.

`crawlkit` is not a universal Slack, Discord, Notion, or GitHub crawler. It is
the reusable foundation beneath those tools: SQLite hygiene, TOML config
defaults, portable JSONL/Gzip packing, git-backed snapshot sharing, sync state,
CLI output helpers, control/status metadata, a shared terminal explorer, and
safe desktop-cache snapshot utilities.

## Install

```bash
go get github.com/vincentkoc/crawlkit@latest
```

New versions are published by pushing a semver tag to this repository. Go
resolves module versions from git tags, so there is no separate
package-registry step. See `docs/publishing.md` for the release commands and
`docs/boundary.md` for the ownership boundary between crawlkit and the apps
built on it.

## Packages

- `config`: standard TOML config paths, runtime dirs, and token diagnostics.
- `store`: SQLite open/read-only/transaction/query helpers.
- `snapshot`: `manifest.json` plus JSONL/Gzip table snapshot export and import.
- `mirror`: clone/init/pull/commit/push helpers for private snapshot repos.
- `state`: generic crawler cursor and freshness records.
- `output`: text/json/log output helpers.
- `control`: crawl app metadata, command manifests, status payloads, and
database inventory for launchers and automation.
- `tui`: shared terminal archive explorer with gitcrawl-style responsive panes,
entity/member/detail lanes, compact sortable headers, mouse selection,
floating right-click actions, sorting/filtering, and local/remote source
status.
- `cache`: safe read-only local cache snapshot helpers.

## Downstream apps

- `gitcrawl` and `discrawl` consume `crawlkit` on `main`.
- `slacrawl` and `notcrawl` consume `crawlkit` on their `feat/use-crawlkit`
integration branches until those rewiring changes are merged.
- The apps keep provider schemas, auth, desktop/API parsing, privacy filters,
and user-facing CLI contracts. `crawlkit` owns only the reusable mechanics.

## Safety

Library tests use temporary directories. They do not touch app runtime stores
such as `~/.config/gitcrawl`, `~/.slacrawl`, `~/.discrawl`, or `~/.notcrawl`.