
# Scrape Dojo
_Declarative web scraping & browser automation with JSON workflows_



---
> [!NOTE]
> **🤖 AI-Aided Development (AIAD)**
>
> This project openly uses AI-assisted development (e.g. [Claude Code](https://claude.ai/code)) to accelerate workflows, improve code quality, and maintain development momentum. All AI-generated code is **reviewed and approved by humans** — this is not a vibe-coding project, but a deliberate effort to build a useful product while exploring the boundaries, benefits, and trade-offs of AI-aided development.
---
## 🥷 What is Scrape Dojo?

Scrape Dojo is a self-hosted web scraping & browser automation platform. Instead of writing Puppeteer code for every site, you define workflows declaratively in **JSON/JSONC** — like Infrastructure-as-Code, but for scraping.

**Key capabilities:**

- ⚡ **25+ built-in actions** — navigate, click, type, extract, loop, download, screenshot, and more
- 🧩 **Handlebars + JSONata** — dynamic templates and powerful data transformations
- ⏰ **Cron scheduling** — automate scrapes with cron, webhook, or startup triggers
- 🔐 **Encrypted secrets** — AES-256-CBC at-rest encryption for credentials
- 📡 **Real-time monitoring** — SSE-powered live execution tracking in the Angular UI
- 🛡️ **Optional auth** — JWT, OIDC/SSO, MFA/TOTP, API keys
- 🗄️ **Multi-DB** — SQLite (default), MySQL, PostgreSQL

> [!IMPORTANT]
> Scrape Dojo automates real browser interactions. Please respect website terms of service and applicable legal frameworks.

**Full documentation: [scrape-dojo.com](https://scrape-dojo.com)**

---
## 🐳 Quick Start (Docker)
```bash
# 1. Generate encryption key
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
# 2. Create docker-compose.yml
cat <<'EOF' > docker-compose.yml
services:
  scrape-dojo:
    image: ghcr.io/disane87/scrape-dojo:latest
    ports:
      - '8080:80'
    environment:
      - SCRAPE_DOJO_ENCRYPTION_KEY=your_generated_key_here
      - SCRAPE_DOJO_AUTH_JWT_SECRET=your_random_jwt_secret_here
      - SCRAPE_DOJO_AUTH_REFRESH_TOKEN_SECRET=your_random_refresh_secret_here
      - DB_TYPE=sqlite
      # - SCRAPE_DOJO_PROXY_URL=http://proxy:8080 # Optional: route scrapes through a proxy
    volumes:
      - ./data:/home/pptruser/app/data
      - ./downloads:/home/pptruser/app/downloads
      - ./logs:/home/pptruser/app/logs
      - ./config:/home/pptruser/app/config
      - ./browser-data:/home/pptruser/app/browser-data
    restart: unless-stopped
EOF
# 3. Start
docker compose up -d
```
Open **http://localhost:8080** — UI and API are served on the same port.

> [!WARNING]
> The `SCRAPE_DOJO_ENCRYPTION_KEY` encrypts all secrets. Store it safely — if it is lost, existing secrets are unrecoverable.
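
The placeholder values in the compose file need real secrets. As a convenience sketch (this script is not part of Scrape Dojo — the variable names are simply taken from the compose file above), you can generate all three at once and write them to an `.env` file:

```typescript
// generate-secrets.ts — a convenience sketch, not part of Scrape Dojo.
// Writes the three secrets the compose file above expects into .env.
import { randomBytes } from "node:crypto";
import { writeFileSync } from "node:fs";

// 32 random bytes encode to 64 hex characters, matching the
// `node -e` one-liner from step 1.
const hex = (): string => randomBytes(32).toString("hex");

const lines = [
  `SCRAPE_DOJO_ENCRYPTION_KEY=${hex()}`,
  `SCRAPE_DOJO_AUTH_JWT_SECRET=${hex()}`,
  `SCRAPE_DOJO_AUTH_REFRESH_TOKEN_SECRET=${hex()}`,
];

writeFileSync(".env", lines.join("\n") + "\n");
console.log("Wrote .env with", lines.length, "secrets");
```

With the secrets in `.env` next to `docker-compose.yml`, the `environment:` entries can reference them as `${SCRAPE_DOJO_ENCRYPTION_KEY}` and so on, since Docker Compose substitutes variables from that file.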
For local development, environment variables, auth setup, and more, see the **[Quickstart Guide](https://scrape-dojo.com/de/getting-started/quickstart/)**.

---
## ⚡ Your First Scrape
Create `config/sites/my-first-scrape.jsonc`:
```jsonc
{
  "$schema": "../scrapes.schema.json",
  "scrapes": [
    {
      "id": "my-first-scrape",
      "metadata": {
        "description": "Read a page title",
        "triggers": [{ "type": "manual" }],
      },
      "steps": [
        {
          "name": "Main",
          "actions": [
            {
              "name": "open",
              "action": "navigate",
              "params": { "url": "https://example.com" },
            },
            {
              "name": "title",
              "action": "extract",
              "params": { "selector": "h1" },
            },
            {
              "name": "log",
              "action": "logger",
              "params": { "message": "Title: {{previousData.title}}" },
            },
          ],
        },
      ],
    },
  ],
}
```
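
The `{{previousData.title}}` placeholder in the logger action is a Handlebars template: each action can reference the output of the one before it. As a rough illustration of the idea only (a toy resolver, not Scrape Dojo's actual Handlebars engine, and the exact context shape is an assumption):

```typescript
// Toy resolver for {{dotted.path}} placeholders. Scrape Dojo uses real
// Handlebars; this only illustrates how a template reads prior results.
function renderTemplate(template: string, ctx: Record<string, unknown>): string {
  return template.replace(/\{\{([\w.]+)\}\}/g, (_match, path: string) => {
    // Walk e.g. "previousData.title" key by key through the context.
    let value: unknown = ctx;
    for (const key of path.split(".")) {
      value = (value as Record<string, unknown> | undefined)?.[key];
    }
    return value == null ? "" : String(value);
  });
}

// Assumed context shape: the extract step's result exposed as previousData.
const ctx = { previousData: { title: "Example Domain" } };
console.log(renderTemplate("Title: {{previousData.title}}", ctx));
// → Title: Example Domain
```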
The scrape auto-appears in the UI (hot reload). Click **Run** or use the API:
```bash
curl http://localhost:8080/api/scrape/my-first-scrape
```
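
A `manual` trigger only runs on demand; per the capabilities above, cron, webhook, and startup triggers also exist. A hypothetical sketch of a scheduled variant of the metadata block (the `cron` field name is an assumption — check the Scheduling docs for the real schema):

```jsonc
{
  "metadata": {
    "description": "Read a page title every morning",
    // Field name assumed; see the Scheduling docs for the actual schema.
    "triggers": [{ "type": "cron", "cron": "0 6 * * *" }],
  },
}
```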
---
## 📖 Documentation

Everything else lives in the docs:

| Topic | Link |
| ------------------------------- | ------------------------------------------------------------------------------- |
| 🚀 Quickstart (Docker & Source) | [Getting Started](https://scrape-dojo.com/de/getting-started/quickstart/) |
| 📐 Config format & metadata | [Configuration](https://scrape-dojo.com/de/user-guide/config-format/) |
| ⚡ All built-in actions with examples | [Actions Reference](https://scrape-dojo.com/de/user-guide/actions/) |
| 🧩 Templates & JSONata | [Templates](https://scrape-dojo.com/de/user-guide/templates/) |
| ⏰ Scheduling & triggers | [Scheduling](https://scrape-dojo.com/de/user-guide/scheduling/) |
| 🔐 Secrets & variables | [Secrets & Variables](https://scrape-dojo.com/de/user-guide/secrets-variables/) |
| ⚙️ Environment variables | [Env Reference](https://scrape-dojo.com/de/developer/environment-variables/) |
| 🏗️ Architecture & API | [Developer Guide](https://scrape-dojo.com/de/developer/) |
| 🛡️ Auth (JWT/OIDC/MFA) | [Authentication](https://scrape-dojo.com/de/developer/authentication/) |
| 💡 Full examples | [Examples](https://scrape-dojo.com/de/examples/) |
---
## 🛠️ Development
```bash
git clone https://github.com/disane87/scrape-dojo.git && cd scrape-dojo
pnpm install
cp .env.example .env # Set SCRAPE_DOJO_ENCRYPTION_KEY
pnpm start # API (3000) + UI (4200)
pnpm test # All tests
```
| Command | What it does |
| --------------- | -------------------- |
| `pnpm start` | API + UI dev servers |
| `pnpm test` | All tests |
| `pnpm test:api` | API tests only |
| `pnpm test:ui` | UI tests only |
| `pnpm lint` | Lint all projects |
| `pnpm build` | Build all apps |

Commits follow [Conventional Commits](https://www.conventionalcommits.org/) (`feat:`, `fix:`, `docs:`, etc.).

---
## 🤝 Contributing
- 🐛 **Issues & bugs**: [GitHub Issues](https://github.com/Disane87/scrape-dojo/issues)
- 💡 **Feature requests**: [New Issue](https://github.com/Disane87/scrape-dojo/issues/new)
- 🔀 **Pull requests**: Fork → branch → commit → PR
---
## 📄 License
[MIT](LICENSE) — use it however you like.

---
## 🌟 Contributors
---
Made with ❤️ by [Marco Franke](https://github.com/Disane87)

**[Documentation](https://scrape-dojo.com)** · **[Issues](https://github.com/Disane87/scrape-dojo/issues)** · **[Discussions](https://github.com/Disane87/scrape-dojo/discussions)**