{"id":46624020,"url":"https://github.com/disane87/scrape-dojo","last_synced_at":"2026-03-11T23:01:05.384Z","repository":{"id":342777201,"uuid":"887975549","full_name":"Disane87/scrape-dojo","owner":"Disane87","description":"🥷 Master the art of web scraping with JSON-powered workflows  Define scrapes declaratively · Template everything · Run and monitor in style","archived":false,"fork":false,"pushed_at":"2026-03-07T21:18:40.000Z","size":6969,"stargazers_count":3,"open_issues_count":17,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-08T01:46:04.210Z","etag":null,"topics":["angular","astro","automation","browser-automation","docker","dojo","json","modular","nestjs","puppeteer","scrape","scraping","self-hosted","typescript","web-scraping","webscrape"],"latest_commit_sha":null,"homepage":"http://scrape-dojo.com/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Disane87.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"disane87","ko_fi":"disanedev"}},"created_at":"2024-11-13T15:48:34.000Z","updated_at":"2026-03-07T22:10:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"f04b6fea-94d4-425e-985c-c64d559ec9a6","html_url":"https://github.com/Disane87/scrape-dojo","commit_stats":null,"previous_names":["disane87/scrape-dojo"],"tags_count":45,"template":false,"template_full_name":null,"purl":"pkg:github/Di
sane87/scrape-dojo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Disane87%2Fscrape-dojo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Disane87%2Fscrape-dojo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Disane87%2Fscrape-dojo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Disane87%2Fscrape-dojo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Disane87","download_url":"https://codeload.github.com/Disane87/scrape-dojo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Disane87%2Fscrape-dojo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30406400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-11T22:36:59.286Z","status":"ssl_error","status_checked_at":"2026-03-11T22:36:57.544Z","response_time":84,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["angular","astro","automation","browser-automation","docker","dojo","json","modular","nestjs","puppeteer","scrape","scraping","self-hosted","typescript","web-scraping","webscrape"],"created_at":"2026-03-07T22:09:06.836Z","updated_at":"2026-03-11T23:01:05.377Z","avatar_url":"https://github.com/Disane87.png","language":"TypeScript","readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg 
src=\"./apps/ui/public/logos/scrape-dojo-readme-logo.png\" width=\"180\" alt=\"Scrape Dojo Logo\" /\u003e\n\n# Scrape Dojo\n\n_Declarative web scraping \u0026 browser automation with JSON workflows_\n\n[![Version](https://img.shields.io/github/v/release/Disane87/scrape-dojo?style=for-the-badge\u0026label=Version\u0026color=f97316)](https://github.com/Disane87/scrape-dojo/releases)\n[![GHCR](https://img.shields.io/badge/GHCR-Container-blue?style=for-the-badge\u0026logo=github)](https://github.com/Disane87/scrape-dojo/pkgs/container/scrape-dojo)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](LICENSE)\n[![Docs](https://img.shields.io/badge/Docs-scrape--dojo.com-f97316?style=for-the-badge\u0026logo=astro\u0026logoColor=white)](https://scrape-dojo.com)\n\n[![NestJS](https://img.shields.io/badge/NestJS_11-E0234E?style=flat-square\u0026logo=nestjs\u0026logoColor=white)](https://nestjs.com/)\n[![Angular](https://img.shields.io/badge/Angular_21-DD0031?style=flat-square\u0026logo=angular\u0026logoColor=white)](https://angular.dev/)\n[![Astro](https://img.shields.io/badge/Astro_5-BC52EE?style=flat-square\u0026logo=astro\u0026logoColor=white)](https://astro.build/)\n[![Puppeteer](https://img.shields.io/badge/Puppeteer-40B5A4?style=flat-square\u0026logo=puppeteer\u0026logoColor=white)](https://pptr.dev/)\n[![TypeScript](https://img.shields.io/badge/TypeScript-3178C6?style=flat-square\u0026logo=typescript\u0026logoColor=white)](https://www.typescriptlang.org/)\n[![Nx](https://img.shields.io/badge/Nx_22-143055?style=flat-square\u0026logo=nx\u0026logoColor=white)](https://nx.dev/)\n[![pnpm](https://img.shields.io/badge/pnpm_10-F69220?style=flat-square\u0026logo=pnpm\u0026logoColor=white)](https://pnpm.io/)\n\n![GitHub Stars](https://img.shields.io/github/stars/Disane87/scrape-dojo?style=flat-square\u0026logo=github)\n![GitHub 
Issues](https://img.shields.io/github/issues/Disane87/scrape-dojo?style=flat-square\u0026logo=github)\n![CI](https://img.shields.io/github/actions/workflow/status/Disane87/scrape-dojo/ci.yml?style=flat-square\u0026label=CI\u0026logo=githubactions\u0026logoColor=white)\n\n\u003c/div\u003e\n\n---\n\n\u003e [!NOTE]\n\u003e **🤖 AI-Aided Development (AIAD)**\n\u003e\n\u003e This project openly uses AI-assisted development (e.g. [Claude Code](https://claude.ai/code)) to accelerate workflows, improve code quality, and gain more development momentum. All AI-generated code is **reviewed and approved by humans** — this is not a vibe-coding project, but a deliberate effort to build a useful product while exploring the boundaries, benefits, and trade-offs of AI-aided development.\n\n---\n\n## 🥷 What is Scrape Dojo?\n\nScrape Dojo is a self-hosted web scraping \u0026 browser automation platform. Instead of writing Puppeteer code for every site, you define workflows declaratively in **JSON/JSONC** — like Infrastructure-as-Code, but for scraping.\n\n**Key capabilities:**\n\n- ⚡ **25+ built-in actions** — navigate, click, type, extract, loop, download, screenshot, and more\n- 🧩 **Handlebars + JSONata** — dynamic templates and powerful data transformations\n- ⏰ **Cron scheduling** — automate scrapes with cron, webhooks, or startup triggers\n- 🔐 **Encrypted secrets** — AES-256-CBC at-rest encryption for credentials\n- 📡 **Real-time monitoring** — SSE-powered live execution tracking in Angular UI\n- 🛡️ **Auth (optional)** — JWT, OIDC/SSO, MFA/TOTP, API keys\n- 🗄️ **Multi-DB** — SQLite (default), MySQL, PostgreSQL\n\n\u003e [!IMPORTANT]\n\u003e Scrape Dojo automates real browser interactions. Please respect website terms of service and applicable legal frameworks.\n\n**Full documentation: [scrape-dojo.com](https://scrape-dojo.com)**\n\n---\n\n## 🐳 Quick Start (Docker)\n\n```bash\n# 1. 
Generate encryption key\nnode -e \"console.log(require('crypto').randomBytes(32).toString('hex'))\"\n\n# 2. Create docker-compose.yml\ncat \u003c\u003c'EOF' \u003e docker-compose.yml\nservices:\n  scrape-dojo:\n    image: ghcr.io/disane87/scrape-dojo:latest\n    ports:\n      - '8080:80'\n    environment:\n      - SCRAPE_DOJO_ENCRYPTION_KEY=your_generated_key_here\n      - SCRAPE_DOJO_AUTH_JWT_SECRET=your_random_jwt_secret_here\n      - SCRAPE_DOJO_AUTH_REFRESH_TOKEN_SECRET=your_random_refresh_secret_here\n      - DB_TYPE=sqlite\n      # - SCRAPE_DOJO_PROXY_URL=http://proxy:8080  # Optional: route scrapes through a proxy\n    volumes:\n      - ./data:/home/pptruser/app/data\n      - ./downloads:/home/pptruser/app/downloads\n      - ./logs:/home/pptruser/app/logs\n      - ./config:/home/pptruser/app/config\n      - ./browser-data:/home/pptruser/app/browser-data\n    restart: unless-stopped\nEOF\n\n# 3. Start\ndocker compose up -d\n```\n\nOpen **http://localhost:8080** — UI and API on the same port.\n\n\u003e [!WARNING]\n\u003e The `SCRAPE_DOJO_ENCRYPTION_KEY` encrypts all secrets. 
Store it safely — if lost, existing secrets are unrecoverable.\n\nFor local development, environment variables, auth setup, and more: see the **[Quickstart Guide](https://scrape-dojo.com/de/getting-started/quickstart/)**.\n\n---\n\n## ⚡ Your First Scrape\n\nCreate `config/sites/my-first-scrape.jsonc`:\n\n```jsonc\n{\n  \"$schema\": \"../scrapes.schema.json\",\n  \"scrapes\": [\n    {\n      \"id\": \"my-first-scrape\",\n      \"metadata\": {\n        \"description\": \"Read a page title\",\n        \"triggers\": [{ \"type\": \"manual\" }],\n      },\n      \"steps\": [\n        {\n          \"name\": \"Main\",\n          \"actions\": [\n            {\n              \"name\": \"open\",\n              \"action\": \"navigate\",\n              \"params\": { \"url\": \"https://example.com\" },\n            },\n            {\n              \"name\": \"title\",\n              \"action\": \"extract\",\n              \"params\": { \"selector\": \"h1\" },\n            },\n            {\n              \"name\": \"log\",\n              \"action\": \"logger\",\n              \"params\": { \"message\": \"Title: {{previousData.title}}\" },\n            },\n          ],\n        },\n      ],\n    },\n  ],\n}\n```\n\nThe scrape auto-appears in the UI (hot reload). 
Click **Run** or use the API:\n\n```bash\ncurl http://localhost:8080/api/scrape/my-first-scrape\n```\n\n---\n\n## 📖 Documentation\n\nEverything else lives in the docs:\n\n| Topic                           | Link                                                                            |\n| ------------------------------- | ------------------------------------------------------------------------------- |\n| 🚀 Quickstart (Docker \u0026 Source) | [Getting Started](https://scrape-dojo.com/de/getting-started/quickstart/)       |\n| 📐 Config format \u0026 metadata     | [Configuration](https://scrape-dojo.com/de/user-guide/config-format/)           |\n| ⚡ All built-in actions with examples | [Actions Reference](https://scrape-dojo.com/de/user-guide/actions/)             |\n| 🧩 Templates \u0026 JSONata          | [Templates](https://scrape-dojo.com/de/user-guide/templates/)                   |\n| ⏰ Scheduling \u0026 triggers        | [Scheduling](https://scrape-dojo.com/de/user-guide/scheduling/)                 |\n| 🔐 Secrets \u0026 variables          | [Secrets \u0026 Variables](https://scrape-dojo.com/de/user-guide/secrets-variables/) |\n| ⚙️ Environment variables        | [Env Reference](https://scrape-dojo.com/de/developer/environment-variables/)    |\n| 🏗️ Architecture \u0026 API           | [Developer Guide](https://scrape-dojo.com/de/developer/)                        |\n| 🛡️ Auth (JWT/OIDC/MFA)          | [Authentication](https://scrape-dojo.com/de/developer/authentication/)          |\n| 💡 Full examples                | [Examples](https://scrape-dojo.com/de/examples/)                                |\n\n---\n\n## 🛠️ Development\n\n```bash\ngit clone https://github.com/disane87/scrape-dojo.git \u0026\u0026 cd scrape-dojo\npnpm install\ncp .env.example .env  # Set SCRAPE_DOJO_ENCRYPTION_KEY\npnpm start            # API (3000) + UI (4200)\npnpm test             # All tests\n```\n\n| Command         | What it does         |\n| --------------- | -------------------- |\n| 
`pnpm start`    | API + UI dev servers |\n| `pnpm test`     | All tests            |\n| `pnpm test:api` | API tests only       |\n| `pnpm test:ui`  | UI tests only        |\n| `pnpm lint`     | Lint all projects    |\n| `pnpm build`    | Build all apps       |\n\nCommits follow [Conventional Commits](https://www.conventionalcommits.org/) (`feat:`, `fix:`, `docs:`, etc.).\n\n---\n\n## 🤝 Contributing\n\n- 🐛 **Issues \u0026 bugs**: [GitHub Issues](https://github.com/Disane87/scrape-dojo/issues)\n- 💡 **Feature requests**: [New Issue](https://github.com/Disane87/scrape-dojo/issues/new)\n- 🔀 **Pull requests**: Fork → branch → commit → PR\n\n---\n\n## 📄 License\n\n[MIT](LICENSE) — use it however you like.\n\n---\n\n## 🌟 Contributors\n\n\u003c!-- readme: contributors -start --\u003e\n\u003ca href=\"https://github.com/Disane87/scrape-dojo/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=Disane87/scrape-dojo\" alt=\"Contributors\" /\u003e\n\u003c/a\u003e\n\u003c!-- readme: contributors -end --\u003e\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\nMade with ❤️ by [Marco Franke](https://github.com/Disane87)\n\n**[Documentation](https://scrape-dojo.com)** · **[Issues](https://github.com/Disane87/scrape-dojo/issues)** · **[Discussions](https://github.com/Disane87/scrape-dojo/discussions)**\n\n\u003c/div\u003e\n","funding_links":["https://github.com/sponsors/disane87","https://ko-fi.com/disanedev"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdisane87%2Fscrape-dojo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdisane87%2Fscrape-dojo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdisane87%2Fscrape-dojo/lists"}