https://github.com/crawlee-cloud/crawlee-cloud
Self-hosted, open-source platform for running Apify Actors. Drop-in compatible with the Apify SDK.
https://github.com/crawlee-cloud/crawlee-cloud
apify apify-actor apify-platform apify-sdk automation crawlee crawlee-framework crawlee-project docker open-source scraper scraping self-hosted web-scraping
Last synced: about 1 month ago
JSON representation
Self-hosted, open-source platform for running Apify Actors. Drop-in compatible with the Apify SDK.
- Host: GitHub
- URL: https://github.com/crawlee-cloud/crawlee-cloud
- Owner: crawlee-cloud
- License: mit
- Created: 2025-12-25T19:01:16.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-04-28T11:58:30.000Z (about 2 months ago)
- Last Synced: 2026-04-28T13:36:35.144Z (about 2 months ago)
- Topics: apify, apify-actor, apify-platform, apify-sdk, automation, crawlee, crawlee-framework, crawlee-project, docker, open-source, scraper, scraping, self-hosted, web-scraping
- Language: TypeScript
- Homepage: https://crawlee.cloud
- Size: 1.15 MB
- Stars: 18
- Watchers: 2
- Forks: 3
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Roadmap: ROADMAP.md
Awesome Lists containing this project
README
**Self-hosted, open-source platform for running Apify Actors on your own infrastructure.**
[](LICENSE)
[](https://www.typescriptlang.org/)
[](https://nodejs.org/)
[Dashboard](#dashboard) · [Quick Start](#quick-start) · [Documentation](#documentation) · [Contributing](#contributing)
---
## Dashboard
A purpose-built operator console — monitor runs in real time, debug webhook deliveries, browse datasets, and triage failures without leaving the page.
Operator dashboard — actor count, runs, success rate, hourly throughput, recent activity feed
Webhooks — fire test events per subscription, inline delivery log with HTTP code + body, last-seen status
Run detail — live logs, container runtime, dataset / KV / queue IDs, status timeline
Settings — live version, scaler state, storage health probes (PG / Redis / S3) with latency, API keys
KV stores — click any record to pretty-print its JSON inline, no new tab needed
Runs — full execution history with status filters, durations, dataset links
Actors — your deployed scrapers with version + last-modified at a glance
---
## Why Crawlee Cloud?
Love the Crawlee/Apify ecosystem but want the freedom to run things your way? Crawlee Cloud brings the same great developer experience to your own infrastructure. Keep using the tools you love — just host them wherever you want.
### Key Benefits
- **� Your infrastructure** — Deploy on your own servers, cloud, or anywhere you like
- **🔒 Complete privacy** — Your data stays exactly where you want it
- **⚡ SDK compatible** — Works seamlessly with the Apify SDK you already know
- **🐳 Container-based** — Each Actor runs in an isolated Docker container
- **📊 Beautiful dashboard** — Monitor runs, explore datasets, manage everything visually
---
## How It Works
```bash
# Instead of pointing to Apify's servers...
export APIFY_API_BASE_URL=https://api.apify.com/v2
# Point to your own Crawlee Cloud instance
export APIFY_API_BASE_URL=https://your-server.com/v2
export APIFY_TOKEN=your-token
```
Your existing Actor code works without any modifications:
```typescript
import { Actor } from 'apify';
await Actor.init();
await Actor.pushData({ title: 'Scraped data' });
await Actor.exit();
```
---
## Quick Start
### Prerequisites
- Node.js 18+
- Docker & Docker Compose
- PostgreSQL, Redis, and S3-compatible storage (or use our Docker setup)
### 1. Clone & Install
```bash
git clone https://github.com/crawlee-cloud/crawlee-cloud.git
cd crawlee-cloud
npm install
```
### 2. Start Infrastructure
```bash
# Starts PostgreSQL, Redis, and MinIO
npm run docker:dev
```
### 3. Configure Environment
```bash
cp .env.example .env
# Edit .env with your settings
```
### 4. Build & Run
```bash
npm run build
npm run db:migrate
npm run dev
```
The API server starts at `http://localhost:3000`.
---
## Deploy
Deploy your own instance in minutes:
| Method | Status | Description |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------- | ---------------------------------------------------------------------- |
| [](https://cloud.digitalocean.com/apps/new?repo=https://github.com/crawlee-cloud/crawlee-cloud/tree/main&refcode=crawlee) | ✅ Supported | Automated full stack — App Platform, managed PG/Redis, Runner Droplet |
| [VPS Deploy Script](deploy/vps/) | ✅ Supported | Full stack on any Ubuntu VPS with auto-HTTPS via Caddy |
| Railway | 🚧 Coming soon | One-click PaaS deploy (template scaffolding present, not yet verified) |
| Render | 🚧 Coming soon | One-click PaaS deploy (blueprint present, not yet verified) |
See [deploy/](deploy/) for detailed instructions.
---
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Your Actors │
│ (using official Apify SDK, no changes) │
└────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Crawlee Cloud API │
│ (Apify-compatible REST endpoints) │
└─────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌─────────┐ ┌─────────┐
│PostgreSQL│ │ Redis │ │ S3/MinIO│
│ metadata │ │ queues │ │ blobs │
└──────────┘ └─────────┘ └─────────┘
```
### Components
| Component | Description |
| -------------- | ----------------------------------------------------------- |
| **API Server** | Fastify-based REST API compatible with Apify's v2 endpoints |
| **Runner** | Polls job queue and executes Actors in Docker containers |
| **Dashboard** | Next.js web UI for monitoring and management |
| **CLI** | Command-line tool for pushing and running Actors |
---
## Documentation
| Guide | Description |
| --------------------------------------------------------------------- | ---------------------------- |
| [API Reference](https://crawlee.cloud/docs/api) | REST API endpoints and usage |
| [CLI Guide](https://crawlee.cloud/docs/cli) | Command-line interface |
| [Dashboard](https://crawlee.cloud/docs/dashboard) | Web interface overview |
| [Deployment](https://crawlee.cloud/docs/deployment) | Production deployment guide |
| [Runner](https://crawlee.cloud/docs/runner) | Actor execution engine |
| [SDK Compatibility](https://crawlee.cloud/docs/apify-sdk-environment) | Apify SDK integration |
---
## Supported Apify SDK Features
| Feature | Status |
| -------------------------------------------- | ------------ |
| Datasets (`Actor.pushData`) | ✅ Supported |
| Key-Value Stores (`Actor.getValue/setValue`) | ✅ Supported |
| Request Queues | ✅ Supported |
| Request deduplication | ✅ Supported |
| Distributed locking | ✅ Supported |
| Builds & versioning | ✅ Supported |
| Webhooks | ✅ Supported |
| Schedules | ✅ Supported |
| Auto-scaling runners (local Docker, GHCR) | ✅ Supported |
---
## What's New
**v0.8.0** — auto-scaling with image registry support, build versioning, scale-aware run pagination, CLI profiles + `crc info`, dashboard rewrite (Builds, Schedules, Webhooks, Request Queues, Runners, KV browser). See the [full changelog](CHANGELOG.md).
> ⚠️ **Upgrading from v0.7?** `GET /metrics` and `GET /v2/scaler/status` are now admin-only. If you scrape `/metrics` with Prometheus, send an admin token in the `Authorization` header, or set `METRICS_PUBLIC=true`. See [CHANGELOG → 0.8.0 Breaking](CHANGELOG.md#080---2026-05-01).
---
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
```bash
# Run tests
npm test
# Type checking
npm run typecheck
# Linting
npm run lint
```
---
## License
This project is licensed under the [MIT License](LICENSE).
---
**Built with ❤️ for the web scraping community**