https://github.com/mnemosyne-artificial-intelligence/doppelganger
Doppelganger is a self-hosted, free browser automation and scraping platform built on Playwright, designed to handle everything from simple scraping to complex, human-like browser interactions. It provides a visual task editor, a structured JSON task format, and advanced execution modes that go far beyond traditional scrapers.
https://github.com/mnemosyne-artificial-intelligence/doppelganger
agentic-tasks api automation browser-automation headless headless-browser playwright web-scraping
Last synced: 2 days ago
JSON representation
Doppelganger is a self-hosted, free browser automation and scraping platform built on Playwright, designed to handle everything from simple scraping to complex, human-like browser interactions. It provides a visual task editor, a structured JSON task format, and advanced execution modes that go far beyond traditional scrapers.
- Host: GitHub
- URL: https://github.com/mnemosyne-artificial-intelligence/doppelganger
- Owner: mnemosyne-artificial-intelligence
- License: other
- Created: 2025-12-25T01:42:51.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-01-26T18:59:48.000Z (4 days ago)
- Last Synced: 2026-01-26T19:43:40.682Z (4 days ago)
- Topics: agentic-tasks, api, automation, browser-automation, headless, headless-browser, playwright, web-scraping
- Language: TypeScript
- Homepage: https://doppelgangerdev.com
- Size: 7.09 MB
- Stars: 166
- Watchers: 1
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README

# Doppelganger — Browser Automation for Everyone
[](https://doppelgangerdev.com) [](https://doppelgangerdev.com/docs) [](https://forum.doppelgangerdev.com) [](https://github.com/mnemosyne-artificial-intelligence/doppelganger/blob/main/LICENSE) [](https://www.npmjs.com/package/@doppelgangerdev/doppelganger) [](https://hub.docker.com/r/mnemosyneai/doppelganger)
Doppelganger is a self‑hosted, node‑first automation control plane built for teams that want predictable, auditable browser workflows without pushing sensitive data to third‑party SaaS. It bundles a React/Vite frontend, an Express/Playwright backend, helper scripts, and optional CLI tooling so you can sketch blocks, inject JavaScript, rotate proxies, and run everything locally.

# What You Get
- **Block‑based automation** — build flows with actions like click, type, wait, hover, and execute JavaScript against modern pages.
- **Task API + CLI** — trigger saved tasks via HTTP (`/tasks/:id/api`) or `npx doppelganger` while passing variables and securing runs with the API key you control.
- **Captures & storage** — automatically store screenshots/recordings and cookies; view them in the captures tab, reset storage, or download built assets.
- **Proxy management** — host, rotate, or import HTTP/SOCKS proxies, flag a default, and toggle rotation per task.
- **Security-first** — session authentication, IP allowlists, secret management, and audit trails live entirely inside your environment.
# Architecture Snapshot
1. **Frontend**
- Vite with React (TypeScript) drives `/dashboard`, `/tasks`, `/settings`, `/executions`, and `/captures`.
- The Settings screen is tabbed (`System`, `Data`, `Proxies`) and houses panels for API keys, user agents, layout, storage, and version info.
- Components call `/api/*` endpoints through the Vite dev proxy (see `vite.config.mts`), sharing `APP_VERSION` via `src/utils/appInfo.ts`.
2. **Backend**
- `server.js` (Express) handles auth (`/api/auth`), task metadata, hooks into Playwright, and exposes `/api/settings/*` for runtime configuration.
- Requirements: Node 18+ (LTS), Playwright bundled via `npm install`.
- Storage is plain‑file: `data/` for proxies and allowlists, `public/captures` for visuals, `storage_state.json` for cookies.
3. **Scripts & automation**
- `scripts/postinstall.js` runs when dependencies install (keep an eye if you customize).
- `agent.js`, `headful.js`, `scrape.js` expose specialized runners; the CLI binary `bin/cli.js` wires them for `npx doppelganger`.
4. **Code layout highlights**
- `src/App.tsx` glues together routing, alerts, and the sidebar that links dashboards, tasks, and settings.
- `src/components` houses reusable panels (API keys, storage, captures, proxies) that map directly to backend endpoints.
- `server.js` embeds all HTTP handlers in one file; use the `data/` helpers for proxies, API keys, and user agent preferences if you customize behavior.
# Getting Started
## Docker (Recommended)
```bash
docker pull mnemosyneai/doppelganger
docker run -d \
--name doppelganger \
-p 11345:11345 \
-e SESSION_SECRET=replace_with_long_random_value \
-v $(pwd)/data:/app/data \
-v $(pwd)/public:/app/public \
-v $(pwd)/storage_state.json:/app/storage_state.json \
mnemosyneai/doppelganger
```
Visit `http://localhost:11345`. Stop/start with `docker stop/start doppelganger`.
## Local Development (npm)
1. Install dependencies:
```bash
npm install
```
2. Launch backend + frontend:
```bash
npm run server
npm run dev
```
Frontend calls `/api` via the Vite proxy defined in `vite.config.mts`; the backend listens on `process.env.VITE_BACKEND_PORT` (default `11345`).
## Install Release via npm
If you just want to run the packaged release (no source checkout), install the published npm package and run `doppelganger` directly.
```bash
npm install -g @doppelgangerdev/doppelganger
doppelganger
```
Or use `npx`:
```bash
npx @doppelgangerdev/doppelganger
```
If you prefer not to install globally, clone the repo, run `npm install` to pull dependencies, and then run `npx @doppelgangerdev/doppelganger` inside that folder. This ensures `npx` can resolve the package from the local registry/cache while still shipping the same dashboard experience.
Set `SESSION_SECRET` and optionally mount `data/`, `public/`, and `storage_state.json` (match the Docker volume layout). The CLI spins up the same Express/Playwright stack and opens the browser-based dashboard at `http://localhost:11345` unless you override `PORT`.
## Session Secret
Set `SESSION_SECRET` before any run. A quick generator:
```bash
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
```
# Configuration
| Variable | Purpose | Default |
|----------|---------|---------|
| `SESSION_SECRET` | Signs session cookies. Required. | — |
| `ALLOWED_IPS` | Comma list for basic IP allowlisting. | none (open) |
| `TRUST_PROXY` | Honor `X-Forwarded-*` when behind a reverse proxy. | `0` |
| `VITE_DEV_PORT` | Port for front-end dev server. | `5173` |
| `VITE_BACKEND_PORT` | Backend port for proxying + scripts. | `11345` |
Proxy rotation also respects `data/proxies.json` (see below), and `data/allowed_ips.json` works as an alternate allowlist format.
## Advanced Configuration
- `PLAYWRIGHT_BROWSERS_PATH` (or set `PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH`) when using a shared Playwright installation.
- `NODE_ENV=production` enables the bundled `dist/` client and reduces console verbosity.
- `HOST=0.0.0.0` allows binding beyond localhost inside Docker containers, while `PORT` overrides the Express listen port (defaults to `11345`).
- Set `LOG_LEVEL` to `debug` if you need more Playwright or proxy diagnostics; this can also be a custom wrapper when running `node server.js`.
# UI Walkthrough
- **Dashboard** — quick stats, recent runs, and a “New Task” entry point (block or agent).
- **Task Editor** — drag blocks (click, type, wait, scroll, press, JavaScript); toggle “Rotate Proxies”; run/stop tasks; inspect results with pins & logs.
- **Captures** — review screenshots/recordings stored under `public/captures`; delete individually or refresh.
- **Executions** — historical runs with detail drill-down and the ability to re-run or download results.
- **Settings**
- **System tab**: regenerate or copy API key, select user agent, adjust layout ratio, view/copy version (`VersionPanel`), and clear storage.
- **Data tab**: manage captures and cookies.
- **Proxies tab**: add/import proxies, set defaults, toggle rotation, and inspect host vs saved entries.
# CLI & Agent Mode
- Use `npx doppelganger` (or `npm run cli`) to launch the interactive CLI that shows tasks, status, and logs.
- Behind the scenes, `bin/cli.js` can invoke `agent.js`, `headful.js`, or `scrape.js` depending on the runtime mode (`--agent`, `--headful`, `--scrape`).
- Run `node agent.js --help` to see flags like `--task`, `--browser`, or `--version`. These runners share the same settings (API key, proxies, storage) as the web UI.
- When connecting via the API key, prefer `Authorization: Bearer ` so reverse proxies can normalize headers; the CLI also accepts a `--api-key` flag for scripted runs.
# Proxies
Proxies can be defined via the UI or `data/proxies.json`:
```json
[
"http://user:pass@proxy1.example.com:8000",
{ "server": "socks5://proxy2.example.com:1080", "label": "data center" }
]
```
- `host` is always available and represents your machine’s default IP.
- Rotation settings (`round-robin` or `random`) live in the Settings screen and persist through the backend endpoints.
- Import/export operations live behind `/api/settings/proxies/import`.
# API Surface
- **Task execution** (`POST /tasks/:id/api`)
- Headers: `x-api-key` or `Authorization: Bearer `.
- Body: `{ "variables": { ... } }` to override task variables or provide runtime data.
- **Settings**:
- `/api/settings/api-key` — GET current, POST regen.
- `/api/settings/user-agent` — toggle system vs custom list.
- `/api/settings/proxies*` — GET/POST/PUT/DELETE plus rotation toggles.
- **Clear data**:
- `POST /api/clear-screenshots` — removes files in `public/captures`.
- `POST /api/clear-cookies` — deletes `storage_state.json`.
Authentication enforces sessions (`/api/auth/login`, `/api/auth/logout`, `/api/auth/me`); read `server.js` to see the guard/middleware logic.
# Task Scripting Tips
- Use JavaScript blocks to scrape structured data:
```js
return document.querySelectorAll('article').length;
```
- Keep CSS selectors narrow; the block-based editor surfaces `#`, `.`, and attribute hints.
- When running headlessly, toggle `headful.js` or `agent.js` depending on whether you need a visible browser for debugging.
- Set `task.variables` via the API to re-use generic workflows across multiple domains.
## Workflow Recipe
1. Design a task in the editor starting with a `goto` block and a `wait` block to give pages time to render.
2. Add conditional `javascript` blocks to test for specific DOM elements; use the retry/timer controls per block.
3. Attach `extract` (JSON output) or `screenshot` actions before submitting so you can inspect results in the Captures tab.
4. Toggle “Rotate Proxies” if you need egress diversity and pick a default proxy on Settings → Proxies.
5. Save the task, pin results you care about, and use the `POST /tasks/:id/api` endpoint with variables like `{"variables":{"query":"books"}}` to run it from automation tools.
# Testing & Validation
- Run `npm run build` before packaging for production; the `dist/` folder contains the compiled assets.
- Backend logging writes to the console; capture output from `server.js` for debugging proxies, authentication, or Playwright failures.
- Playwright logs are visible in the running Node process and under `node_modules/.cache` when using the CLI.
# Troubleshooting
- **“Session expired”** in the UI: confirm `SESSION_SECRET` is consistent and cookies aren’t blocked by your browser.
- **Tasks hang**: ensure the target site isn’t blocking automated browsers; try toggling `headful`/`headless` modes or adding delays.
- **Proxy import fails**: inspect `data/proxies.json` for valid URLs; the backend validates `server` as a string.
- **API key lost**: regenerate from Settings → System tab; the UI copies it automatically.
# Data Lifecycle
- Captures land in `public/captures`; regular cleanups can be scripted via `POST /api/clear-screenshots`.
- Cookies live in `storage_state.json`. Back up this file before clearing cookies via the UI or `/api/clear-cookies`.
- Proxy lists, user-agent preferences, and settings persist under `data/` (look for `proxies.json`, `allowed_ips.json`, etc.) — treat this directory as your config source control.
- Use `Storage` controls in Settings to clear data after experimentation cycles, and keep `layouts` or `version` info tracked via `localStorage` as shown in `src/components/SettingsScreen.tsx`.
# Maintenance
- The project is governed by the **[Sustainable Use License (SUL 1.0)](https://github.com/mnemosyne-artificial-intelligence/doppelganger/blob/main/LICENSE)**; hosting it as a competing service is prohibited.
- Keep `data/` and `storage_state.json` backed up if you rely on historical cookies or proxies.
- Release updates by pulling `mnemosyneai/doppelganger` (Docker) or `npm i @doppelgangerdev/doppelganger` (npm). The Settings view always displays the current package version.
- Contributions: follow `.github/` templates, respect `CONTRIBUTING.md`, and run available lint/test scripts if you touch critical areas.
# Security Considerations
- Never commit your `SESSION_SECRET`, API keys, or `storage_state.json` into shared repositories.
- Use `ALLOWED_IPS`/`data/allowed_ips.json` to gate the UI when deploying to a network-exposed host.
- Rotate API keys periodically via Settings, and log all automation runs through the Executions tab for audit purposes.
- Playwright runs inside the same Node process; keep dependencies up to date and rebuild `node_modules` after significant OS patches.
# Community & Support
- Report issues or request features via the GitHub repo issue tracker.
- Follow the authors on `https://github.com/mnemosyne-artificial-intelligence` for releases.
- Share automation recipes with other self-hosted users in your org, but respect SUL limits on sharing infrastructure.