https://github.com/rodbland2021/agent-screenshot

Playwright screenshot tool with Vision-optimised tiling + desktop screen capture for AI agents
https://github.com/rodbland2021/agent-screenshot

ai-agents claude-code claude-vision mcp playwright screen-capture screenshot vision

Last synced: 14 days ago
JSON representation

Playwright screenshot tool with Vision-optimised tiling + desktop screen capture for AI agents

Host: GitHub
URL: https://github.com/rodbland2021/agent-screenshot
Owner: rodbland2021
License: mit
Created: 2026-03-18T02:34:26.000Z (3 months ago)
Default Branch: master
Last Pushed: 2026-03-18T05:34:33.000Z (3 months ago)
Last Synced: 2026-03-18T18:56:54.082Z (3 months ago)
Topics: ai-agents, claude-code, claude-vision, mcp, playwright, screen-capture, screenshot, vision
Language: Python
Size: 28.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

Awesome Lists containing this project

README

          # agent-screenshot

[![CI](https://github.com/rodbland2021/agent-screenshot/actions/workflows/ci.yml/badge.svg)](https://github.com/rodbland2021/agent-screenshot/actions/workflows/ci.yml)

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

[![Version](https://img.shields.io/badge/version-1.0.0-blue)](CHANGELOG.md)

**Two tools for giving AI agents eyes.** Automated web screenshots with Vision-optimised tiling, plus desktop screen capture.

**[Blog post: Why these tools exist and how I use them](https://rodbland.com/blog/screenshot-and-grab-tools/)**

```bash

# Screenshot a web page (tiled for AI Vision models)

python screenshot.py https://example.com --full-page

# Capture your desktop screen

python grab.py

```

## What's in the box

| Tool | What it does |

|------|-------------|

| `screenshot.py` | Playwright-based web screenshots. Full-page captures are automatically tiled into 1072x1072 chunks optimised for Claude Vision and other multimodal models. |

| `grab.py` | Desktop screen capture using Python's `mss` library. 14 region presets (halves, thirds, quadrants) for targeting specific parts of your display. |

## Install

```bash

git clone https://github.com/rodbland2021/agent-screenshot.git

cd agent-screenshot

pip install -r requirements.txt

playwright install chromium

```

Verify it works:

```bash

python screenshot.py https://example.com

# Should print a file path like /tmp/screenshots/example-com_1234567890.jpg

```

## Screenshot tool

Takes automated screenshots of any URL using headless Chromium.

```bash

# Basic screenshot

python screenshot.py https://example.com

# Mobile viewport (375x812)

python screenshot.py https://example.com --mobile

# Full page, tiled for AI Vision models

python screenshot.py https://example.com --full-page

# Dismiss cookie banners and popups

python screenshot.py https://example.com --dismiss-popups --wait-until load --wait 3000

# Custom auth header

python screenshot.py https://my-app.example.com --header "Authorization=Bearer mytoken"

# Screenshot a specific element

python screenshot.py https://example.com --selector ".hero-section"

```

### Why 1072x1072 tiles?

Claude Vision and similar multimodal models have a maximum effective resolution. A 1072x15000 pixel full-page screenshot is too large to process in detail — text blurs, UI elements become unreadable. Tiling the page into 1072x1072 chunks lets the model read every label, button, and table cell.

The tool handles this automatically with `--full-page`. A long page produces 4-8 tiles, each saved as a separate file.

### Options

| Flag | Default | Description |

|------|---------|-------------|

| `--full-page` | off | Capture entire page, tile into 1072x1072 chunks |

| `--mobile` | off | Mobile viewport (375x812) |

| `--dismiss-popups` | off | Auto-close cookie banners, geo redirects, email popups |

| `--selector` | — | CSS selector to screenshot a specific element |

| `--wait N` | 0 | Extra wait in ms after page load (for SPAs) |

| `--wait-until` | networkidle | `load`, `domcontentloaded`, or `networkidle` |

| `--width` | 1072 | Viewport width |

| `--height` | 1072 | Viewport height |

| `--quality` | 85 | JPEG quality (1-100) |

| `--png` | off | Output PNG instead of JPEG |

| `--out` | /tmp/screenshots | Output directory |

| `--max-height` | 15000 | Truncate pages taller than this |

| `--header` | — | Custom HTTP header as `KEY=VALUE` (repeatable) |

## Grab tool

Captures the physical desktop screen. Useful for sharing visual context that isn't a web page — design mockups, spreadsheets, error dialogs, anything on your monitor.

> **Requires a display.** `grab.py` needs X11 or Wayland (Linux), a desktop session (macOS/Windows), or a virtual framebuffer (`Xvfb`). It will not work on headless servers, Docker containers, or CI runners without a display server.

```bash

# Full screen

python grab.py

# Left half of screen

python grab.py left

# Top-right quadrant

python grab.py top-right

# Custom output path

python grab.py --out /tmp/my-capture.jpg

```

### Regions

`full`, `left`, `right`, `top`, `bottom`, `top-left`, `top-right`, `bottom-left`, `bottom-right`, `left-third`, `center-third`, `right-third`, `left-two-thirds`, `right-two-thirds`

### Options

| Flag | Default | Description |

|------|---------|-------------|

| `--out` | /tmp/screen.jpg | Output file path |

| `--quality` | 85 | JPEG quality (1-100) |

| `--monitor` | 1 | Monitor number for multi-display setups |

## MCP server (native agent integration)

The MCP server gives AI agents direct tool access — no shell commands needed. The agent calls `screenshot` or `grab` as native tools.

### Claude Code

Add to `~/.claude.json`:

```json

{

  "mcpServers": {

    "agent-screenshot": {

      "command": "python3",

      "args": ["/path/to/agent-screenshot/mcp_server.py"]

    }

  }

}

```

Restart Claude Code. The agent now has `screenshot` and `grab` tools available natively.

### OpenClaw / mcporter

Add to your MCP config (e.g. `~/.mcporter/mcporter.json`):

```json

{

  "mcpServers": {

    "agent-screenshot": {

      "command": "python3",

      "args": ["/path/to/agent-screenshot/mcp_server.py"]

    }

  }

}

```

### MCP tool reference

| Tool | Parameters | Description |

|------|-----------|-------------|

| `screenshot` | `url`, `full_page`, `mobile`, `dismiss_popups`, `selector`, `wait_ms`, `quality`, `headers` | Screenshot a URL, optionally tiled for Vision |

| `grab` | `region`, `output_path`, `quality`, `monitor` | Capture desktop screen |

### Additional dependency

The MCP server requires the `mcp` package in addition to the base requirements:

```bash

pip install mcp

```

## Add to your agent's instructions

For agents that don't support MCP (or as a fallback), add to your agent's instructions:

**Claude Code** (`CLAUDE.md`):

```markdown

After any visual/UI change, verify with a screenshot before reporting done:

python /path/to/screenshot.py  --full-page

Read the resulting screenshots and check for visual regressions.

```

**Cursor** (`.cursorrules`), **Aider** (`.aider.conf.yml`), **[OpenClaw](https://github.com/openclaw/openclaw)** (agent system prompt) — same instruction, adapted to your config format.

## Requirements

- **Python 3.10+** on Linux, macOS, or Windows (including WSL)

- **screenshot.py**: Playwright + Chromium. Works on any platform including headless servers, Docker, and CI.

- **grab.py**: mss + Pillow. Requires a physical or virtual display (X11, Wayland, macOS desktop, or Windows).

- **mcp_server.py** (optional): `mcp` package. For native tool integration with Claude Code and OpenClaw.

## License

MIT

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rodbland2021/agent-screenshot

Awesome Lists containing this project

README