https://github.com/chrisrobison/agentview

A text-grid web renderer for AI agents — see the web without screenshots
https://github.com/chrisrobison/agentview

Last synced: 16 days ago
JSON representation

A text-grid web renderer for AI agents — see the web without screenshots

Host: GitHub
URL: https://github.com/chrisrobison/agentview
Owner: chrisrobison
Created: 2026-02-19T09:06:22.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-02-19T11:01:18.000Z (4 months ago)
Last Synced: 2026-02-19T15:17:32.578Z (4 months ago)
Language: JavaScript
Size: 22.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # AgentView

**A text-grid web renderer for AI agents — see the web without screenshots.**

Instead of taking expensive screenshots and piping them through vision models, AgentView renders web pages as structured text grids that LLMs can reason about natively. Full JavaScript execution, spatial layout preserved, interactive elements annotated.

## Why?

| Approach | Size | Requires | Speed | Spatial Layout |

|----------|------|----------|-------|----------------|

| Screenshot + Vision | ~1MB | Vision model ($$$) | Slow | Pixel-level |

| Accessibility Tree | ~5KB | Nothing | Fast | ❌ Lost |

| Raw HTML | ~100KB+ | Nothing | Fast | ❌ Lost |

| **AgentView** | **~2-5KB** | **Nothing** | **Fast** | **✅ Preserved** |

## How It Works

```

┌─────────────────────────────────────────────┐

│  Agent API                                   │

│  navigate(url) → text grid + element map     │

│  click(ref) / type(ref, text) / scroll()     │

├─────────────────────────────────────────────┤

│  Text Grid Renderer                          │

│  Pixel positions → character grid            │

│  Interactive elements get [ref] annotations  │

├─────────────────────────────────────────────┤

│  Headless Chromium (via Playwright)          │

│  Full JS/CSS execution                       │

│  getBoundingClientRect() for all elements    │

└─────────────────────────────────────────────┘

```

The browser renders the page normally. AgentView extracts every visible element's position, size, text, and interactivity — then maps it all onto a character grid. Interactive elements get reference numbers like `[0]`, `[1]` that agents can use to click, type, or select.

## Example Output

```

═══ HACKER NEWS ══════════════════════════════════════

[0]Hacker News  [1]new  [2]past  [3]comments  [4]ask  [5]show  [6]jobs  [7]submit

 1. [8]Show HN: AgentView - text-grid browser for AI agents (github.com)

    142 points by chrisrobison 3 hours ago | [9]89 comments

 2. [10]Why LLMs don't need screenshots to browse the web

    87 points by somebody 5 hours ago | [11]34 comments

 3. [12]The future of agent-computer interfaces

    56 points by researcher 8 hours ago | [13]12 comments

[14:______________________] [15 Search]

```

~500 bytes. An LLM can read this, understand the layout, and say "click ref 8" to open the first link. No vision model needed.

## Install

```bash

npm install -g agentview

npx playwright install chromium

```

## CLI Usage

```bash

# Render a page

agentview https://news.ycombinator.com

# Interactive mode

agentview -i https://github.com

agentview> click 3

agentview> type 7 search query

agentview> scroll down

agentview> refs

agentview> quit

# JSON output (for piping to agents)

agentview -j https://example.com

# Custom grid size

agentview --cols 80 --rows 24 https://example.com

```

## HTTP API

```bash

# Start the server

agentview --serve 3000

# Navigate

curl -X POST http://localhost:3000/navigate \

  -H 'Content-Type: application/json' \

  -d '{"url": "https://example.com"}'

# Click an element

curl -X POST http://localhost:3000/click \

  -d '{"ref": 3}'

# Type into an input

curl -X POST http://localhost:3000/type \

  -d '{"ref": 7, "text": "search query"}'

# Scroll

curl -X POST http://localhost:3000/scroll \

  -d '{"direction": "down"}'

# Get current state

curl http://localhost:3000/snapshot

```

## Programmatic Usage

```javascript

const { AgentBrowser } = require('agentview');

const browser = new AgentBrowser({ cols: 120, rows: 40 });

// Navigate and get the text grid

const { view, elements, meta } = await browser.navigate('https://example.com');

console.log(view);        // The text grid

console.log(elements);    // { 0: { selector, tag, text, href }, ... }

console.log(meta);        // { url, title, cols, rows, totalRefs }

// Interact

await browser.click(3);              // Click element [3]

await browser.type(7, 'hello');      // Type into element [7]

await browser.scroll('down');        // Scroll down

const snap = await browser.snapshot(); // Re-render

await browser.close();

```

## Grid Conventions

| Element | Rendering |

|---------|-----------|

| Headings | `═══ HEADING TEXT ═══════` |

| Links | `[ref]link text` |

| Buttons | `[ref button text]` |

| Text inputs | `[ref:placeholder____]` |

| Checkboxes | `[ref:X] Label` or `[ref: ] Label` |

| Radio buttons | `[ref:●] Label` or `[ref:○] Label` |

| Dropdowns | `[ref:▼ Selected]` |

| Separators | `────────────────` |

| List items | `• Item text` |

## Design Principles

1. **Text is native to LLMs** — no vision model middleman

2. **Spatial layout matters** — a flat list of elements loses the "where" that helps agents understand pages

3. **Cheap and fast** — 2-5KB per render vs 1MB+ screenshots

4. **Full web support** — real Chromium runs the JS, we just change how the output is represented

5. **Interactive** — reference numbers map to real DOM elements for clicking, typing, etc.

## License

MIT © Christopher Robison

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/chrisrobison/agentview

Awesome Lists containing this project

README