https://github.com/ndrean/zexplorer

HTML processor engine on steroids. Give eyes to your LLM
css-parser css-sanitization html-parser javascript-tools lexbor mcp-server quickjs-ng sanitize-html sqlite thorvg yoga zig zig-package


# zexplorer (`zxp`)

![Zig support](https://img.shields.io/badge/Zig-0.15.2-color?logo=zig&color=%23f3ab20)

`zexplorer` is a fast, zero-dependency HTML+JS engine. Think `ffmpeg` for the web.
You can use it as a _command-line tool_, an _HTTP dev-server_ or an _MCP server_ for LLM agents — no browser, no Node.js, no Python, no runtime.

The MCP service gives your LLM agent eyes and persistent local storage, zero infra.



**TL;DR**:

- Cold start: ~3ms
- Memory: ~12MB
- Zero dependencies. Single statically-compiled binary.
- Stateless by default, stateful on demand with a zero-config embedded SQLite store for local persistence.
- Pipelines: Native support for parsing Markdown, CSV, and SVG.
- Outputs: Return raw data (JSON, strings, binary arrays), Markdown or render layouts (**Flexbox**) to PNG, JPEG, WEBP, and PDF.
- MCP service, the token saver: the LLM runs scripts server-side (scrape, transform, or render) and gets back only the result.
- Usage: Composable CLI tool or a high-concurrency HTTP rendering service.

---

## What can it do?

It can:

- **Scrape** — fetch a URL, hydrate React, render Vue/Svelte/Lit/SolidJS, WebComponents, extract data. No headless browser.
- **Stream** — consume LLM output via SSE (currently local Ollama, extendable to any OpenAI-compatible endpoint); receive HTML chunks and rebuild a live DOM incrementally.
- **Expose** — serve as an MCP server so LLM agents (Claude Desktop, Gemini CLI…) can `run_script` using the custom API, or use the shortcuts `render_html`, `render_markdown`, `render_url`, and receive data or screenshots directly in the conversation.
- **Render** — `Flexbox` based only, all static: HTML+JS+SVG such as D3, Chart.js, Leaflet, ECharts. Basic support for Canvas API, output PNG/JPEG/WEBP/PDF.
- **Generate** — design an SVG in Figma, plug in data, batch-produce OG images or PDF reports.
- **Sanitize** — DOM+CSS-aware HTML sanitization (stylesheets, inline styles, XSS/mXSS). Built-in.
- **Run JS** — execute ES2020 scripts against a real DOM with fetch, timers, workers, and an event loop.
- **Store & Persist** - drop text, blobs, images in the local storage, no ceremony.

**Limitations**:

- No TypeScript support. JSX-like syntax is supported via tagged templates (using `htm`).
- Cannot scrape arbitrary bot-protected public websites.
- Cannot paint complex CSS: no 2D grid, no `position: fixed`, no CSS functions or variables, no media queries, and no complex canvas.

## Security

If you use your own trusted code, you can skip sanitization entirely. For untrusted content:

> [!WARNING]
> All layers are _best-effort_ — see [SECURITY.md](https://github.com/ndrean/zexplorer/blob/main/SECURITY.md) for full details.
>
> - **Content sanitization** — DOM+CSS-aware: stylesheets, inline styles, iframes, SVG/MathML, DOM clobbering, URI schemas, XSS/mXSS. Tested against [H5SC](https://github.com/cure53/H5SC), [OWASP](https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html), [PortSwigger](https://portswigger.net/web-security/cross-site-scripting/cheat-sheet), and [DOMPurify](https://github.com/cure53/DOMPurify).
> - **Filesystem sandbox** — kernel-enforced `openat()` with symlink blocking, traversal rejection, cross-device check.
> - **Network hardening** — timeouts, redirect/size limits, SSRF pre-flight filtering, HTTPS-only remote imports.
> - **Resource limits** — worker fan-out caps, busy-loop interrupts, max stack/GC/memory, wall-clock deadlines.

## Examples

| Example | What it shows | Output | CLI | Server |
| ------- | ------------- | ------ | :---: | :---: |
| [MCP server](#mcp-server) | Give Claude Desktop / Gemini visual eyes | PNG | – | ✓ |
| [LLM generative UI](#generative-template) | Ollama/OpenAI SSE → DOM → image | WEBP | ✓ | ✓ |
| [Dynamic HTML card](#use-dynamic-html-with-htm-and-paint) | `htm` tagged templates → paintDOM | PNG | ✓ | ✓ |
| [CSS grid / flexbox layout](#render-an-html-file-in-the-terminal) | grid-1D + flexbox → terminal image | PNG | ✓ | ✓ |
| [Scrape Hacker News](#scrape-hacker-news) | fetch → DOM query → structured data | JSON | ✓ | ✓ |
| [Vercel SPA scrape](#scrape-a-vercel-site-in-less-than-1s) | Next.js hydration → `waitForSelector` | JSON | ✓ | ✓ |
| [Vercel site snapshot](#render-the-vercel-side) | SSR page → inlined images → render | WEBP | ✓ | ✓ |
| [ECharts](#echarts) | ECharts SVG → rasterize | WEBP | ✓ | ✓ |
| [Leaflet map PDF](#generate-a-leaflet-map-pdf-report) | GeoJSON route → OSM tiles → SVG → PDF | PDF | – | ✓ |

---

### MCP server

Start the server (the `.` sets the sandbox root for file access and the SQLite store):

```sh
./zig-out/bin/zxp serve .
```

**Connect Claude Desktop** — add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "zexplorer": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://localhost:9984/mcp"]
    }
  }
}
```

**Available tools:**

| Tool | What it does |
| ---- | ----------- |
| `render_html` | Render an HTML string → PNG/WEBP/JPEG (base64 image in MCP response) |
| `render_markdown` | Render GFM Markdown → image |
| `render_url` | Fetch a URL, run its scripts, render → image |
| `run_script` | Execute arbitrary JavaScript in the headless DOM+JS engine; returns text, JSON, or an image |
| `get_zxp_docs` | Return API docs and worked examples — call this before writing a `run_script` |
| `store_save` | Persist text or binary data (e.g. a rendered PNG) to a local SQLite store |
| `store_get` | Retrieve a stored entry by name; `data` is an ArrayBuffer |
| `store_list` | List store entries (metadata only) |
| `store_delete` | Delete a store entry by name |

The typical LLM workflow is: call `get_zxp_docs` to learn the `zxp.*` API, then call `run_script` with composed JavaScript to scrape, render, or process data. `store_*` lets the LLM persist intermediate results across stateless tool calls.
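
The workflow above can be sketched from a client's point of view. This is a minimal illustration, not the project's own client: `buildToolCall`, `callTool`, and `workflow` are hypothetical helper names, the endpoint URL is the one used elsewhere in this README, and calling `get_zxp_docs` with empty arguments is an assumption. It relies on the global `fetch` available in Node.js 18+ or a browser.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request body for the MCP endpoint.
function buildToolCall(id, name, args = {}) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical helper: POST one tool call and return the parsed JSON reply.
// The default URL matches the examples in this README; adjust it to your setup.
async function callTool(id, name, args, url = "http://localhost:9984/mcp") {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildToolCall(id, name, args)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// Workflow sketch: read the docs first, then run a composed script.
// Assumption: get_zxp_docs accepts an empty arguments object.
async function workflow() {
  const docs = await callTool(1, "get_zxp_docs");
  const out = await callTool(2, "run_script", {
    script: "const a=10,b=32; `The answer is ${a+b}`",
  });
  return { docs, out };
}
```

Nothing runs until `workflow()` is called, so the helpers can be loaded without a server listening.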

Using the local store takes only a couple of lines:

```js
// zexplorer runs this instantly. No DB connection setup needed.
const pageTitle = document.querySelector('title').textContent;
zxp.store.save("last_scraped_title", pageTitle); // Saved instantly to SQLite
zxp.store.get("last_scraped_title");
```

**Smoke-test with curl:**

```sh
# Text result
curl -s -X POST http://localhost:9984/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"run_script","arguments":{"script":"const a=10,b=32; `The answer is ${a+b}`"}}}'
# → {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"The answer is 42"}]}}

# Image result
curl -s -X POST http://localhost:9984/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"render_html","arguments":{"html":"<h1>Hello MCP!</h1>","width":400}}}'
# → {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"image","data":"iVBORw0KGgo....","mimeType":"image/png"}]}}
```
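
An image result like the one above arrives as base64 text, so a client has to decode it before saving or displaying it. The following is a small sketch, assuming Node.js (it uses `Buffer`) and the response shape shown in the smoke test; `decodeImageResult` and `looksLikePng` are hypothetical helper names, not part of zexplorer.

```javascript
// Pull the first image item out of an MCP tool result and decode it to bytes.
// Returns null when the result carries no image item.
function decodeImageResult(response) {
  const items = (response.result && response.result.content) || [];
  const image = items.find((item) => item.type === "image");
  if (!image) return null;
  return { mimeType: image.mimeType, bytes: Buffer.from(image.data, "base64") };
}

// Every PNG file starts with this fixed 8-byte signature.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Cheap sanity check before writing the bytes to disk.
function looksLikePng(bytes) {
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
}
```

With a decoded result in hand, `require("fs").writeFileSync("hello.png", decoded.bytes)` would save it locally.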

**Use `run_script` to build a D3 chart from CSV data** — the LLM composes the JS and gets an image back:

Source:

```sh
curl -s -X POST http://localhost:9984/run --data-binary @src/examples/d3_chart/output_chart.webp
```


### Generative template

You can have an LLM generate HTML with CSS for you: the engine has built-in support for SSE (the `text/event-stream` content type).

We showcase the local provider `ollama` with the 4.7 GB model `qwen2.5-coder:7b`. This can be extended to any provider (OpenAI, Anthropic, Gemini) if you adapt the LLM response parsing.

- Our local LLM `ollama` is up and running: `curl -s http://localhost:11434/api/tags | head -c 200` returns _{"models":[{"name":"qwen2.5-coder:7b",....}_.
- The dev-server is up and running: `./zig-out/bin/zxp serve .`

**First example**: render a generative `` component.

```html

```

Let's live-serve this component in a browser. The browser sends a GET request to the dev-server, which in turn reaches the LLM. Depending on the mood of the LLM, you can get this image:

[Image: generative template]

**Second example**: interactive generative form

The HTML below is a form where we select a more elaborate prompt. On submission, a JavaScript snippet POSTs the prompt to the dev-server `/render_llm` endpoint.

Source:

A form with a textarea input, populated by four quick-prompt buttons, and a submit button:

```html

Interactive prompt (POST → base64)


Table
KPI cards
Progress steps
Invoice


A responsive table with 3 columns: Name, Status, Amount. Include 5 realistic sample rows. Use a blue header.


Width
Format

PNG
WebP
JPEG


Model
Ollama URL

Generate


Generated result

const form = document.getElementById('gen-form');
const btn = document.getElementById('gen-btn');
const status = document.getElementById('status');
const result = document.getElementById('result');

// Quick-prompt buttons fill the textarea.
document.querySelectorAll('.quick-prompts button').forEach(b => {
  b.addEventListener('click', () => {
    form.prompt.value = b.dataset.prompt;
  });
});

form.addEventListener('submit', async e => {
  e.preventDefault();

  const prompt = form.prompt.value.trim();
  if (!prompt) return;

  // Lock the UI while the request is in flight.
  btn.disabled = true;
  status.className = 'status';
  status.textContent = 'Generating… (this may take a few seconds)';
  result.style.opacity = '0.4';

  try {
    const res = await fetch('http://localhost:9984/render_llm', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        prompt,
        model: form.model.value,
        base_url: form.base_url.value,
        width: parseInt(form.width.value, 10) || 800,
        format: form.format.value,
      }),
    });

    if (!res.ok) {
      const text = await res.text();
      throw new Error(`HTTP ${res.status}: ${text}`);
    }

    // The server replies with a base64 payload and its MIME type.
    const { data, mime } = await res.json();
    result.src = `data:${mime};base64,${data}`;
    result.style.opacity = '1';
    status.textContent = '';
  } catch (err) {
    status.className = 'status error';
    status.textContent = `Error: ${err.message}`;
    result.style.opacity = '1';
  } finally {
    btn.disabled = false;
  }
});

```


We chose to render a table (a POST request to `/render_llm`). You can get, for example:

[Image: browser screenshot]


#### Note about SSE format

|Provider |Content path |End signal|
|--|--|--|
|OpenAI / groq / mistral / together / ollama_v1 |.choices[0].delta.content |data: [DONE]|
|Anthropic |.delta.text (on content_block_delta events) |event: message_stop|
|Gemini |.candidates[0].content.parts[0].text |.finishReason == "STOP"|
|Ollama |.message.content |no SSE — raw NDJSON|

```txt
if openai || groq || mistral || together || ollama_v1 → .choices[0].delta.content
if anthropic → .delta.text
if gemini → .candidates[0].content.parts[0].text
```
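
The dispatch above can be sketched as a per-line parser. This is an illustration under assumptions, not the engine's implementation: the provider keys and the `parseStreamLine` name are hypothetical, the Gemini `finishReason` is assumed to sit on `candidates[0]`, and end-of-stream detection for native Ollama NDJSON is left out because the table does not specify it.

```javascript
// Content paths from the table above; the provider keys are hypothetical labels.
const extractors = {
  openai: (c) => c.choices?.[0]?.delta?.content,
  anthropic: (c) => c.delta?.text,
  gemini: (c) => c.candidates?.[0]?.content?.parts?.[0]?.text,
  ollama: (c) => c.message?.content,
};

// Parse one line of a streamed response and return { text, done }.
function parseStreamLine(provider, line) {
  const extract = extractors[provider];
  if (!extract) throw new Error(`unknown provider: ${provider}`);

  let payload = line.trim();
  // Anthropic signals the end with a dedicated event line.
  if (payload === "event: message_stop") return { text: "", done: true };
  // SSE providers prefix JSON with "data:"; native Ollama sends bare NDJSON.
  if (payload.startsWith("data:")) payload = payload.slice(5).trim();
  // OpenAI-compatible endpoints end the stream with a [DONE] sentinel.
  if (payload === "[DONE]") return { text: "", done: true };
  // Blank lines and other event lines carry no content.
  if (!payload.startsWith("{")) return { text: "", done: false };

  const chunk = JSON.parse(payload);
  // Assumption: Gemini reports finishReason on the first candidate.
  const geminiDone =
    provider === "gemini" && chunk.candidates?.[0]?.finishReason === "STOP";
  return { text: extract(chunk) ?? "", done: geminiDone };
}
```

A caller would feed each received line to `parseStreamLine`, append `text` to the HTML buffer, and stop reading once `done` is true.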

Because the engine renders only a subset of CSS, and because we use a small model, we had to set hard, explicit constraints in the prompt.

The general system prompt that we use:

```zig
pub const default_system =
"You are a UI generator. Output ONLY raw HTML — no markdown, no code fences, no backticks, no explanation. " ++
"Start your response directly with an HTML tag. " ++
"Use ONLY

and elements — NEVER , , , , , . " ++
"Use ONLY inline styles with these CSS properties: display (flex or block), flex-direction, " ++
"justify-content, align-items, flex-wrap, gap, padding, margin, color, background, " ++
"font-size, font-weight, border-radius, width, height, border, text-align, white-space. " ++
"No external fonts. No CSS variables. No animations. No tags. " ++
"CRITICAL: Every opened <div> MUST be explicitly closed with </div> before opening the next sibling <div>. " ++
"CARD PATTERN — row of sibling cards, each card has stacked label+value (NEVER nest cards): " ++
"<div style=\"display:flex;flex-direction:row;gap:16px;padding:16px\">" ++
"<div style=\"width:30%;background:#fff;border-radius:8px;padding:16px\">" ++
"<div style=\"font-size:13px;color:#666\">Label A</div>" ++
"<div style=\"font-size:24px;font-weight:bold\">Value A</div>" ++
"</div>" ++
"<div style=\"width:30%;background:#fff;border-radius:8px;padding:16px\">" ++
"<div style=\"font-size:13px;color:#666\">Label B</div>" ++
"<div style=\"font-size:24px;font-weight:bold\">Value B</div>" ++
"</div>" ++
"</div> " ++
"TABLE PATTERN — outer column container, SIBLING row divs inside it (NEVER nest rows): " ++
"<div style=\"display:flex;flex-direction:column\">" ++
"<div style=\"display:flex;flex-direction:row\">" ++
"<div style=\"width:40%\">Header A</div><div style=\"width:60%\">Header B</div>" ++
"</div>" ++
"<div style=\"display:flex;flex-direction:row\">" ++
"<div style=\"width:40%\">Cell A1</div><div style=\"width:60%\">Cell B1</div>" ++
"</div>" ++
"</div>";
```

</details>

### ECharts

An example that shows how to collect public data from a CSV source and build an [ECharts](https://github.com/apache/echarts) chart.

We use the following functions: `zxp.loadHTML()`, `zxp.runScripts()`, `new XMLSerializer()` (to serialize the SVG), `zxp.paintSVG()`, `zxp.encode()` (to generate the WEBP-encoded binary), and `zxp.fs.writeFileSync()` (to save it locally).

Source: <https://github.com/ndrean/zexplorer/blob/main/src/examples/echarts/echarts_svg.html>

With the dev-server up and running, we send a POST request to the `/run` endpoint with the script as the payload:

```sh
curl -s -X POST http://localhost:9984/run --data-binary @src/examples/echarts/run_svg.js
```

<img src="https://github.com/ndrean/zexplorer/blob/main/src/examples/echarts/echarts_svg.png" alt="D3 chart from CSV" width="400">

<br>

### Use dynamic HTML with `htm` and paint

<details><summary>We use htm to build dynamic HTML and render it as an image</summary>

[Source](https://github.com/ndrean/zexplorer/blob/main/src/examples/frameworks/htm/teset_html.html)

```html
<html>
<head>
<script>
const { html } = zxp; // embedded in the code
const name = "Zexplorer";
const version = "0.1.0";
const features = ["Lexbor DOM", "QuickJS", "Yoga Layout", "ThorVG"];

const card = html`
<div style=${{
background: "#1a1a2e",
color: "#e0e0e0",
padding: "20px",
}}>
<div style=${{
background: "#16213e",
padding: "10px",
color: "#f7a41d",
}}>
${name} v${version}
</div>
<ul style=${{ padding: "10px" }}>
${features.map(
(f) => html`
<li style=${{
background: "#0f3460",
padding: "5px",
margin: "4px",
color: "#e94560",
}}>${f}</li>
`
)}
</ul>
</div>
`;

document.body.appendChild(card);