https://github.com/ndrean/zexplorer
HTML processor engine on steroids. Give eyes to your LLM
css-parser css-sanitization html-parser javascript-tools lexbor mcp-server quickjs-ng sanitize-html sqlite thorvg yoga zig zig-package
- Host: GitHub
- URL: https://github.com/ndrean/zexplorer
- Owner: ndrean
- License: mit
- Created: 2025-07-27T23:29:03.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2026-03-06T18:25:33.000Z (3 months ago)
- Last Synced: 2026-03-06T21:47:46.544Z (3 months ago)
- Language: Zig
- Homepage: https://ndrean.github.io/zexplorer
- Size: 107 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: SECURITY.md
# zexplorer (`zxp`)

`zexplorer` is a fast, zero-dependency HTML+JS engine. Think `ffmpeg` for the web.
You can use it as a _command-line tool_, an _HTTP dev-server_ or an _MCP server_ for LLM agents — no browser, no Node.js, no Python, no runtime.
The MCP service gives your LLM agent eyes and persistent local storage, with zero infrastructure.
**TL;DR**:
- Cold start: ~3ms
- Memory: ~12MB
- Zero dependencies. Single statically-compiled binary.
- Stateless by default, stateful on demand via a zero-config embedded SQLite store for local persistence.
- Pipelines: Native support for parsing Markdown, CSV, and SVG.
- Outputs: Return raw data (JSON, strings, binary arrays), Markdown or render layouts (**Flexbox**) to PNG, JPEG, WEBP, and PDF.
- MCP service (the token saver): lets the LLM run scripts server-side (scrape, transform, render) and get back only the result.
- Usage: Composable CLI tool or a high-concurrency HTTP rendering service.
---
## What can it do?
It can:
- **Scrape** — fetch a URL, hydrate React, render Vue/Svelte/Lit/SolidJS, WebComponents, extract data. No headless browser.
- **Stream** — consume LLM output via SSE (currently local Ollama, extendable to any OpenAI-compatible endpoint); receive HTML chunks and rebuild a live DOM incrementally.
- **Expose** — serve as an MCP server so LLM agents (Claude Desktop, Gemini CLI…) can `run_script` using the custom API, or use the shortcuts `render_html`, `render_markdown`, `render_url`, and receive data or screenshots directly in the conversation.
- **Render** — static, Flexbox-only layout of HTML+JS+SVG (e.g. D3, Chart.js, Leaflet, ECharts), with basic Canvas API support; output PNG/JPEG/WEBP/PDF.
- **Generate** — design an SVG in Figma, plug in data, batch-produce OG images or PDF reports.
- **Sanitize** — DOM+CSS-aware HTML sanitization (stylesheets, inline styles, XSS/mXSS). Built-in.
- **Run JS** — execute ES2020 scripts against a real DOM with fetch, timers, workers, and an event loop.
- **Store & Persist** - drop text, blobs, images in the local storage, no ceremony.

**Limitations**:
- No TypeScript support. JSX itself is not supported, but a JSX-like syntax is available via tagged templates (`htm`).
- Cannot scrape arbitrary bot-protected public websites.
- Cannot paint complex CSS: no 2D grid, no `position: fixed`, no CSS functions or variables, no media queries, and no complex canvas drawing.
## Security
If you use your own trusted code, you can skip sanitization entirely. For untrusted content:
> [!WARNING]
> All layers are _best-effort_ — see [SECURITY.md](https://github.com/ndrean/zexplorer/blob/main/SECURITY.md) for full details.
>
> - **Content sanitization** — DOM+CSS-aware: stylesheets, inline styles, iframes, SVG/MathML, DOM clobbering, URI schemes, XSS/mXSS. Tested against [H5SC](https://github.com/cure53/H5SC), [OWASP](https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html), [PortSwigger](https://portswigger.net/web-security/cross-site-scripting/cheat-sheet), and [DOMPurify](https://github.com/cure53/DOMPurify).
> - **Filesystem sandbox** — kernel-enforced `openat()` with symlink blocking, traversal rejection, cross-device check.
> - **Network hardening** — timeouts, redirect/size limits, SSRF pre-flight filtering, HTTPS-only remote imports.
> - **Resource limits** — worker fan-out caps, busy-loop interrupts, max stack/GC/memory, wall-clock deadlines.
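
To make "SSRF pre-flight filtering" concrete, here is a minimal sketch of the idea in plain JavaScript (Node.js). It is not zexplorer's implementation: `preflightCheck` is a hypothetical name, it only inspects literal IPv4 hosts and `localhost`, and, mirroring the HTTPS-only rule stated for remote imports, it rejects every non-HTTPS URL. A real filter must also resolve hostnames and re-check the resulting addresses (including IPv6), since a public name can point at a private IP.

```javascript
// Hypothetical SSRF pre-flight sketch, NOT zexplorer's code.
// Checks only literal IPv4 hosts and "localhost"; DNS resolution and IPv6 are out of scope.

// Parse a dotted-quad hostname into four octets, or return null.
function parseIPv4(host) {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return null;
  return octets;
}

// True for loopback, private and link-local IPv4 ranges.
function isBlockedIPv4([a, b]) {
  return (
    a === 0 ||                           // 0.0.0.0/8 "this network"
    a === 10 ||                          // 10.0.0.0/8 private
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||          // 169.254.0.0/16 link-local (cloud metadata)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168)             // 192.168.0.0/16 private
  );
}

function preflightCheck(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl); // WHATWG parser also normalizes numeric hosts
  } catch {
    return { ok: false, reason: "invalid URL" };
  }
  if (url.protocol !== "https:") return { ok: false, reason: "HTTPS only" };
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) {
    return { ok: false, reason: "loopback host" };
  }
  const ip = parseIPv4(host);
  if (ip && isBlockedIPv4(ip)) return { ok: false, reason: "private address" };
  return { ok: true };
}
```

Because the WHATWG `URL` parser normalizes numeric hosts, a disguised address such as `https://2130706433/` reaches the check as `127.0.0.1` and is rejected.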
## Examples
| Example | What it shows | Output | CLI | Server |
| ------- | ------------- | ------ | :---: | :---: |
| [MCP server](#mcp-server) | Give Claude Desktop / Gemini visual eyes | PNG | – | ✓ |
| [LLM generative UI](#generative-template) | Ollama/OpenAI SSE → DOM → image | WEBP | ✓ | ✓ |
| [Dynamic HTML card](#use-dynamic-html-with-htm-and-paint) | `htm` tagged templates → paintDOM | PNG | ✓ | ✓ |
| [CSS grid / flexbox layout](#render-an-html-file-in-the-terminal) | grid-1D + flexbox → terminal image | PNG | ✓ | ✓ |
| [Scrape Hacker News](#scrape-hacker-news) | fetch → DOM query → structured data | JSON | ✓ | ✓ |
| [Vercel SPA scrape](#scrape-a-vercel-site-in-less-than-1s) | Next.js hydration → `waitForSelector` | JSON | ✓ | ✓ |
| [Vercel site snapshot](#render-the-vercel-side) | SSR page → inlined images → render | WEBP | ✓ | ✓ |
| [ECharts](#echarts) | ECharts SVG → rasterize | WEBP | ✓ | ✓ |
| [Leaflet map PDF](#generate-a-leaflet-map-pdf-report) | GeoJSON route → OSM tiles → SVG → PDF | PDF | – | ✓ |
---
### MCP server
Start the server (the `.` sets the sandbox root for file access and the SQLite store):
```sh
./zig-out/bin/zxp serve .
```
**Connect Claude Desktop** — add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "zexplorer": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://localhost:9984/mcp"]
    }
  }
}
```
**Available tools:**
| Tool | What it does |
| ---- | ----------- |
| `render_html` | Render an HTML string → PNG/WEBP/JPEG (base64 image in MCP response) |
| `render_markdown` | Render GFM Markdown → image |
| `render_url` | Fetch a URL, run its scripts, render → image |
| `run_script` | Execute arbitrary JavaScript in the headless DOM+JS engine; returns text, JSON, or an image |
| `get_zxp_docs` | Return API docs and worked examples — call this before writing a `run_script` |
| `store_save` | Persist text or binary data (e.g. a rendered PNG) to a local SQLite store |
| `store_get` | Retrieve a stored entry by name; `data` is an ArrayBuffer |
| `store_list` | List store entries (metadata only) |
| `store_delete` | Delete a store entry by name |
The typical LLM workflow is: call `get_zxp_docs` to learn the `zxp.*` API, then call `run_script` with composed JavaScript to scrape, render, or process data. `store_*` lets the LLM persist intermediate results across stateless tool calls.
Your local storage is just:
```js
// zexplorer runs this instantly. No DB connection setup needed.
const pageTitle = document.querySelector('title').textContent;
zxp.store.save("last_scraped_title", pageTitle); // Saved instantly to SQLite
zxp.store.get("last_scraped_title");
```
**Smoke-test with curl:**
```sh
# Text result
curl -s -X POST http://localhost:9984/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"run_script","arguments":{"script":"const a=10,b=32; `The answer is ${a+b}`"}}}'
# → {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"The answer is 42"}]}}
# Image result
curl -s -X POST http://localhost:9984/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"render_html","arguments":{"html":"Hello MCP!","width":400}}}'
# → {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"image","data":"iVBORw0KGgo....","mimeType":"image/png"}]}}
```
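
For scripting the same calls from Node.js instead of curl, a minimal sketch might look like this. It assumes Node.js 18+ (global `fetch` and `Buffer`) and relies only on the `tools/call` method, the `script`/`html`/`width` arguments and the `text`/`image` content shapes that appear in the curl examples; `toolCall`, `readContent` and `callTool` are hypothetical helper names, not part of zexplorer.

```javascript
// Hypothetical helpers (not zexplorer code) for the JSON-RPC 2.0 messages
// used by the MCP endpoint. Assumes Node.js 18+.
let nextId = 1;

// Build a "tools/call" request for one of the MCP tools.
function toolCall(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Flatten a tools/call response: text items become strings,
// image items become { mimeType, bytes } with the base64 data decoded.
function readContent(response) {
  if (response.error) {
    throw new Error(`JSON-RPC error ${response.error.code}: ${response.error.message}`);
  }
  return response.result.content.map((item) => {
    if (item.type === "text") return item.text;
    if (item.type === "image") {
      return { mimeType: item.mimeType, bytes: Buffer.from(item.data, "base64") };
    }
    return item; // unknown content types are passed through unchanged
  });
}

// Send one tool call to a running server.
async function callTool(name, args, endpoint = "http://localhost:9984/mcp") {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(toolCall(name, args)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return readContent(await res.json());
}
```

With the server running, `await callTool("render_html", { html: "Hello MCP!", width: 400 })` should resolve to a single `{ mimeType, bytes }` entry that can be written to disk with `fs.writeFileSync`.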
**Use `run_script` to build a D3 chart from CSV data** — the LLM composes the JS and gets an image back:
Source:
```sh
curl -s -X POST http://localhost:9984/run --data-binary @src/examples/d3_chart/output_chart.webp
```

### Generative template
You can use an LLM to generate HTML with CSS: the engine has built-in support for SSE (the `text/event-stream` content type).
We showcase the local provider `ollama` with the 4.7 GB model `qwen2.5-coder:7b`. This can be extended to any provider (OpenAI, Anthropic, Gemini) if you adapt the LLM response parsing.
- Our local LLM `ollama` is up and running: `curl -s http://localhost:11434/api/tags | head -c 200` returns _{"models":[{"name":"qwen2.5-coder:7b",....}_.
- The dev-server is up and running: `./zig-out/bin/zxp serve .`
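
Since the dev-server consumes `text/event-stream` responses, here is a rough sketch of how such a stream can be split into events incrementally as chunks arrive. It is a generic SSE parser written for illustration (the class name `SSEParser` is hypothetical), not zexplorer's internal code, and it only handles the `event:` and `data:` fields with `\n` or `\r\n` line endings.

```javascript
// Hypothetical incremental text/event-stream parser, NOT zexplorer's code.
// Feed it decoded text chunks; it returns the events each chunk completes.
class SSEParser {
  constructor() {
    this.buffer = "";
  }

  push(chunk) {
    // Normalize CRLF on the whole buffer so a "\r" and "\n" split
    // across two chunks is still joined correctly.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events = [];
    let sep;
    // A blank line ("\n\n") terminates one event.
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const block = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      const event = this.parseBlock(block);
      if (event) events.push(event);
    }
    return events; // incomplete trailing data stays buffered
  }

  parseBlock(block) {
    let event = "message"; // default event type
    const data = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // comments / keep-alives
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Multiple data lines are joined with "\n"; blocks without data are skipped.
    return data.length ? { event, data: data.join("\n") } : null;
  }
}
```

Each returned `data` string is then typically passed to `JSON.parse` and to provider-specific extraction of the text delta.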
**First example**: render a generative component.
```html
```
Let's "live-serve" this component in a browser. The browser sends a GET request to the dev-server, which in turn calls the LLM. Depending on the mood of the LLM, you may get this image:

**Second example**: interactive generative form
The HTML below is a form where you pick a more elaborate prompt. On submission, a JavaScript snippet POSTs the prompt to the dev-server's `/render_llm` endpoint.
Source: a form with a textarea, four quick-prompt buttons that fill it, and a submit button.
```html
Interactive prompt (POST → base64)
Table
KPI cards
Progress steps
Invoice
A responsive table with 3 columns: Name, Status, Amount. Include 5 realistic sample rows. Use a blue header.
Generate
<script>
  const form = document.getElementById('gen-form');
  const btn = document.getElementById('gen-btn');
  const status = document.getElementById('status');
  const result = document.getElementById('result');

  // Quick-prompt buttons fill the textarea.
  document.querySelectorAll('.quick-prompts button').forEach(b => {
    b.addEventListener('click', () => {
      form.prompt.value = b.dataset.prompt;
    });
  });

  form.addEventListener('submit', async e => {
    e.preventDefault();
    const prompt = form.prompt.value.trim();
    if (!prompt) return;

    btn.disabled = true;
    status.className = 'status';
    status.textContent = 'Generating… (this may take a few seconds)';
    result.style.opacity = '0.4';

    try {
      const res = await fetch('http://localhost:9984/render_llm', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          prompt,
          model: form.model.value,
          base_url: form.base_url.value,
          width: parseInt(form.width.value, 10) || 800,
          format: form.format.value,
        }),
      });
      if (!res.ok) {
        const text = await res.text();
        throw new Error(`HTTP ${res.status}: ${text}`);
      }
      // The endpoint answers with a base64 payload and its MIME type.
      const { data, mime } = await res.json();
      result.src = `data:${mime};base64,${data}`;
      result.style.opacity = '1';
      status.textContent = '';
    } catch (err) {
      status.className = 'status error';
      status.textContent = `Error: ${err.message}`;
      result.style.opacity = '1';
    } finally {
      btn.disabled = false;
    }
  });
</script>
```
We selected the table prompt (a POST request to `/render_llm`). You may get, for example:

#### Note about SSE format
|Provider |Content path |End signal|
|--|--|--|
|OpenAI / groq / mistral / together / ollama_v1 |.choices[0].delta.content |data: [DONE]|
|Anthropic |.delta.text (on content_block_delta events) |event: message_stop|
|Gemini |.candidates[0].content.parts[0].text |.finishReason == "STOP"|
|Ollama |.message.content |no SSE — raw NDJSON|
```txt
if openai || groq || mistral || together || ollama_v1 → .choices[0].delta.content
if anthropic → .delta.text
if gemini → .candidates[0].content.parts[0].text
if ollama → .message.content
```
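
The table translates directly into a small dispatch function. The sketch below is hypothetical (`readDelta` and the provider keys are illustrative names, not zexplorer's API): it takes one already-parsed JSON chunk and returns the text delta plus an end-of-stream flag. OpenAI-style streams end with the non-JSON line `data: [DONE]`, so the caller must test for that before calling `JSON.parse`. For native Ollama it assumes each NDJSON object carries a boolean `done` field that is `true` on the last one.

```javascript
// Hypothetical per-provider extraction of streamed text, mirroring the table.
// Input: one parsed JSON chunk. Output: { text, done }.
function readDelta(provider, chunk) {
  switch (provider) {
    case "openai":
    case "groq":
    case "mistral":
    case "together":
    case "ollama_v1":
      // End of stream is the literal "data: [DONE]" line, checked by the caller.
      return { text: chunk.choices?.[0]?.delta?.content ?? "", done: false };
    case "anthropic":
      // Text only arrives on content_block_delta; message_stop ends the stream.
      if (chunk.type === "message_stop") return { text: "", done: true };
      return {
        text: chunk.type === "content_block_delta" ? chunk.delta?.text ?? "" : "",
        done: false,
      };
    case "gemini": {
      const cand = chunk.candidates?.[0];
      return {
        text: cand?.content?.parts?.[0]?.text ?? "",
        done: cand?.finishReason === "STOP",
      };
    }
    case "ollama":
      // Native Ollama: NDJSON lines, the final object has done: true.
      return { text: chunk.message?.content ?? "", done: chunk.done === true };
    default:
      throw new Error(`unknown provider: ${provider}`);
  }
}
```

Keeping the provider differences in one function means the DOM-rebuilding code downstream only ever sees plain text deltas.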
Because the engine renders only a subset of CSS, and because we use a small model, the prompt has to set hard, explicit constraints.
The general system prompt we use:
```zig
pub const default_system =
"You are a UI generator. Output ONLY raw HTML — no markdown, no code fences, no backticks, no explanation. " ++
"Start your response directly with an HTML tag. " ++
"Use ONLY and elements — NEVER , , , , , . " ++
"Use ONLY inline styles with these CSS properties: display (flex or block), flex-direction, " ++
"justify-content, align-items, flex-wrap, gap, padding, margin, color, background, " ++
"font-size, font-weight, border-radius, width, height, border, text-align, white-space. " ++
"No external fonts. No CSS variables. No animations. No tags. " ++
"CRITICAL: Every opened <div> MUST be explicitly closed with </div> before opening the next sibling <div>. " ++
"CARD PATTERN — row of sibling cards, each card has stacked label+value (NEVER nest cards): " ++
"<div style=\"display:flex;flex-direction:row;gap:16px;padding:16px\">" ++
"<div style=\"width:30%;background:#fff;border-radius:8px;padding:16px\">" ++
"<div style=\"font-size:13px;color:#666\">Label A</div>" ++
"<div style=\"font-size:24px;font-weight:bold\">Value A</div>" ++
"</div>" ++
"<div style=\"width:30%;background:#fff;border-radius:8px;padding:16px\">" ++
"<div style=\"font-size:13px;color:#666\">Label B</div>" ++
"<div style=\"font-size:24px;font-weight:bold\">Value B</div>" ++
"</div>" ++
"</div> " ++
"TABLE PATTERN — outer column container, SIBLING row divs inside it (NEVER nest rows): " ++
"<div style=\"display:flex;flex-direction:column\">" ++
"<div style=\"display:flex;flex-direction:row\">" ++
"<div style=\"width:40%\">Header A</div><div style=\"width:60%\">Header B</div>" ++
"</div>" ++
"<div style=\"display:flex;flex-direction:row\">" ++
"<div style=\"width:40%\">Cell A1</div><div style=\"width:60%\">Cell B1</div>" ++
"</div>" ++
"</div>";
```
</details>
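
Small models still break these rules occasionally, so it can help to check the output before handing it to the renderer. Below is a rough, hypothetical sketch (not part of zexplorer; `checkGeneratedHtml` is an illustrative name) that tests three constraints from the system prompt: balanced `<div>` tags, only whitelisted inline CSS properties, and a response that starts with a tag. It uses regular expressions rather than a real HTML parser, so treat it as a cheap filter, not a validator.

```javascript
// Hypothetical post-check for LLM output, NOT part of zexplorer.
// Regex-based: a rough filter only, not an HTML parser.
const ALLOWED_CSS = new Set([
  "display", "flex-direction", "justify-content", "align-items", "flex-wrap",
  "gap", "padding", "margin", "color", "background", "font-size",
  "font-weight", "border-radius", "width", "height", "border",
  "text-align", "white-space",
]);

function checkGeneratedHtml(html) {
  const problems = [];

  // 1. Every <div> must be closed, and never closed before it is opened.
  let depth = 0;
  for (const [, slash] of html.matchAll(/<(\/?)div\b[^>]*>/gi)) {
    depth += slash ? -1 : 1;
    if (depth < 0) {
      problems.push("closing </div> without a matching <div>");
      depth = 0;
    }
  }
  if (depth > 0) problems.push(`${depth} unclosed <div>`);

  // 2. Inline styles may only use the whitelisted properties.
  for (const [, style] of html.matchAll(/style\s*=\s*"([^"]*)"/gi)) {
    for (const decl of style.split(";")) {
      const prop = decl.split(":")[0].trim().toLowerCase();
      if (prop && !ALLOWED_CSS.has(prop)) problems.push(`disallowed CSS property: ${prop}`);
    }
  }

  // 3. The response must start directly with a tag (no fences or prose).
  if (!html.trimStart().startsWith("<")) problems.push("response does not start with an HTML tag");

  return problems; // empty array means no rule violation was detected
}
```

A non-empty result could trigger a retry with the problems appended to the prompt, rather than rendering a broken layout.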
### ECharts
An example that collects public data from a CSV source and builds an [ECharts](https://echarts.apache.org) chart rendered as SVG.
It uses `zxp.loadHTML()`, `zxp.runScripts()`, `new XMLSerializer()` (to serialize the SVG), `zxp.paintSVG()`, `zxp.encode()` (to produce WEBP-encoded binary), and `zxp.fs.writeFileSync()` to save the result locally.
Source: <https://github.com/ndrean/zexplorer/blob/main/src/examples/echarts/echarts_svg.html>
With the dev-server running, send a POST request to the `/run` endpoint with the script as the payload:
```sh
curl -s -X POST http://localhost:9984/run --data-binary @src/examples/echarts/run_svg.js
```
<img src="https://github.com/ndrean/zexplorer/blob/main/src/examples/echarts/echarts_svg.png" alt="ECharts chart from CSV" width="400">
<br>
### Use dynamic HTML with `htm` and paint
<details><summary>We use htm to build dynamic HTML and render it as an image</summary>

[Source](https://github.com/ndrean/zexplorer/blob/main/src/examples/frameworks/htm/teset_html.html)
```html
<html>
<head>
<script>
const { html } = zxp; // embedded in the code
const name = "Zexplorer";
const version = "0.1.0";
const features = ["Lexbor DOM", "QuickJS", "Yoga Layout", "ThorVG"];
const card = html`
<div style=${{
background: "#1a1a2e",
color: "#e0e0e0",
padding: "20px",
}}>
<div style=${{
background: "#16213e",
padding: "10px",
color: "#f7a41d",
}}>
${name} v${version}
</div>
<ul style=${{ padding: "10px" }}>
${features.map(
(f) => html`
<li style=${{
background: "#0f3460",
padding: "5px",
margin: "4px",
color: "#e94560",
}}>${f}</li>
`
)}
</ul>
</div>
`;
document.body.appendChild(card);