https://github.com/imcuttle/flipbook-app

🎨 Flipbook Canvas β€” Click-to-explore knowledge picture-book. Long-press any image to spawn an annotated child diagram, powered by a pluggable multimodal pipeline (text LLM + image gen + web search + OCR) wired to mainstream models (OpenAI / Gemini / Seedream / …). | η‚Ήε‡»εΌζŽ’η΄’ηš„ηŸ₯θ―†η”»ε†ŒοΌšι•ΏζŒ‰ε›Ύη‰‡ε³ε―η”ŸζˆεΈ¦ζ–‡ε­—ζ ‡ζ³¨ηš„ε­ε›ΎοΌŒε€šζ¨‘ζ€ζ΅ζ°΄ηΊΏδΈ²θ”δΈ»ζ΅ε€§ζ¨‘εž‹γ€‚

# 🎨 Flipbook Canvas

**English** · [中文](./README.zh.md)

[![Node](https://img.shields.io/badge/Node.js-%E2%89%A520.10-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
[![React](https://img.shields.io/badge/React-18-61DAFB?logo=react&logoColor=white)](https://react.dev/)
[![Vite](https://img.shields.io/badge/Vite-5-646CFF?logo=vite&logoColor=white)](https://vitejs.dev/)
[![Express](https://img.shields.io/badge/Express-4-000000?logo=express&logoColor=white)](https://expressjs.com/)
[![TypeScript](https://img.shields.io/badge/TypeScript-5-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
[![SQLite](https://img.shields.io/badge/SQLite-Sequelize-003B57?logo=sqlite&logoColor=white)](https://www.sqlite.org/)
[![Multimodal](https://img.shields.io/badge/Multimodal-LLM%20%C3%97%20ImageGen%20%C3%97%20WebSearch%20%C3%97%20OCR-FF6F61)](#-multimodal--mainstream-llms)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/imcuttle/flipbook-app/pulls)
[![GitHub stars](https://img.shields.io/github/stars/imcuttle/flipbook-app?style=social)](https://github.com/imcuttle/flipbook-app/stargazers)

### 🔭 [**Live examples → imcuttle.github.io/flipbook-app**](https://imcuttle.github.io/flipbook-app/)

> Browse fully interactive exported flipbooks right in your browser: click hotspots to drill in, no install needed.

> ✨ Click anywhere on a generated image. The backend infers what you clicked,
> searches the web when useful, generates a child diagram, and links it back.
> **A flipbook of explorable knowledge, one click at a time.**

> 💡 Inspired by, and a re-implementation of, the product idea behind
> [flipbook.page](https://flipbook.page); credit to the original team for the
> click-to-explore canvas concept.

A long-running web product: **Express + SSE** backend, **Vite + React + TS**
frontend, a **pluggable multi-model image pipeline**, web-search augmented
planning, per-node concurrency, read-only share links, fullscreen casting and
a fully responsive mobile layout.

---

## ✨ Why this is fun

Most "AI drawing" demos stop at one image. This one turns each image into a
**playable knowledge surface**:

- 🖱️ **Long-press anywhere on a picture** → the model reads what's under your
finger, decides whether the topic needs fresh sources, optionally hits the
web, then paints a brand-new annotated diagram zoomed into that concept.
- 📚 **Encyclopedia-style output**: every node ships with a 150–220-character
caption and 20–40 in-image labels (place names, dates, numbers…), all
OCR'd back into a transparent text layer so you can drag-select and copy
any fragment straight off the picture.
- 🌳 **Infinite tree of canvases**: every click spawns a child node; the
whole exploration tree is persisted, shareable, and replayable.
- ⏳ **Watch it think**: a node is saved and linkable the instant you click,
then its title / caption / scene prompt **type out live**; share the link
and a friend on another device watches the same stream fill in.

---

## 📸 Screenshots



- *Click-to-explore:* long-press any region to drill in.
- *Woodpecker walkthrough:* the end-to-end pipeline, search → planner → ImageGen → drill-down.
- *Gallery + canvas:* every canvas is persisted, shareable, and replayable.

---

## 🚀 Highlights

- 🖱️ **Click-to-explore**: long-press (1 s) anywhere on a node's image. The
backend infers the label, decides whether to web-search, then generates a
child node. Spatial + semantic dedup means that pressing a region that
already has a child opens the existing node instead of generating a new one.
- ⏳ **Live-streaming, linkable generating nodes**: the moment you click, the
child node is **persisted under its final id** and its parent hotspot links
to it immediately, so it's **shareable / openable on any device while still
generating**. Its title, caption and image prompt **type out live**
(token-streamed via SSE), the catalog shows a **spinner row**, and a refresh
or cross-device open **resumes the stream from the on-disk snapshot**. On
failure the partial node is auto-deleted.
- 🌫️ **Progressive image loading**: every PNG gets blur → thumbnail → medium →
full variants (generated with `sharp`). Gallery cards blur up and the canvas
swaps to full-res when ready: no broken-image flashes, fast first paint.
- 🖼️ **Portrait & landscape canvases**: pick orientation per canvas (mobile
portrait viewports default to portrait); filter the gallery by
**All / Landscape / Portrait** with the choice synced to the URL.
- ⚡ **Per-node parallelism**: up to **4 different spots in parallel per parent**
(configurable). Each in-flight click streams a phase chip
(`Inferring label…` → `Searching the web…` → `Generating image…`) on the
hotspot. Hit the cap and the cursor turns into ⌛.
- 📖 **Encyclopedia register**: the planner produces 150–220-character captions
with 20–40 in-image text fragments, so each node reads like a richly annotated
diagram in a children's encyclopedia. Long captions clamp to 2 lines with a
**查看更多 / Show more** toggle.
- 🌐 **Web-search augmented**: a "decide-then-search" gate asks the LLM whether
a topic benefits from up-to-date sources. When yes, results are fetched and
fed into the planner; sources are persisted to disk + DB and rendered as a
📚 hover badge over the canvas.
- 🔁 **Resilient SSE**: Last-Event-ID replay + per-job snapshot resume. A
dropped connection or page refresh mid-generation reconnects and catches up
on everything it missed, including the in-flight typewriter.
- 🎬 **Scene transitions**: drill-in / drill-out / fade animations make
navigation feel like a zooming flipbook rather than a page swap.
- 🔗 **Share as preview**: any canvas → read-only `?s=` URL. Viewers can
navigate and watch live SSE updates from in-flight generations, but cannot
trigger new ones.
- 📺 **Fullscreen casting**: ⛶ requests browser fullscreen; toggle the chrome
(breadcrumb + caption + hint) on/off for a clean projection view.
- 🔤 **Selectable in-image text**: every label baked into the diagram is OCR'd
with Apple Vision (`zh-Hans` + `en-US`) and overlaid as invisible HTML, so
users can drag-select and Cmd-C copy any text directly off the picture
while the painted pixels remain the visual ground truth.
- 🔊 **Voice narration**: each node's title + caption is synthesised to speech
with **Microsoft Edge neural voices** (via `msedge-tts`; free, no API key). Pick a
**character voice** per flipbook from the live Edge catalogue (filtered to the
UI language); the picker shows "晓晓 · 女声" (Xiaoxiao · female voice) instead of raw locale IDs.
Switching voices re-narrates the whole book and restarts in-flight playback.
**Auto-narration is on by default** (toggleable) and is bundled into exports
so the static site speaks offline too.
- 📱 **Mobile responsive**: sticky top bar that pins on scroll, single-column
gallery, pinch-zoom image lightbox, smaller hotspots and pending bubbles.
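
The Last-Event-ID replay in the resilient-SSE bullet can be pictured as a bounded per-job event log. Each event gets an increasing numeric `id`. When the browser's `EventSource` reconnects, it sends the last id it saw in the `Last-Event-ID` request header, and the server replays only the newer events. This is a minimal sketch under that assumption. `EventLog` and `formatSse` are hypothetical names, and the project's `sse/` hub (including its snapshot resume) may work differently.

```js
// Hypothetical sketch of Last-Event-ID replay; not the project's sse/ hub.
// One EventLog per generation job keeps the most recent `capacity` events.
class EventLog {
  constructor(capacity = 500) {
    this.capacity = capacity;
    this.nextId = 1;
    this.events = [];
  }

  // Record an event, assign it the next id, and evict the oldest if full.
  push(type, data) {
    const event = { id: this.nextId++, type, data };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events a reconnecting client missed, given its Last-Event-ID header value.
  // With no (or an unparseable) header, every retained event is returned.
  since(lastEventId) {
    const last = Number.parseInt(lastEventId, 10);
    if (Number.isNaN(last)) return [...this.events];
    return this.events.filter((e) => e.id > last);
  }
}

// Serialize one event in the text/event-stream wire format.
function formatSse({ id, type, data }) {
  return `id: ${id}\nevent: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

In an Express route, replay would mean writing `formatSse(e)` for every event in `log.since(req.get('Last-Event-ID'))` before streaming new ones. Once older events have been evicted, a full snapshot is needed instead, which is presumably where the per-job snapshot resume comes in.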

---

## 🤖 Multimodal × Mainstream LLMs

Flipbook Canvas is built around a **pluggable multimodal pipeline**. Five
capabilities are wired end-to-end:

| Modality | What it does | Pluggable into |
|---|---|---|
| 📝 **Text / JSON LLM** | planner, click-label inference, decide-then-search verdict | any chat-completion-style model |
| 🖼️ **Image generation** | turns a structured prompt into a 2752×1536 annotated diagram with baked-in text labels | OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider |
| 🌐 **Web search** | rephrased query → top-N normalized results → planner context + 📚 sources panel | any search backend |
| 👁️ **OCR (Apple Vision)** | `zh-Hans` + `en-US` recognition over every generated PNG, projected as a selectable HTML overlay | local, no API keys needed |
| 🔊 **TTS (Edge neural voices)** | synthesises each node's title + caption to an mp3, per-flipbook character voice | Microsoft Edge online voices via msedge-tts, no API key |

The image layer is a **provider chain** (`IMAGE_PROVIDER=...,svg`): the first
enabled provider wins, and `svg` is always appended last as a placeholder so the
UI never breaks. Adding a new model takes a single file:

```js
// server/src/generation/providers/.js
export default {
  name: 'my-model',
  // Return true when this provider is configured; disabled providers are skipped in the chain.
  enabled(config) {
    return Boolean(config.MY_API_KEY);
  },
  async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
    // call your model, write .png into outputDir, push phase events
  },
};
```
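
A minimal sketch of how such a chain might be resolved. It assumes each provider exposes the `name` / `enabled(config)` shape from the template above. `resolveProvider` and the example `registry` are hypothetical illustrations, not the code in `image.js`; the enable checks simply mirror the "Trigger to enable" column of the provider table.

```js
// Hypothetical sketch of provider-chain resolution; not the code in image.js.
// Parse the IMAGE_PROVIDER list and return the first enabled provider.
function resolveProvider(chainSpec, registry, config) {
  const names = String(chainSpec || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
  // The svg placeholder is appended last unless the list already names it.
  if (!names.includes('svg')) names.push('svg');
  for (const name of names) {
    const provider = registry[name];
    if (provider && provider.enabled(config)) return provider; // first enabled wins
  }
  throw new Error(`No enabled image provider in chain: ${names.join(',')}`);
}

// Example registry; enable checks mirror the "Trigger to enable" column.
const registry = {
  openai: { name: 'openai', enabled: (c) => Boolean(c.OPENAI_API_KEY) },
  nanobanana: {
    name: 'nanobanana',
    enabled: (c) => Boolean(c.NANOBANANA_API_KEY || c.GEMINI_API_KEY),
  },
  svg: { name: 'svg', enabled: () => true },
};
```

With `IMAGE_PROVIDER=openai,nanobanana` and only `GEMINI_API_KEY` set, this picks `nanobanana`. With no keys at all, it falls through to `svg`.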

Out of the box:

| Provider | Trigger to enable | Status |
|---|---|---|
| `openai` | `OPENAI_API_KEY` set | 🔌 stub (implement in `providers/openai.js`) |
| `nanobanana` | `NANOBANANA_API_KEY` or `GEMINI_API_KEY` | 🔌 stub |
| `seeddance` | `SEEDDANCE_API_KEY` or `ARK_API_KEY` | 🔌 stub |
| `codebuddy` | `ENABLE_CODEBUDDY=1` | ✅ reference impl (used in the demo GIF) |
| `svg` | always | ✅ fallback placeholder |

> 🎯 The **reference implementation** wires the `codebuddy` CLI as a
> subprocess driver for planner / ImageGen / WebSearch. Subprocess lifecycle
> (concurrency cap, per-call timeouts, single retry, file-size sanity check on
> generated PNGs, graceful degradation) lives in `server/src/codebuddyClient.js`
> and is a useful template if you ever shell out to *any* CLI-based model.
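
Two of the lifecycle pieces listed above, the per-call timeout with a single retry and the file-size sanity check, map onto standard Node APIs. This minimal sketch assumes a promisified `child_process.execFile`. `runCli`, `checkFileSize`, and the defaults (90 s, borrowed from `PLANNER_TIMEOUT_MS`, and a 10 KiB floor) are illustrative choices, not the contents of `codebuddyClient.js`. A concurrency cap would wrap these calls in a semaphore.

```js
const { execFile } = require('node:child_process');
const fs = require('node:fs');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Hypothetical helper: run a CLI with a per-call timeout, retrying once on failure.
async function runCli(cmd, args, { timeoutMs = 90000, retries = 1 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      // execFile kills the child with SIGTERM (the default killSignal) after timeoutMs.
      const { stdout } = await execFileAsync(cmd, args, {
        timeout: timeoutMs,
        maxBuffer: 16 * 1024 * 1024, // raise the 1 MiB default for large outputs
      });
      return stdout;
    } catch (err) {
      lastError = err; // timeout, non-zero exit, or spawn failure
    }
  }
  throw lastError;
}

// Hypothetical sanity check: reject suspiciously small outputs such as a truncated PNG.
function checkFileSize(filePath, minBytes = 10 * 1024) {
  const { size } = fs.statSync(filePath);
  if (size < minBytes) throw new Error(`${filePath} is only ${size} bytes`);
  return size;
}
```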

---

## 🐦 Walkthrough: generating a woodpecker flipbook from zero

Type `啄木鸟` (woodpecker) into the top bar and watch the entire pipeline run:
decide-then-search → planner → ImageGen → click to drill into the tongue
anatomy / nest cavity / ant-foraging zones, each spawning its own annotated
diagram with its own sources.

---

## 🗂️ Layout

```
.
├── prompts/                     # system / planner / click-label / image-prompt / decide-search
├── scripts/
│   ├── sync-prompts.mjs
│   ├── serve-preview.mjs        # build + serve one canvas's static preview
│   └── example-doc-publish.mjs  # publish canvases to GitHub Pages
├── server/
│   └── src/
│       ├── routes/              # canvas, click, events (SSE), assets, share
│       ├── export/              # static-site exporter + viewer template
│       │   ├── buildExport.js   # buildCanvasSite / buildCanvasExport (zip)
│       │   └── template/        # self-contained index.html + viewer.js/css
│       ├── lib/zip.js           # dependency-free ZIP writer
│       ├── generation/
│       │   ├── pipeline.js      # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js  # decide-then-search gate
│       │   ├── webSearch.js     # WebSearch subprocess + result normaliser
│       │   ├── queue.js         # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js         # provider-chain orchestrator
│       │   └── providers/       # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                  # Sequelize models + hydrateFromDisk
│       ├── store/               # filesystem layer
│       ├── sse/                 # event hub
│       └── codebuddyClient.js   # reference CLI-subprocess wrapper
└── web/                         # Vite + React + TS
```

## 💾 Storage

- 📁 **Filesystem** (source of truth for big artifacts):
`server/data/canvases//{data/tree.json, data/nodes/.json, images/.{png,svg}, manifest.json}`.
- 🗃️ **SQLite** (`server/data/flipbook.sqlite`, via Sequelize): metadata index with
Canvases / Nodes / Hotspots / ShareLinks / Sources tables. Drives the
gallery, spatial dedup, share lookup, and sources hover panel. On boot the
server runs `hydrateFromDisk()` to rebuild this index if it's missing.

## 🛠️ Develop

```bash
npm install
npm run dev # server on :8787 + Vite on :5173 in parallel
```

Open http://127.0.0.1:5173.

By default `ENABLE_CODEBUDDY=0` (stub mode: fast, SVG placeholders, no LLM).
Set `ENABLE_CODEBUDDY=1` to use the reference CLI provider for planner +
ImageGen + WebSearch:

```bash
ENABLE_CODEBUDDY=1 npm run dev:server
```

> ⏱️ With the reference provider, each node takes ~70–95 s end-to-end (planner
> ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs).
> ImageGen produces **2752×1536 PNG** (~6 MB).

### Per-node parallelism

Up to **4 click expansions per parent node** run in parallel; excess clicks
queue. Different parents and different canvases run independently. A
per-parent write lock serializes only the short read-modify-write of the
parent node JSON. Tunable via `MAX_PARALLEL_CLICKS_PER_NODE` (default 4).
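
The per-parent cap behaves like a semaphore keyed by parent node id. In this sketch, up to `limit` tasks per key run at once, extra tasks wait in FIFO order, and different keys never block each other. A per-parent write lock is the same structure with `limit = 1`. The class is a hypothetical illustration named after the `PerKeySemaphore` listed in `queue.js`; it is not that file's code.

```js
// Hypothetical per-key semaphore: at most `limit` concurrent holders per key
// (e.g. per parent node id); extra acquirers wait in FIFO order.
class PerKeySemaphore {
  constructor(limit = 4) {
    this.limit = limit;
    this.active = new Map(); // key -> number of running tasks
    this.waiting = new Map(); // key -> queue of resolve callbacks
  }

  async acquire(key) {
    const running = this.active.get(key) || 0;
    if (running < this.limit) {
      this.active.set(key, running + 1);
      return;
    }
    // Wait until a releasing task hands its slot over (active count unchanged).
    await new Promise((resolve) => {
      const queue = this.waiting.get(key) || [];
      queue.push(resolve);
      this.waiting.set(key, queue);
    });
  }

  release(key) {
    const queue = this.waiting.get(key);
    if (queue && queue.length > 0) {
      const next = queue.shift();
      if (queue.length === 0) this.waiting.delete(key);
      next(); // transfer the slot directly to the oldest waiter
      return;
    }
    const running = (this.active.get(key) || 1) - 1;
    if (running === 0) this.active.delete(key);
    else this.active.set(key, running);
  }

  // Run `task` while holding one of the key's slots.
  async run(key, task) {
    await this.acquire(key);
    try {
      return await task();
    } finally {
      this.release(key);
    }
  }
}
```

Wrapping each click expansion in `sem.run(parentId, task)` would give the per-parent cap, while clicks on other parents or canvases proceed independently.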

## 🔍 Web search

A pre-planner gate (`decideSearch.js` + `prompts/decide-search.md`) calls the
LLM with the proposed subject and asks: do recent / authoritative sources
materially improve this node? The default leans **yes**; only clearly
abstract / timeless subjects skip search. When yes:

1. The web-search backend runs with the rephrased query.
2. Results are normalised into `{title, url, snippet, source}`.
3. Top results are passed into the planner prompt.
4. Sources are persisted both into `nodes/.json` and into the SQLite
`Sources` table.
5. The frontend renders a 📚 badge near the breadcrumb. Hover to see a popover
with the source list (a 220 ms grace period keeps the popover open while the
mouse moves onto it).
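
The gate's "lean yes" default and the normalisation in step 2 can be sketched as two small functions. Both rest on assumptions: the verdict shape `{"search": boolean}`, the raw-hit field names (`url` / `link`, `snippet` / `description`), and filling `source` with the URL's hostname are guesses, not what `decideSearch.js` or `webSearch.js` actually do.

```js
// Hypothetical sketch of the gate's default and of step 2; not decideSearch.js / webSearch.js.

// Read the LLM's verdict, assumed to be JSON like {"search": false}.
// Anything other than an explicit false (including unparseable output) means "search".
function shouldSearch(llmReply) {
  try {
    return JSON.parse(llmReply).search !== false;
  } catch {
    return true;
  }
}

// Normalise raw hits into {title, url, snippet, source}, dropping duplicates and bad URLs.
function normalizeResults(rawHits, limit = 5) {
  const seen = new Set();
  const results = [];
  for (const hit of rawHits || []) {
    const url = String(hit.url || hit.link || '').trim();
    if (!url || seen.has(url)) continue; // skip empty and duplicate URLs
    seen.add(url);
    let source;
    try {
      source = new URL(url).hostname.replace(/^www\./, '');
    } catch {
      continue; // not an absolute URL
    }
    results.push({
      title: String(hit.title || url).trim(),
      url,
      snippet: String(hit.snippet || hit.description || '').trim(),
      source,
    });
    if (results.length >= limit) break;
  }
  return results;
}
```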

## 📦 Export as a standalone static site

Any canvas can be exported as a **fully self-contained static site**: a
read-only replica of the preview with all data and images inlined, openable
directly from `file://` with zero network requests.

- **In-app**: the `···` More menu → **Export preview** downloads a `.zip`
(`index.html` / `viewer.js` / `viewer.css` / `data.js` + `images/`).
- **Serve one locally** for quick viewing in a browser:

```bash
npm run serve-preview -- [--lang en] [--port 8088]
```

Builds the static site to a temp dir, starts a tiny static HTTP server,
prints the URL. Ctrl-C cleans up.

- **Publish to GitHub Pages** (one or more canvases → a routed gallery landing
page at `/`, each example at `//`):

```bash
npm run example:publish -- [ ...] [--lang en] [--no-push]
```

Builds each canvas, regenerates the landing index, and pushes to the
`gh-pages` branch (accumulating: publishing a new id keeps the existing ones).
See the result at **https://imcuttle.github.io/flipbook-app/**.

The exported viewer mirrors the live read-only preview: image stage with
collision-avoiding hotspot labels, leader lines, selectable OCR text overlay,
caption, breadcrumb, catalog and sources β€” plus progressive image loading,
scene transitions, and next-layer image prefetch. **Per-node narration mp3s are
bundled too**, so the static site auto-narrates offline (toggleable in the top
bar). It never calls the server.

## 🔗 Share / preview links

- `POST /api/canvas/:id/share` → `{token, url}`. Reuses an existing token for
the same canvas.
- `GET /api/share/:token` → `{canvasId, topic, readOnly:true}`.
- Frontend: opening `…?s=` puts the UI in **read-only preview** mode:
no topic input, no clicks on the image, "👁 Preview" badge in the corner.
SSE stays connected, so a viewer watching mid-generation sees images arrive
in real time.
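
The two share endpoints can be sketched as an in-memory map that issues one token per canvas and always marks lookups as read-only. This is a hypothetical illustration only. The project persists links in the SQLite `ShareLinks` table, and the token format (24 hex characters) and the `?s=` URL shape used here are assumptions.

```js
const crypto = require('node:crypto');

// Hypothetical in-memory sketch of the share endpoints; the project persists
// links in the ShareLinks table. Token format and URL shape are assumptions.
class ShareLinks {
  constructor(baseUrl) {
    this.baseUrl = baseUrl;
    this.tokenByCanvas = new Map(); // canvasId -> token
    this.canvasByToken = new Map(); // token -> { canvasId, topic }
  }

  // POST /api/canvas/:id/share: reuse the canvas's existing token if there is one.
  share(canvasId, topic) {
    let token = this.tokenByCanvas.get(canvasId);
    if (!token) {
      token = crypto.randomBytes(12).toString('hex');
      this.tokenByCanvas.set(canvasId, token);
      this.canvasByToken.set(token, { canvasId, topic });
    }
    return { token, url: `${this.baseUrl}/?s=${token}` };
  }

  // GET /api/share/:token: lookups are always read-only.
  lookup(token) {
    const entry = this.canvasByToken.get(token);
    return entry ? { ...entry, readOnly: true } : null;
  }
}
```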

## 📺 Fullscreen / casting

- `⛶` button in TopBar requests browser fullscreen; on iOS Safari, where the
Fullscreen API isn't supported, it falls back to CSS-only fullscreen.
- `👁` / `🚫` button (visible while in fullscreen) toggles the breadcrumb +
caption + hint. Useful for clean projection.
- Long-press hint is suppressed in fullscreen by default; the press still
works.

## 🧹 Cleaning local state

```bash
npm run clean:data # reset server/data (all canvases)
npm run clean:dist # reset web/dist
npm run clean # both
```

## 📦 Build for production

```bash
npm run build # builds web/dist
npm start # serves web/dist + API from :8787
```

## 🌐 LAN access via a fixed domain (macOS)

Give the app a stable hostname (e.g. `http://flipbook.lan`) reachable from any
device on your LAN, with no port number needed. Uses **dnsmasq** (resolves the
domain → this machine's LAN IP) + **Caddy** (reverse-proxies `:80` to the app).

```bash
npm run lan:up    # flipbook.lan → dev :5173 (preferred), falls back to prod :8787
npm run lan:down # tear it down

# custom: scripts/lan-domain-setup.sh
bash scripts/lan-domain-setup.sh studio.lan 5173 8787
```

The proxy tries the **dev** port (5173) first and automatically **falls back to
the prod** port (8787) when dev isn't running (passive health check, 3s
blacklist). So `npm run dev` and `npm start` both work behind the same domain.
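
In the real setup Caddy handles the dev-first, prod-fallback routing itself. The sketch below only illustrates the decision logic: a priority-ordered upstream list with a passive 3 s blacklist. `UpstreamPicker` is hypothetical, and the clock is injectable so the behaviour can be checked deterministically.

```js
// Hypothetical sketch of priority-ordered upstreams with a passive blacklist.
// In the real setup Caddy does this; this only illustrates the decision logic.
class UpstreamPicker {
  constructor(upstreams, blacklistMs = 3000, now = () => Date.now()) {
    this.upstreams = upstreams; // priority order: dev first, then prod
    this.blacklistMs = blacklistMs;
    this.now = now; // injectable clock for deterministic checks
    this.failedAt = new Map(); // upstream -> time of its last failure
  }

  // First upstream that never failed or whose last failure is at least blacklistMs old.
  pick() {
    const t = this.now();
    for (const upstream of this.upstreams) {
      const failed = this.failedAt.get(upstream);
      if (failed === undefined || t - failed >= this.blacklistMs) return upstream;
    }
    return null; // every upstream failed recently
  }

  // Passive health check: record a failed proxied request.
  markFailed(upstream) {
    this.failedAt.set(upstream, this.now());
  }
}
```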

`lan:up` installs dnsmasq/caddy via Homebrew if missing and needs `sudo`
(dnsmasq binds port 53, Caddy binds port 80). It only configures **this** machine; to
reach the domain from other devices, point their DNS at this machine's LAN IP
(router DHCP DNS, per-device DNS, or a `hosts` entry; the script prints the
exact options and your IP).

## ⚙️ Configuration (env)

| Var | Default | Purpose |
|---|---|---|
| `PORT` | 8787 | server port |
| `HOST` | 127.0.0.1 | server bind |
| `DATA_DIR` | `server/data` | canvas state on disk |
| `PROMPTS_DIR` | `prompts` | prompt files |
| `DB_PATH` | `/flipbook.sqlite` | SQLite file |
| `MAX_PARALLEL_CLICKS_PER_NODE` | 4 | concurrent click expansions per parent |
| `MAX_PARALLEL_CODEBUDDY` | 20 | concurrent planner/LLM subprocesses |
| `MAX_PARALLEL_IMAGE` | 20 | concurrent image-generation jobs (separate pool from the LLM limit) |
| `PLANNER_TIMEOUT_MS` | 90000 | per-call planner timeout |
| `IMAGE_TIMEOUT_MS` | 180000 | per-call ImageGen timeout |
| `WEB_SEARCH_TIMEOUT_MS` | 60000 | per-call WebSearch timeout |
| `IMAGE_PROVIDER` | `codebuddy` | provider chain (e.g. `openai,nanobanana,svg`) |
| `IMAGE_SIZE` | `1920x1080` | requested size (provider may pick its own) |
| `ENABLE_CODEBUDDY` | 0 | flip to 1 to enable the reference CLI provider |
| `ENABLE_WEB_SEARCH` | follows `ENABLE_CODEBUDDY` | force-disable with `0` |
| `ENABLE_OCR` | 1 | run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to `0` to skip |
| `OCR_TIMEOUT_MS` | 25000 | per-call OCR timeout |
| `OCR_MIN_CONFIDENCE` | 0.4 | drop OCR spans below this confidence |
| `ENABLE_AUDIO` | 1 | synthesise Edge neural-voice narration (mp3) for each node; set to `0` to skip. Non-blocking: failures never stop image generation |
| `AUDIO_TIMEOUT_MS` | 30000 | per-call TTS synthesis timeout |
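
A minimal sketch of parsing a subset of these variables with the defaults from the table. It includes the less obvious rules: web search follows `ENABLE_CODEBUDDY` unless forced off with `0`, and OCR / audio stay on unless set to `0`. `loadConfig` and the returned field names are hypothetical; the server's own config module may differ.

```js
// Hypothetical sketch of loading a subset of the variables above; not the server's config module.
function loadConfig(env = process.env) {
  // Integer with fallback when unset or unparseable.
  const int = (name, fallback) => {
    const n = Number.parseInt(env[name] ?? '', 10);
    return Number.isNaN(n) ? fallback : n;
  };
  // Float with fallback when unset or unparseable.
  const float = (name, fallback) => {
    const n = Number.parseFloat(env[name] ?? '');
    return Number.isNaN(n) ? fallback : n;
  };

  const enableCodebuddy = env.ENABLE_CODEBUDDY === '1'; // off unless flipped to 1
  return {
    port: int('PORT', 8787),
    host: env.HOST || '127.0.0.1',
    maxParallelClicksPerNode: int('MAX_PARALLEL_CLICKS_PER_NODE', 4),
    plannerTimeoutMs: int('PLANNER_TIMEOUT_MS', 90000),
    imageProvider: env.IMAGE_PROVIDER || 'codebuddy',
    enableCodebuddy,
    // Follows ENABLE_CODEBUDDY unless force-disabled with 0.
    enableWebSearch: env.ENABLE_WEB_SEARCH === '0' ? false : enableCodebuddy,
    enableOcr: env.ENABLE_OCR !== '0', // on unless set to 0
    ocrMinConfidence: float('OCR_MIN_CONFIDENCE', 0.4),
    enableAudio: env.ENABLE_AUDIO !== '0', // on unless set to 0
  };
}
```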
