https://github.com/imcuttle/flipbook-app
Flipbook Canvas: click-to-explore knowledge picture-book. Long-press any image to spawn an annotated child diagram, powered by a pluggable multimodal pipeline (text LLM + image gen + web search + OCR) wired to mainstream models (OpenAI / Gemini / Seedream / …).
- Host: GitHub
- URL: https://github.com/imcuttle/flipbook-app
- Owner: imcuttle
- Created: 2026-05-28T07:52:00.000Z (24 days ago)
- Default Branch: main
- Last Pushed: 2026-05-29T12:18:50.000Z (22 days ago)
- Last Synced: 2026-05-29T14:08:05.644Z (22 days ago)
- Language: JavaScript
- Size: 16.7 MB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Flipbook Canvas
**English** · [中文](./README.zh.md)
### [**Live examples: imcuttle.github.io/flipbook-app**](https://imcuttle.github.io/flipbook-app/)
> Browse fully-interactive, exported flipbooks right in your browser: click hotspots to drill in, no install needed.

> Click anywhere on a generated image. The backend infers what you clicked,
> searches the web when useful, generates a child diagram, and links it back.
> **A flipbook of explorable knowledge, one click at a time.**

> Inspired by, and a re-implementation of, the product idea behind
> [flipbook.page](https://flipbook.page). Credit to the original team for the
> click-to-explore canvas concept.
A long-running web product: **Express + SSE** backend, **Vite + React + TS**
frontend, a **pluggable multi-model image pipeline**, web-search augmented
planning, per-node concurrency, read-only share links, fullscreen casting and
a fully responsive mobile layout.
---
## Why this is fun
Most "AI drawing" demos stop at one image. This one turns each image into a
**playable knowledge surface**:
- **Long-press anywhere on a picture**: the model reads what's under your
  finger, decides whether the topic needs fresh sources, optionally hits the
  web, then paints a brand new annotated diagram zoomed into that concept.
- **Encyclopedia-style output**: every node ships with a 150–220-char
  caption and 20–40 in-image labels (place names, dates, numbers…), all
  OCR'd back into a transparent text layer so you can drag-select and copy
  any fragment straight off the picture.
- **Infinite tree of canvases**: every click spawns a child node; the
  whole exploration tree is persisted, shareable, and replayable.
- **Watch it think**: a node is saved and linkable the instant you click,
  then its title / caption / scene prompt **type out live**; share the link
  and a friend on another device watches the same stream fill in.
---
## Screenshots
- Click-to-explore: long-press any region to drill in
- End-to-end pipeline: search → planner → ImageGen → drill-down
- Gallery + canvas: every canvas is persisted, shareable, replayable
---
## Highlights
- **Click-to-explore**: long-press (1 s) anywhere on a node's image. The
  backend infers the label, decides whether to web-search, then generates a
  child node. Spatial + semantic dedup means clicking the same region again
  jumps straight in.
- **Live-streaming, linkable generating nodes**: the moment you click, the
  child node is **persisted under its final id** and its parent hotspot links
  to it immediately, so it's **shareable / openable on any device while still
  generating**. Its title, caption and image prompt **type out live**
  (token-streamed via SSE), the catalog shows a **spinner row**, and a refresh
  or cross-device open **resumes the stream from the on-disk snapshot**. On
  failure the half-node is auto-deleted.
- **Progressive image loading**: every PNG gets blur → thumbnail → medium →
  full variants (via sharp). Gallery cards blur-up, and the canvas swaps to
  full-res when ready, with no broken-image flashes and a fast first paint.
- **Portrait & landscape canvases**: pick orientation per canvas (mobile
  portrait viewports default to portrait); filter the gallery by
  **All / Landscape / Portrait** with the choice synced to the URL.
- **Per-node parallelism**: up to **4 different spots in parallel per parent**
  (configurable). Each in-flight click streams a phase chip
  (`Inferring label…` → `Searching the web…` → `Generating image…`) on the
  hotspot. Hit the cap and the cursor switches to a blocked indicator.
- **Encyclopedia register**: the planner produces 150–220 char captions with
  20–40 in-image text fragments, like reading a richly annotated diagram in
  a children's encyclopedia. Long captions clamp to 2 lines with a
  **查看更多 / Show more** toggle.
- **Web-search augmented**: a "decide-then-search" gate asks the LLM whether
  a topic benefits from up-to-date sources. When yes, results are fetched and
  fed into the planner; sources are persisted to disk + DB and rendered as a
  hover badge over the canvas.
- **Resilient SSE**: Last-Event-ID replay + per-job snapshot resume. A
  dropped connection or page refresh mid-generation reconnects and catches up
  on everything it missed, including the in-flight typewriter.
- **Scene transitions**: drill-in / drill-out / fade animations make
  navigation feel like a zooming flipbook rather than a page swap.
- **Share as preview**: any canvas → read-only `?s=` URL. Viewers can
  navigate and watch live SSE updates from in-flight generations, but cannot
  trigger new ones.
- **Fullscreen casting**: ⛶ requests browser fullscreen; toggle the chrome
  (breadcrumb + caption + hint) on/off for a clean projection view.
- **Selectable in-image text**: every label baked into the diagram is OCR'd
  with Apple Vision (`zh-Hans` + `en-US`) and overlaid as invisible HTML, so
  users can drag-select and Cmd-C copy any text directly off the picture
  while the painted pixels remain the visual ground truth.
- **Voice narration**: each node's title + caption is synthesised to speech
  with **Microsoft Edge neural voices** (msedge-tts: free, no API key). Pick a
  **character voice** per flipbook from the live Edge catalogue (filtered to the
  UI language); the picker reads "晓晓 · 女声" (Xiaoxiao · female voice) instead
  of raw locale IDs. Switching voices re-narrates the whole book and restarts
  in-flight playback. **Auto-narration is on by default** (toggleable) and is
  bundled into exports so the static site speaks offline too.
- **Mobile responsive**: sticky top bar that pins on scroll, single-column
  gallery, pinch-zoom image lightbox, smaller hotspots and pending bubbles.
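
The "Resilient SSE" bullet above relies on the standard `Last-Event-ID` reconnect header. As a rough illustration only, here is a minimal sketch of a per-job event log that can replay missed events. `JobEventLog` and `formatSse` are hypothetical names, not the project's actual `sse/` module, and the on-disk snapshot part is left out.

```javascript
// Hypothetical sketch: an in-memory event log keyed by a monotonically
// increasing id, so a reconnecting client can ask for everything after
// the id it last saw (sent by the browser as the Last-Event-ID header).
class JobEventLog {
  constructor() {
    this.events = [];
    this.nextId = 1;
  }

  // Record an event and return it with its assigned id.
  append(type, data) {
    const event = { id: this.nextId++, type, data };
    this.events.push(event);
    return event;
  }

  // Events the client has not seen yet; a missing or invalid id replays all.
  since(lastEventId) {
    const last = Number(lastEventId) || 0;
    return this.events.filter((event) => event.id > last);
  }
}

// Serialise one event in the text/event-stream wire format.
function formatSse(event) {
  return `id: ${event.id}\nevent: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

const log = new JobEventLog();
log.append('phase', { phase: 'Inferring label…' });
log.append('token', { field: 'title', text: 'Wood' });
log.append('token', { field: 'title', text: 'pecker' });

// A client that reconnects with Last-Event-ID: 1 receives events 2 and 3.
const missed = log.since('1').map(formatSse).join('');
console.log(missed);
```

In a real handler the replayed frames would be written to the response before subscribing the client to new events, so nothing falls between replay and live delivery.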
---
## Multimodal × Mainstream LLMs
Flipbook Canvas is built around a **pluggable multimodal pipeline**. Five
components are wired end-to-end:
| Modality | What it does | Pluggable into |
|---|---|---|
| **Text / JSON LLM** | planner, click-label inference, decide-then-search verdict | any chat-completion-style model |
| **Image generation** | turns a structured prompt into a 2752×1536 annotated diagram with baked-in text labels | OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider |
| **Web search** | rephrased query → top-N normalized results → planner context + sources panel | any search backend |
| **OCR (Apple Vision)** | `zh-Hans` + `en-US` recognition over every generated PNG, projected as a selectable HTML overlay | local, no API keys needed |
| **TTS (Edge neural voices)** | synthesises each node's title + caption to an mp3, per-flipbook character voice | Microsoft Edge online voices via msedge-tts, no API key |

The image layer is a **provider chain** (`IMAGE_PROVIDER=...,svg`): the first
enabled provider wins, and `svg` is always appended last as a placeholder so the
UI never breaks. Adding a new model is a single file:
```js
// server/src/generation/providers/.js
export default {
  name: 'my-model',
  enabled(config) { return Boolean(config.MY_API_KEY); },
  async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
    // call your model, write .png into outputDir, push phase events
  },
};
```
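
To make the "first enabled provider wins" rule concrete, here is a minimal sketch of how such a chain could be resolved. This is an assumption-laden illustration, not the project's `image.js`: `resolveProvider` and the `registry` object are hypothetical, and only the `name` / `enabled(config)` shape is taken from the provider file above.

```javascript
// Hypothetical sketch of a provider-chain resolver.
// chainSpec mirrors the IMAGE_PROVIDER value, e.g. "openai,nanobanana".
function resolveProvider(chainSpec, registry, config) {
  const names = chainSpec
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  // `svg` is always tried last, so a placeholder is available as a fallback.
  const ordered = [...names.filter((name) => name !== 'svg'), 'svg'];
  for (const name of ordered) {
    const provider = registry[name];
    // Unknown names and disabled providers are skipped.
    if (provider && provider.enabled(config)) return provider;
  }
  throw new Error(`no enabled provider in chain: ${ordered.join(',')}`);
}

// Two toy providers following the { name, enabled(config) } shape.
const registry = {
  'my-model': { name: 'my-model', enabled: (config) => Boolean(config.MY_API_KEY) },
  svg: { name: 'svg', enabled: () => true },
};

console.log(resolveProvider('my-model', registry, {}).name); // → svg
console.log(resolveProvider('my-model,svg', registry, { MY_API_KEY: 'k' }).name); // → my-model
```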
Out of the box:
| Provider | Trigger to enable | Status |
|---|---|---|
| `openai` | `OPENAI_API_KEY` set | stub: implement in `providers/openai.js` |
| `nanobanana` | `NANOBANANA_API_KEY` or `GEMINI_API_KEY` | stub |
| `seeddance` | `SEEDDANCE_API_KEY` or `ARK_API_KEY` | stub |
| `codebuddy` | `ENABLE_CODEBUDDY=1` | ✅ reference impl (used in the demo gif) |
| `svg` | always | ✅ fallback placeholder |
> The **reference implementation** wires the `codebuddy` CLI as a
> subprocess driver for planner / ImageGen / WebSearch. Subprocess lifecycle
> (concurrency cap, per-call timeouts, single retry, file-size sanity check on
> generated PNGs, graceful degradation) lives in `server/src/codebuddyClient.js`
> and is a useful template if you ever shell out to *any* CLI-based model.
---
## Walkthrough: generating a woodpecker flipbook from zero
Type `啄木鸟` (woodpecker) into the top bar and watch the entire pipeline run:
decide-then-search → planner → ImageGen → click to drill into the tongue
anatomy / nest cavity / ant-foraging zones, each spawning its own annotated
diagram with its own sources.
---
## Layout
```
.
├── prompts/                      # system / planner / click-label / image-prompt / decide-search
├── scripts/
│   ├── sync-prompts.mjs
│   ├── serve-preview.mjs         # build + serve one canvas's static preview
│   └── example-doc-publish.mjs   # publish canvases to GitHub Pages
├── server/
│   └── src/
│       ├── routes/               # canvas, click, events (SSE), assets, share
│       ├── export/               # static-site exporter + viewer template
│       │   ├── buildExport.js    # buildCanvasSite / buildCanvasExport (zip)
│       │   └── template/         # self-contained index.html + viewer.js/css
│       ├── lib/zip.js            # dependency-free ZIP writer
│       ├── generation/
│       │   ├── pipeline.js       # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js   # decide-then-search gate
│       │   ├── webSearch.js      # WebSearch subprocess + result normaliser
│       │   ├── queue.js          # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js          # provider-chain orchestrator
│       │   └── providers/        # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                   # Sequelize models + hydrateFromDisk
│       ├── store/                # filesystem layer
│       ├── sse/                  # event hub
│       └── codebuddyClient.js    # reference CLI-subprocess wrapper
└── web/                          # Vite + React + TS
```
## Storage
- **Filesystem** (source of truth for big artifacts):
  `server/data/canvases//{data/tree.json, data/nodes/.json, images/.{png,svg}, manifest.json}`.
- **SQLite** (`server/data/flipbook.sqlite`, via Sequelize): metadata index with
  Canvases / Nodes / Hotspots / ShareLinks / Sources tables. Drives the
  gallery, spatial dedup, share lookup, and sources hover panel. On boot the
  server runs `hydrateFromDisk()` to rebuild this index if it's missing.
## Develop
```bash
npm install
npm run dev # server on :8787 + Vite on :5173 in parallel
```
Open http://127.0.0.1:5173.
By default `ENABLE_CODEBUDDY=0` (stub mode: fast, SVG placeholders, no LLM).
Set `ENABLE_CODEBUDDY=1` to use the reference CLI provider for planner +
ImageGen + WebSearch:
```bash
ENABLE_CODEBUDDY=1 npm run dev:server
```
> With the reference provider, each node takes ~70–95 s end-to-end (planner
> ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs).
> ImageGen produces a **2752×1536 PNG** (~6 MB).
### Per-node parallelism
Up to **4 click expansions per parent node** run in parallel; excess clicks
queue. Different parents and different canvases run independently. A
per-parent write lock serializes only the short read-modify-write of the
parent node JSON. Tunable via `MAX_PARALLEL_CLICKS_PER_NODE` (default 4).
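
As a rough illustration of the per-parent cap, here is a minimal per-key limiter sketch. It is not the project's `queue.js`; `PerKeyLimiter` is a hypothetical name, and the key stands in for a parent node id. Tasks under the same key share `limit` slots, while different keys never block each other.

```javascript
// Hypothetical sketch of a per-key concurrency limiter.
class PerKeyLimiter {
  constructor(limit) {
    this.limit = limit;
    this.active = new Map(); // key -> number of running tasks
    this.waiting = new Map(); // key -> queue of resolvers for blocked tasks
  }

  async run(key, task) {
    await this._acquire(key);
    try {
      return await task();
    } finally {
      this._release(key);
    }
  }

  _acquire(key) {
    const running = this.active.get(key) || 0;
    if (running < this.limit) {
      this.active.set(key, running + 1);
      return Promise.resolve();
    }
    // Over the cap: park until a running task hands over its slot.
    return new Promise((resolve) => {
      const queue = this.waiting.get(key) || [];
      queue.push(resolve);
      this.waiting.set(key, queue);
    });
  }

  _release(key) {
    const queue = this.waiting.get(key);
    if (queue && queue.length > 0) {
      // Hand the slot straight to the next waiter; the count stays the same.
      queue.shift()();
      return;
    }
    const running = this.active.get(key) - 1;
    if (running === 0) this.active.delete(key);
    else this.active.set(key, running);
  }
}

// Usage: ten clicks on the same parent, at most four expand at once.
const limiter = new PerKeyLimiter(4);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const clicks = Array.from({ length: 10 }, (_, i) =>
  limiter.run('parent-node', async () => {
    await sleep(5);
    return i;
  })
);
Promise.all(clicks).then((results) => console.log(results.length, 'expansions finished'));
```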
## Web search
A pre-planner gate (`decideSearch.js` + `prompts/decide-search.md`) calls the
LLM with the proposed subject and asks: do recent / authoritative sources
materially improve this node? The default leans **yes**: only clearly
abstract / timeless subjects skip search. When yes:
1. The web-search backend runs with the rephrased query.
2. Results are normalised into `{title, url, snippet, source}`.
3. Top results are passed into the planner prompt.
4. Sources are persisted both into `nodes/.json` and into the SQLite
`Sources` table.
5. The frontend renders a sources badge near the breadcrumb. Hover to see a
   popover with the source list (220 ms grace period so the popover is
   reachable with the mouse).
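
Step 2 above can be sketched as a small normaliser. This is an illustration under stated assumptions, not the project's `webSearch.js`: the raw field names (`link`, `description`) and the choice of the URL hostname as `source` are guesses, and `normaliseResults` is a hypothetical name. Only the output shape `{title, url, snippet, source}` comes from the README.

```javascript
// Hypothetical sketch: normalise raw search hits into
// { title, url, snippet, source } and keep the top N unique URLs.
function normaliseResults(rawResults, topN = 5) {
  const seen = new Set();
  const results = [];
  for (const raw of rawResults) {
    const url = String(raw.url || raw.link || '').trim();
    if (!url || seen.has(url)) continue;
    let source;
    try {
      // Assumption: `source` is the site hostname without a leading "www.".
      source = new URL(url).hostname.replace(/^www\./, '');
    } catch {
      continue; // skip malformed URLs
    }
    seen.add(url);
    results.push({
      title: String(raw.title || '').trim() || source,
      url,
      snippet: String(raw.snippet || raw.description || '').trim(),
      source,
    });
    if (results.length === topN) break;
  }
  return results;
}

const sample = normaliseResults([
  { title: ' Woodpecker ', link: 'https://www.example.org/woodpecker', description: 'A bird.' },
  { title: 'Duplicate', url: 'https://www.example.org/woodpecker' },
  { title: 'Broken', url: 'not a url' },
]);
console.log(sample);
```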
## Export as a standalone static site
Any canvas can be exported as a **fully self-contained static site**: a
read-only replica of the preview with all data and images inlined, openable
directly from `file://` with zero network requests.
- **In-app**: the `···` More menu → **Export preview** downloads a `.zip`
  (`index.html` / `viewer.js` / `viewer.css` / `data.js` + `images/`).
- **Serve one locally** for quick viewing in a browser:
```bash
npm run serve-preview -- [--lang en] [--port 8088]
```
Builds the static site to a temp dir, starts a tiny static HTTP server,
prints the URL. Ctrl-C cleans up.
- **Publish to GitHub Pages** (one or more canvases → a routed gallery landing
  page at `/`, each example at `//`):
```bash
npm run example:publish -- [ ...] [--lang en] [--no-push]
```
  Builds each canvas, regenerates the landing index, and pushes to the
  `gh-pages` branch (accumulating: publishing a new id keeps the others).
  See the result at **https://imcuttle.github.io/flipbook-app/**.

The exported viewer mirrors the live read-only preview: image stage with
collision-avoiding hotspot labels, leader lines, selectable OCR text overlay,
caption, breadcrumb, catalog and sources, plus progressive image loading,
scene transitions, and next-layer image prefetch. **Per-node narration mp3s are
bundled too**, so the static site auto-narrates offline (toggleable in the top
bar). It never calls the server.
## Share / preview links
- `POST /api/canvas/:id/share` → `{token, url}`. Reuses an existing token for
  the same canvas.
- `GET /api/share/:token` → `{canvasId, topic, readOnly:true}`.
- Frontend: opening `…?s=` puts the UI in **read-only preview** mode: no topic
  input, no clicks on the image, and a "Preview" badge in the corner.
  SSE stays connected, so a viewer watching mid-generation sees images stream
  in real-time.
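
A client could call the share endpoint roughly like this. This is a sketch only: `createShareLink`, `readShareToken`, `apiBase` and the injectable `fetchImpl` are hypothetical helpers, while the route and the `{token, url}` response shape are the ones listed above.

```javascript
// Hypothetical client helper for POST /api/canvas/:id/share.
// fetchImpl is injectable so the helper can be exercised without a server.
async function createShareLink(apiBase, canvasId, fetchImpl = fetch) {
  const endpoint = `${apiBase}/api/canvas/${encodeURIComponent(canvasId)}/share`;
  const response = await fetchImpl(endpoint, { method: 'POST' });
  if (!response.ok) throw new Error(`share failed: HTTP ${response.status}`);
  return response.json(); // documented shape: { token, url }
}

// Read the share token from a page URL; null means normal (editable) mode.
function readShareToken(href) {
  return new URL(href).searchParams.get('s');
}

console.log(readShareToken('http://127.0.0.1:5173/?s=abc123')); // → abc123
console.log(readShareToken('http://127.0.0.1:5173/')); // → null
```

Against a running dev server, `createShareLink('http://127.0.0.1:8787', canvasId)` would be the call to try first, assuming the default `PORT` from the configuration table.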
## Fullscreen / casting
- `⛶` button in TopBar requests browser fullscreen; uses CSS-only fullscreen
  on iOS Safari where the API isn't supported.
- A toggle button (visible while in fullscreen) shows or hides the breadcrumb +
  caption + hint. Useful for clean projection.
- Long-press hint is suppressed in fullscreen by default; the press still
  works.
## Cleaning local state
```bash
npm run clean:data # reset server/data (all canvases)
npm run clean:dist # reset web/dist
npm run clean # both
```
## Build for production
```bash
npm run build # builds web/dist
npm start # serves web/dist + API from :8787
```
## LAN access via a fixed domain (macOS)
Give the app a stable hostname (e.g. `http://flipbook.lan`) reachable from any
device on your LAN, with no port number needed. Uses **dnsmasq** (resolves the
domain → this machine's LAN IP) + **Caddy** (reverse-proxies `:80` to the app).
```bash
npm run lan:up # flipbook.lan → dev :5173 (preferred), falls back to prod :8787
npm run lan:down # tear it down
# custom: scripts/lan-domain-setup.sh
bash scripts/lan-domain-setup.sh studio.lan 5173 8787
```
The proxy tries the **dev** port (5173) first and automatically **falls back to
the prod** port (8787) when dev isn't running (passive health check, 3s
blacklist). So `npm run dev` and `npm start` both work behind the same domain.
`lan:up` installs dnsmasq/caddy via Homebrew if missing and needs `sudo`
(dnsmasq binds 53, Caddy binds 80). It only configures **this** machine; to
reach the domain from other devices, point their DNS at this machine's LAN IP
(router DHCP DNS, per-device DNS, or a `hosts` entry; the script prints the
exact options and your IP).
## Configuration (env)
| Var | Default | Purpose |
|---|---|---|
| `PORT` | 8787 | server port |
| `HOST` | 127.0.0.1 | server bind |
| `DATA_DIR` | `server/data` | canvas state on disk |
| `PROMPTS_DIR` | `prompts` | prompt files |
| `DB_PATH` | `/flipbook.sqlite` | SQLite file |
| `MAX_PARALLEL_CLICKS_PER_NODE` | 4 | concurrent click expansions per parent |
| `MAX_PARALLEL_CODEBUDDY` | 20 | concurrent planner/LLM subprocesses |
| `MAX_PARALLEL_IMAGE` | 20 | concurrent image-generation jobs (separate pool from the LLM limit) |
| `PLANNER_TIMEOUT_MS` | 90000 | per-call planner timeout |
| `IMAGE_TIMEOUT_MS` | 180000 | per-call ImageGen timeout |
| `WEB_SEARCH_TIMEOUT_MS` | 60000 | per-call WebSearch timeout |
| `IMAGE_PROVIDER` | `codebuddy` | provider chain (e.g. `openai,nanobanana,svg`) |
| `IMAGE_SIZE` | `1920x1080` | requested size (provider may pick its own) |
| `ENABLE_CODEBUDDY` | 0 | flip to 1 to enable the reference CLI provider |
| `ENABLE_WEB_SEARCH` | follows `ENABLE_CODEBUDDY` | force-disable with `0` |
| `ENABLE_OCR` | 1 | run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to `0` to skip |
| `OCR_TIMEOUT_MS` | 25000 | per-call OCR timeout |
| `OCR_MIN_CONFIDENCE` | 0.4 | drop OCR spans below this confidence |
| `ENABLE_AUDIO` | 1 | synthesise Edge neural-voice narration (mp3) for each node; set to `0` to skip. Non-blocking: failures never stop image generation |
| `AUDIO_TIMEOUT_MS` | 30000 | per-call TTS synthesis timeout |
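
To show how a few of these defaults interact, in particular `ENABLE_WEB_SEARCH` following `ENABLE_CODEBUDDY`, here is a minimal loader sketch. It is not the project's config module: `loadConfig` is a hypothetical name, and treating only the string `1` as "on" is an assumption.

```javascript
// Hypothetical sketch of an env loader for a subset of the variables above.
function loadConfig(env) {
  // Parse an integer variable, falling back when unset or not a number.
  const int = (name, fallback) => {
    const parsed = Number.parseInt(env[name] ?? '', 10);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  // Assumption: a flag is on only when set to exactly "1".
  const flag = (name, fallback) => (env[name] === undefined ? fallback : env[name] === '1');

  const enableCodebuddy = flag('ENABLE_CODEBUDDY', false);
  return {
    port: int('PORT', 8787),
    host: env.HOST || '127.0.0.1',
    maxParallelClicksPerNode: int('MAX_PARALLEL_CLICKS_PER_NODE', 4),
    plannerTimeoutMs: int('PLANNER_TIMEOUT_MS', 90000),
    enableCodebuddy,
    // Follows ENABLE_CODEBUDDY unless set explicitly (e.g. "0" to force-disable).
    enableWebSearch: flag('ENABLE_WEB_SEARCH', enableCodebuddy),
    enableOcr: flag('ENABLE_OCR', true),
  };
}

console.log(loadConfig({ ENABLE_CODEBUDDY: '1', ENABLE_WEB_SEARCH: '0' }));
```

In the server this would be called as `loadConfig(process.env)`; passing the environment in as an argument keeps the function easy to test.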