{"id":48749921,"url":"https://github.com/Yrzhe/pagefly","last_synced_at":"2026-04-28T14:00:59.330Z","repository":{"id":350182205,"uuid":"1203609535","full_name":"Yrzhe/pagefly","owner":"Yrzhe","description":"Personal Knowledge OS — Capture → Distill → Compile → Serve. Self-hosted knowledge data platform with AI agents, Telegram bot, and REST API.","archived":false,"fork":false,"pushed_at":"2026-04-20T03:57:04.000Z","size":20104,"stargazers_count":49,"open_issues_count":0,"forks_count":5,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-20T05:34:21.131Z","etag":null,"topics":["ai-agents","claude","fastapi","knowledge-graph","knowledge-management","obsidian","personal-knowledge-base","react","self-hosted","telegram-bot"],"latest_commit_sha":null,"homepage":"https://pagefly.ink","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yrzhe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-07T07:41:30.000Z","updated_at":"2026-04-20T03:57:08.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Yrzhe/pagefly","commit_stats":null,"previous_names":["yrzhe/pagefly"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Yrzhe/pagefly","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yrzhe%2Fpagefly","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yrzhe%2Fpagefly/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yrzhe%2Fpagefly/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yrzhe%2Fpagefly/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yrzhe","download_url":"https://codeload.github.com/Yrzhe/pagefly/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yrzhe%2Fpagefly/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32383791,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T11:25:28.583Z","status":"ssl_error","status_checked_at":"2026-04-28T11:25:05.435Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","claude","fastapi","knowledge-graph","knowledge-management","obsidian","personal-knowledge-base","react","self-hosted","telegram-bot"],"created_at":"2026-04-12T17:00:30.689Z","updated_at":"2026-04-28T14:00:59.324Z","avatar_url":"https://github.com/Yrzhe.png","language":"Python","funding_links":[],"categories":["未分类"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"docs/assets/readme/OG Image.png\" alt=\"PageFly — Personal Knowledge OS\" width=\"720\" /\u003e\n\n# PageFly\n\n[![MIT License](https://img.shields.io/badge/license-MIT-F59E0B?style=flat-square)](LICENSE)\n[![Python](https://img.shields.io/badge/python-3.11+-3776AB?style=flat-square\u0026logo=python\u0026logoColor=white)](https://python.org)\n[![React](https://img.shields.io/badge/react-19-61DAFB?style=flat-square\u0026logo=react\u0026logoColor=white)](https://react.dev)\n[![Docker](https://img.shields.io/badge/docker-ready-2496ED?style=flat-square\u0026logo=docker\u0026logoColor=white)](https://docker.com)\n\n[Live Demo](https://pagefly.ink) · [The Story](#the-story) · [Quick Start](#quick-start) · [中文](README_CN.md)\n\n\u003c/div\u003e\n\n---\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/assets/readme/idea.png\" alt=\"PageFly Concept\" width=\"720\" /\u003e\n  \u003cbr /\u003e\n  \u003csub\u003eThe idea: a knowledge flywheel that grows structured knowledge from the stream of daily life.\u003c/sub\u003e\n\u003c/div\u003e\n\n---\n\n## What is PageFly?\n\nPageFly is a **self-hosted, private knowledge data platform** — a structured, automated, API-ready knowledge governance system with warm, opinionated architecture.\n\nYou send it raw material (PDFs, markdown, images, voice memos, URLs, Telegram messages), and it:\n\n1. **Captures** — ingests into a structured raw layer with metadata\n2. **Distills** — AI classifies, scores relevance, tags temporal type, extracts key claims\n3. **Compiles** — agents write and maintain wiki articles (concept pages, summaries, connection maps)\n4. **Serves** — REST API, Telegram bot, Obsidian-compatible markdown output\n\nYou never write the wiki manually — the LLM owns it.\n\n## The Story\n\nPageFly was inspired by [Andrej Karpathy's LLMWiki](https://x.com/karpathy/status/1039944530988847617) — the idea that structured knowledge compilation can be automated.\n\nI saw the tweet and thought: what if we took this further? Not just a wiki, but a complete **capture-to-serve pipeline** with ingestion, distillation, governance, and API access.\n\n\u003cdiv align=\"center\"\u003e\n\n**[See my reply to Karpathy →](https://x.com/yrzhe_top/status/2039944530988847617)**\n\n\u003c/div\u003e\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│                        Channels                              │\n│  Telegram Bot  ·  REST API  ·  Web Frontend  ·  Scheduler   │\n└─────────────┬───────────────────────────────────┬───────────┘\n              │                                   │\n   ┌──────────▼──────────┐           ┌────────────▼───────────┐\n   │   Ingest Pipeline   │           │    Agent System         │\n   │                     │           │                         │\n   │  PDF · DOCX · Image │           │  Compiler (wiki write)  │\n   │  Voice · URL · Text │           │  Query (search + chat)  │\n   │                     │           │  Review (lint + audit)   │\n   └──────────┬──────────┘           └────────────┬───────────┘\n              │                                   │\n   ┌──────────▼──────────┐           ┌────────────▼───────────┐\n   │   Governance        │           │    Storage              │\n   │                     │           │                         │\n   │  Classifier (AI)    │           │  SQLite (metadata)      │\n   │  Organizer          │           │  Filesystem (documents) │\n   │  Integrity Checker  │           │  Wiki (markdown)        │\n   └─────────────────────┘           └─────────────────────────┘\n```\n\n## Features\n\n| Feature | Description |\n|---------|-------------|\n| **Multi-format Ingestion** | PDF, DOCX, images (OCR), voice (transcription), URLs, plain text |\n| **AI Distillation** | Auto-classification, relevance scoring (1-10), temporal tagging, key claim extraction |\n| **Wiki Compilation** | Agents write concept pages, summaries, and connection maps with update-first governance |\n| **Telegram Bot** | Send anything via Telegram — text, photos, voice, documents. Inline approval flow |\n| **REST API** | Full API with multi-token auth (master + scoped client tokens) |\n| **Obsidian-Compatible** | Wiki output as flat `.md` files with YAML frontmatter — drop into any PKM tool |\n\n## Quick Start\n\n### Option A — One-click deploy (Railway)\n\n[![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/new/template?template=https%3A%2F%2Fgithub.com%2FYrzhe%2Fpagefly)\n\nSet `ANTHROPIC_API_KEY`, `PAGEFLY_EMAIL`, and `PAGEFLY_PASSWORD` in the Railway dashboard. Add a volume mounted at `/app/data` for persistence.\n\n### Option B — Docker (local / self-host)\n\n**Prerequisites**: Docker + Docker Compose, an [Anthropic API key](https://console.anthropic.com/).\n\n```bash\ngit clone https://github.com/Yrzhe/pagefly.git\ncd pagefly\npython -m src.cli setup      # interactive: email, password, API keys, demo data\ndocker compose up -d\n```\n\nThe `setup` command generates a valid `config.json` with a hashed password and — if you accept — seeds a working demo knowledge base so you can see the system in action before adding your own documents.\n\n**Already configured?** Skip `setup` and just `docker compose up -d`.\n\n### Option C — Minimal env-only boot\n\nNo `config.json` needed. Export three env vars and launch:\n\n```bash\nexport ANTHROPIC_API_KEY=sk-ant-...\nexport PAGEFLY_EMAIL=you@example.com\nexport PAGEFLY_PASSWORD=your-password\ndocker compose up -d\n```\n\n### Access\n\n- **Web UI**: `http://localhost` (or your configured port)\n- **API / Swagger**: `http://localhost:8000/docs`\n- **Telegram**: Message your bot to start ingesting (if configured)\n\n### Load demo data anytime\n\n```bash\npython -m src.cli load-demo     # adds 3 sample docs + 5 wiki articles\npython -m src.cli clear-demo    # removes them\n```\n\n## Clients\n\nThe server runs on its own — the clients below are optional add-ons that capture content into your PageFly instance.\n\n### Browser extension (Chrome / Edge / Brave / Arc)\n\nOne-click clip the page you're reading into your knowledge base.\n\nPath: `browser-extension/` (Manifest V3, unpacked load).\n\n```\n1. Open chrome://extensions (or the equivalent in your Chromium browser)\n2. Enable \"Developer mode\" (top-right toggle)\n3. \"Load unpacked\" → pick the browser-extension/ folder of this repo\n4. Click the extension icon → set Server URL = https://api.your-domain\n   and paste your API token from the server's settings page\n5. On any web page, click the extension icon → \"Clip this page\"\n```\n\nFirefox / Safari are not supported yet (V3 nuances differ enough to need their own builds).\n\n### macOS desktop capture\n\nMenu-bar app that captures your active app + window context every few seconds and lets you record meeting audio that gets transcribed server-side.\n\nPath: `desktop-capture/` (Swift / SwiftUI, Xcode 15+).\n\n**Build a personal-use copy**:\n```bash\ncd desktop-capture\n./scripts/package-local.sh\n# → produces dist/PageflyCapture-\u003cversion\u003e.dmg\nopen dist/PageflyCapture-*.dmg\n# Drag PageflyCapture.app into Applications\nxattr -dr com.apple.quarantine /Applications/PageflyCapture.app   # ad-hoc signed; clears Gatekeeper warning\n```\n\nOpen PageflyCapture from Applications → menu bar icon appears → click → **Preferences** → enter `https://api.your-domain` + API token → grant **Accessibility** + **Microphone** in System Settings → Privacy when prompted. The icon turns green once everything is wired.\n\nThe script auto-picks any `Apple Development` or `Developer ID Application` cert in your login keychain (stable identity → TCC keeps your grants across rebuilds). With no cert it falls back to ad-hoc, which works but re-prompts for permissions on every reinstall.\n\nFor shipping signed + notarized builds to other people, see `desktop-capture/scripts/release.sh` (requires a paid Apple Developer ID).\n\n## Tech Stack\n\n### Backend\n| Layer | Choice |\n|-------|--------|\n| Runtime | Python 3.11+ |\n| API | FastAPI |\n| Database | SQLite |\n| AI Agents | Claude Agent SDK (Anthropic) |\n| Scheduler | APScheduler |\n| Bot | python-telegram-bot |\n\n### Frontend\n| Layer | Choice |\n|-------|--------|\n| Framework | React + Vite + TypeScript |\n| Styling | Tailwind CSS v4 + shadcn/ui |\n| Router | react-router-dom v6 |\n| Icons | Lucide React |\n\n### AI Models\n| Task | Model |\n|------|-------|\n| Classification \u0026 Agents | Claude (Anthropic) |\n| Voice Transcription | gpt-4o-transcribe (OpenAI) |\n| Image OCR | mistral-ocr-latest + mistral-small-latest |\n\n## Project Structure\n\n```\npagefly/\n├── src/\n│   ├── agents/          # Compiler, Query, Review agents (Claude SDK)\n│   ├── channels/        # Telegram bot, REST API\n│   ├── governance/      # Classifier, Organizer, Integrity checker\n│   ├── ingest/          # Pipeline + converters (PDF, DOCX, voice, image, URL)\n│   ├── scheduler/       # Cron jobs, inbox watcher\n│   ├── shared/          # Config, indexer, activity log, types\n│   └── storage/         # SQLite DB, deletion logic\n├── config/\n│   ├── SCHEMA.md        # Wiki conventions (injected into agent prompts)\n│   └── skills/          # Agent skill definitions\n├── frontend/            # React + Vite + Tailwind\n├── data/                # Runtime data (not tracked)\n│   ├── raw/             # Ingested documents\n│   ├── knowledge/       # Classified \u0026 organized\n│   └── wiki/            # Compiled articles\n├── docker-compose.yml\n└── Dockerfile\n```\n\n## Links\n\n- **Author**: [@yrzhe_top](https://x.com/yrzhe_top)\n- **The Tweet**: [Reply to Karpathy](https://x.com/yrzhe_top/status/2039944530988847617)\n- **Inspired by**: [Karpathy's LLMWiki](https://x.com/karpathy/status/1039944530988847617)\n- **Live**: [pagefly.ink](https://pagefly.ink)\n\n## License\n\n[MIT](LICENSE) — do whatever you want with it.\n\n---\n\n\u003cdiv align=\"center\"\u003e\n  \u003csub\u003eBuilt by \u003ca href=\"https://x.com/yrzhe_top\"\u003eyrzhe\u003c/a\u003e with Claude, one conversation at a time.\u003c/sub\u003e\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYrzhe%2Fpagefly","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYrzhe%2Fpagefly","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYrzhe%2Fpagefly/lists"}