{"id":30514314,"url":"https://github.com/riii111/claude-readability-hook","last_synced_at":"2026-04-14T15:31:26.234Z","repository":{"id":308482952,"uuid":"1026259394","full_name":"riii111/claude-readability-hook","owner":"riii111","description":"✂️ HTML ➜ 📜 Text – tuned for AI prompts \u0026 token thrift","archived":false,"fork":false,"pushed_at":"2025-08-11T03:29:17.000Z","size":533,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-26T07:36:43.969Z","etag":null,"topics":["bun","claude-code","fastapi"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/riii111.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-25T15:17:59.000Z","updated_at":"2025-08-26T07:35:40.000Z","dependencies_parsed_at":"2025-08-06T07:27:03.642Z","dependency_job_id":"fa343ce4-8f91-4daa-bc98-14a7f0f2b76e","html_url":"https://github.com/riii111/claude-readability-hook","commit_stats":null,"previous_names":["riii111/claude-readability-hook"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/riii111/claude-readability-hook","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riii111%2Fclaude-readability-hook","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riii111%2Fclaude-readability-hook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riii111%2Fclaude-readability-hook/releases","manifests_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riii111%2Fclaude-readability-hook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/riii111","download_url":"https://codeload.github.com/riii111/claude-readability-hook/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riii111%2Fclaude-readability-hook/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31803168,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T11:13:53.975Z","status":"ssl_error","status_checked_at":"2026-04-14T11:13:53.299Z","response_time":153,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bun","claude-code","fastapi"],"created_at":"2025-08-26T07:33:41.642Z","updated_at":"2026-04-14T15:31:26.229Z","avatar_url":"https://github.com/riii111.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eClaude Readability Hook\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  ✂️ HTML ➜ 📜 Text – tuned for \u003cb\u003eAI prompts\u003c/b\u003e \u0026amp; \u003cb\u003etoken thrift\u003c/b\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/built%20with-TypeScript %26 Python-blue\" /\u003e\n  \u003cimg 
src=\"https://img.shields.io/badge/extraction-Trafilatura %E2%86%92 Readability %2B APIs-yellow\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/SSR-Playwright-critical\" /\u003e\n\u003c/p\u003e\n\n---\n\n## 👩‍💻 TL;DR\n\n|  | What it does | Why you care |\n|---|---|---|\n| 🧹 **Trim the fluff** | Strips ads, nav \u0026amp; code fences | ⬇️ 40‑70 % token cut |\n| 🕸️ **Any website** | Handles JS‑heavy SPA via headless Chromium | No \"blank page\" failures |\n| 🧠 **Self‑tuning** | Scores every extraction \u0026amp; auto‑switches engine | Always picks the best text |\n| ⚡ **Forum‑optimized** | Direct API integration for Reddit/StackOverflow | 2‑3× better content capture |\n| 🔐 **Safe by default** | SSRF guard + DNS re‑resolve | Drop‑in for prod |\n\n---\n\n## 🏃‍♂️ Quick Start\n\n```bash\ngit clone https://github.com/you/claude-readability-hook\ncd claude-readability-hook\ndocker compose up -d                      # start gateway + extractor + renderer\ncurl -XPOST :7777/extract -d '{\"url\":\"https://example.com\"}' | jq '.text | length'\n```\n\n---\n\n## 🏗️ Architecture (60‑sec view)\n\n```mermaid\ngraph TD\n  Claude[Claude Hook] --\u003e A\n  subgraph \"Gateway\"\n    A[SSRF Guard] --\u003e B{Needs SSR?}\n    B --\u003e|No| C[Trafilatura]\n    B --\u003e|Yes| R[Playwright] --\u003e C\n    C --\u003e|Low score| D[Readability.js]\n    C --\u003e Result[Result]\n    D --\u003e Result\n  end\n  Result --\u003e Claude\n```\n\n---\n\n## 🚀 Feature Highlights\n\n* **Smart engine switch** – Trafilatura ➜ Readability whenever score \u0026lt; 50  \n* **AMP / print rewrite** – auto‑fetches lightweight HTML variants  \n* **24 h LRU cache** – hit‑ratio metric exposed via Prometheus  \n* **OpenTelemetry hooks** – trace every extract / render call\n\n### 🎯 Special Site Support\n\nOptimized extraction for developer-focused platforms:\n\n* **Stack Overflow** – Official API integration fetches question + top 5 answers (vote-sorted)\n* **Reddit** – JSON endpoint captures post + 
top 20 comments + replies  \n* **Auto-fallback** – Falls back to standard pipeline if API fails\n\n**Results**: 2-3× better content capture vs. generic HTML parsing\n\n---\n\n## 📋 REST API\n\n| Verb | Path | Description |\n|------|------|-------------|\n| `POST` | `/extract` | Return `{title,text,engine,score,cached}` |\n| `GET`  | `/health`  | Dependency \u0026 self check |\n| `GET`  | `/metrics` | Prometheus exposition |\n\n\u003cdetails\u003e\n\u003csummary\u003eExample request\u003c/summary\u003e\n\n```bash\ncurl -XPOST :7777/extract \\\n     -H 'Content-Type: application/json' \\\n     -d '{\"url\":\"https://news.ycombinator.com/item?id=39237223\"}'\n```\n\n\u003c/details\u003e\n\n---\n\n## 📈 Key Metrics\n\n```promql\n# successful extractions per second, per engine\nsum by (engine) (rate(gateway_extract_total{success=\"true\"}[5m]))\n\n# SSR usage %\nsum(rate(gateway_extract_total{ssr=\"true\"}[5m]))\n  / sum(rate(gateway_extract_total[5m]))\n\n# cache hit ratio\nsum(rate(gateway_cache_total{op=\"hit\"}[5m]))\n  / sum(rate(gateway_cache_total{op=~\"hit|miss\"}[5m]))\n```\n\n---\n\n## 🛠️ Local Dev\n\n```bash\n# Start all services\ndocker compose up -d\n\n# Development with hot-reload\ncd apps/gateway \u0026\u0026 bun install \u0026\u0026 bun run dev    # Gateway\ncd apps/extractor \u0026\u0026 uv sync \u0026\u0026 uv run python server.py  # Extractor\n```\n\n\u003e Cache \u0026amp; rate‑limit are disabled when `NODE_ENV=test`.\n\n---\n\n## 🗺️ Roadmap\n\n* [ ] Chunk‑level summarization for giant docs  \n* [ ] PDF / EPUB source support  \n* [ ] Optional GPT‑4 \"refine\" post‑processor  \n\n---\n\n## 🙏 Acknowledgements\n\nPowered by **Trafilatura**, **Mozilla Readability**, and 
**Playwright**.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Friii111%2Fclaude-readability-hook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Friii111%2Fclaude-readability-hook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Friii111%2Fclaude-readability-hook/lists"}