{"id":48547327,"url":"https://github.com/po4yka/bite-size-reader","last_synced_at":"2026-04-08T07:02:59.611Z","repository":{"id":317167349,"uuid":"1056153022","full_name":"po4yka/bite-size-reader","owner":"po4yka","description":"Telegram bot for bite-sized content summaries — scrapes articles/YouTube/channels, summarizes via LLM, serves a Carbon web UI and mobile API. Self-hostable.","archived":false,"fork":false,"pushed_at":"2026-04-01T13:22:10.000Z","size":12812,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-01T15:22:57.562Z","etag":null,"topics":["article-summarizer","channel-digest","content-extraction","mcp-server","openrouter","scraper-chain","telegram-bot","vector-search","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/po4yka.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-09-13T13:56:11.000Z","updated_at":"2026-04-01T13:22:14.000Z","dependencies_parsed_at":"2025-09-29T10:23:32.587Z","dependency_job_id":"74ac0f8a-e5a5-41bd-96bc-7112700b1ab5","html_url":"https://github.com/po4yka/bite-size-reader","commit_stats":null,"previous_names":["po4yka/bite-size-reader"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/po4yka/bite-size-reader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/po4y
ka%2Fbite-size-reader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/po4yka%2Fbite-size-reader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/po4yka%2Fbite-size-reader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/po4yka%2Fbite-size-reader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/po4yka","download_url":"https://codeload.github.com/po4yka/bite-size-reader/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/po4yka%2Fbite-size-reader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31544089,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"online","status_checked_at":"2026-04-08T02:00:06.127Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["article-summarizer","channel-digest","content-extraction","mcp-server","openrouter","scraper-chain","telegram-bot","vector-search","web-scraping"],"created_at":"2026-04-08T07:02:58.877Z","updated_at":"2026-04-08T07:02:59.591Z","avatar_url":"https://github.com/po4yka.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bite-Size Reader\n\nAsync Telegram bot that summarizes web articles and YouTube videos into structured JSON. 
For articles, it uses a multi-provider scraper chain (Scrapling / self-hosted Firecrawl / Playwright / Crawlee / direct HTML) + OpenRouter; for YouTube videos, it downloads the video (1080p) and extracts transcripts. Also supports summarizing forwarded channel posts. Returns a strict JSON summary and stores artifacts in SQLite.\n\n**🚀 New to Bite-Size Reader?** Start with the [5-Minute Quickstart Tutorial](docs/tutorials/quickstart.md)\n\n**❓ Have Questions?** Check the [FAQ](docs/FAQ.md) or [Troubleshooting Guide](docs/TROUBLESHOOTING.md)\n\n**📚 All Documentation** → [Documentation Hub](docs/README.md)\n\n---\n\n## Table of Contents\n\n- [Architecture Overview](#architecture-overview)\n- [Quick Start](#quick-start)\n- [Common Use Cases](#common-use-cases)\n- [Commands and Usage](#commands-and-usage)\n- [Environment Configuration](#environment)\n- [Performance Tips](#performance-tips)\n- [Repository Layout](#repository-layout)\n- [YouTube Video Support](#youtube-video-support)\n- [Web Search Enrichment](#web-search-enrichment-optional)\n- [Mobile API](#mobile-api)\n- [Carbon Web Interface](#carbon-web-interface-v1)\n- [MCP Server](#mcp-server)\n- [Redis Caching](#redis-caching)\n- [Karakeep Integration](#karakeep-integration)\n- [Local CLI Summary Runner](#local-cli-summary-runner)\n- [Development](#dev-tooling)\n- [Documentation](#documentation)\n\n---\n\n## Architecture overview\n\n```mermaid\nflowchart LR\n  subgraph TelegramBot\n    TGClient[TelegramClient] --\u003e MsgHandler[MessageHandler]\n    MsgHandler --\u003e AccessController\n    MsgHandler --\u003e CallbackHandler\n    CallbackHandler --\u003e CallbackRegistry[CallbackActionRegistry]\n    CallbackRegistry --\u003e CallbackActions[CallbackActionService]\n    AccessController --\u003e MessageRouter\n    MessageRouter --\u003e CommandProcessor\n    MessageRouter --\u003e URLHandler\n    URLHandler --\u003e URLBatchPolicy[URLBatchPolicyService]\n    URLHandler --\u003e 
URLAwaitingState[URLAwaitingStateStore]\n    MessageRouter --\u003e ForwardProcessor\n    MessageRouter --\u003e MessagePersistence\n    LifecycleMgr[TelegramLifecycleManager] -.-\u003e TGClient\n    LifecycleMgr -.-\u003e URLHandler\n  end\n\n  subgraph URLPipeline[URL processing pipeline]\n    URLHandler --\u003e URLProcessor\n    URLProcessor --\u003e ContentExtractor\n    ContentExtractor --\u003e ScraperChain[ScraperChain]\n    ScraperChain --\u003e|primary| Scrapling[Scrapling]\n    ScraperChain --\u003e|secondary| Firecrawl[(Firecrawl /scrape)]\n    ScraperChain --\u003e|tertiary| Playwright[Playwright]\n    ScraperChain --\u003e|quaternary| Crawlee[Crawlee]\n    ScraperChain --\u003e|last_resort| DirectHTML[Direct HTML]\n    URLProcessor --\u003e ContentChunker\n    URLProcessor --\u003e LLMSummarizer\n    LLMSummarizer --\u003e OpenRouter[(OpenRouter Chat Completions)]\n  end\n\n  subgraph DigestPipeline[Channel Digest]\n    Scheduler[APScheduler] --\u003e DigestService\n    CommandProcessor -.-\u003e|/digest| DigestService\n    DigestService --\u003e ChannelReader\n    ChannelReader --\u003e UserbotClient[Userbot Client]\n    DigestService --\u003e DigestAnalyzer\n    DigestAnalyzer --\u003e OpenRouter\n    DigestService --\u003e DigestFormatter\n    DigestFormatter --\u003e TGClient\n    CommandProcessor -.-\u003e|/init_session| SessionInit[Session Init + Mini App]\n    SessionInit --\u003e UserbotClient\n  end\n\n  subgraph OptionalServices[Optional services]\n    Redis[(Redis)] -.-\u003e ContentExtractor\n    Redis -.-\u003e LLMSummarizer\n    Redis -.-\u003e MobileAPI\n    ChromaDB[(ChromaDB)] -.-\u003e SearchService\n    MCPServer[MCP Server] -.-\u003e SQLite\n    MCPServer -.-\u003e SearchService\n  end\n\n  ForwardProcessor --\u003e LLMSummarizer\n  LLMSummarizer -.-\u003e| optional | WebSearch[WebSearchAgent]\n  WebSearch -.-\u003e Firecrawl\n  ContentExtractor --\u003e SQLite[(SQLite)]\n  MessagePersistence --\u003e SQLite\n  LLMSummarizer 
--\u003e SQLite\n  DigestService --\u003e SQLite\n  MessageRouter --\u003e ResponseFormatter\n  ResponseFormatter --\u003e TGClient\n  TGClient --\u003e| Replies | Telegram\n  Telegram --\u003e| Updates | TGClient\n  UserbotClient --\u003e| Read channels | Telegram\n  ResponseFormatter --\u003e Logs[(Structured + audit logs)]\n\n  subgraph MobileAPI[Mobile API]\n    FastAPI[FastAPI + JWT] --\u003e SQLite\n    FastAPI --\u003e SearchService[SearchService]\n    FastAPI --\u003e DigestFacade\n    DigestFacade --\u003e DigestAPIService\n    DigestAPIService --\u003e SQLite\n    FastAPI --\u003e SystemMaint[SystemMaintenanceService]\n    SystemMaint --\u003e SQLite\n    SystemMaint -.-\u003e Redis\n  end\n```\n\nThe bot ingests updates via a lightweight `TelegramClient`, normalizes them through `MessageHandler`, and hands them to `MessageRouter`/`CallbackHandler` flows. `CallbackHandler` delegates action execution through `CallbackActionRegistry` + `CallbackActionService`, and `URLHandler` delegates URL policy/state concerns through `URLBatchPolicyService` + `URLAwaitingStateStore` before invoking `URLProcessor`. `TelegramLifecycleManager` owns startup/shutdown orchestration of background tasks and warmups. The channel digest subsystem uses a separate `UserbotClient` (authenticated as a real Telegram user) to read channel histories, analyzes posts via LLM, and delivers formatted digests on a schedule or via `/digest`.\n\nFor the mobile API, routers are transport-focused and delegate infrastructure orchestration to dedicated services (`DigestFacade`, `SystemMaintenanceService`) rather than performing DB/Redis/file operations inline. 
`ResponseFormatter` centralizes Telegram replies and audit logging while all artifacts land in SQLite.\n\n## Quick start\n\n**🚀 5-Minute Setup**: Follow the [Quickstart Tutorial](docs/tutorials/quickstart.md) for step-by-step Docker setup.\n\n**Manual Setup**:\n\n- Copy `.env.example` to `.env` and fill required secrets\n- Build and run with Docker\n- See [DEPLOYMENT.md](docs/DEPLOYMENT.md) for full setup, deployment, and update instructions\n\n---\n\n## Common Use Cases\n\n**I want to...**\n\n| Goal | How | Documentation |\n| ------ | ----- | --------------- |\n| **Summarize web articles** | Send URL to Telegram bot | [Quickstart Tutorial](docs/tutorials/quickstart.md) |\n| **Summarize YouTube videos** | Send YouTube URL (transcript extracted) | [Configure YouTube](docs/how-to/configure-youtube-download.md) |\n| **Search past summaries** | `/search \u003cquery\u003e` command | [FAQ § Search](docs/FAQ.md#can-i-search-my-summaries) |\n| **Get real-time context** | Enable web search enrichment | [Enable Web Search](docs/how-to/enable-web-search.md) |\n| **Speed up responses** | Enable Redis caching | [Setup Redis](docs/how-to/setup-redis-caching.md) |\n| **Build mobile app** | Use Mobile API (JWT auth) | [MOBILE_API_SPEC.md](docs/MOBILE_API_SPEC.md) |\n| **Use web interface** | Open Carbon web UI on `/web` | [Frontend Web Guide](docs/reference/frontend-web.md) |\n| **Integrate with AI agents** | Use MCP server | [MCP Server Guide](docs/mcp_server.md) |\n| **Reduce API costs** | Use free models, caching | [FAQ § Cost Optimization](docs/FAQ.md#cost-optimization) |\n| **Self-host privately** | Docker deployment | [DEPLOYMENT.md](docs/DEPLOYMENT.md) |\n\n---\n\n## Docker\n\n- If you updated dependencies in `pyproject.toml`, generate lock files first: `make lock-uv`.\n- Build: `docker build -f ops/docker/Dockerfile -t bite-size-reader .`\n- Run: `docker run --env-file .env -v $(pwd)/data:/data --name bsr bite-size-reader`\n\n## Commands and usage\n\nYou can simply send a 
URL (or several URLs) or forward a channel post -- commands are optional.\n\n### Summarization\n\n| Command | Description |\n| --------- | ------------- |\n| `/help`, `/start` | Show help and usage |\n| `/summarize \u003cURL\u003e` | Summarize a URL immediately |\n| `/summarize` | Bot asks for a URL in the next message |\n| `/summarize_all \u003cURLs\u003e` | Summarize multiple URLs without confirmation |\n| `/cancel` | Cancel pending summarize prompt or multi-link confirmation |\n\nMultiple URLs in one message: bot asks \"Process N links?\"; reply \"yes/no\". Each link gets its own correlation ID and is processed sequentially.\n\n### Content Management\n\n| Command | Description |\n| --------- | ------------- |\n| `/unread [limit] [topic]` | Show unread articles, optionally filtered by topic |\n| `/read \u003crequest_id\u003e` | Mark an article as read |\n\n### Search\n\n| Command | Description |\n| --------- | ------------- |\n| `/search \u003cquery\u003e` | Search summaries by keyword |\n| `/find`, `/findweb`, `/findonline` | Search using Firecrawl web search |\n| `/finddb`, `/findlocal` | Search local database only |\n\n### Admin\n\n| Command | Description |\n| --------- | ------------- |\n| `/dbinfo` | Show database statistics |\n| `/dbverify` | Verify database integrity |\n\n### Channel Digest\n\n| Command | Description |\n| --------- | ------------- |\n| `/init_session` | Initialize userbot session via Mini App OTP/2FA flow |\n| `/digest` | Generate a digest of subscribed channels now |\n| `/channels` | List currently subscribed channels |\n| `/subscribe @channel` | Subscribe to a Telegram channel for digests |\n| `/unsubscribe @channel` | Unsubscribe from a channel |\n\n### Integrations\n\n| Command | Description |\n| --------- | ------------- |\n| `/sync_karakeep` | Trigger Karakeep bookmark sync |\n\n## Environment\n\n### ✅ Required (Essential for Basic Functionality)\n\n```bash\nAPI_ID=...                          
# Telegram API ID (from https://my.telegram.org/apps)\nAPI_HASH=...                        # Telegram API hash\nBOT_TOKEN=...                       # Telegram bot token (from @BotFather)\nALLOWED_USER_IDS=123456789          # Comma-separated Telegram user IDs (your ID)\nFIRECRAWL_API_KEY=...               # Firecrawl API key (optional -- only for cloud Firecrawl or web search)\nOPENROUTER_API_KEY=...              # OpenRouter API key (or use OPENAI_API_KEY/ANTHROPIC_API_KEY)\nOPENROUTER_MODEL=deepseek/deepseek-v3.2  # Primary LLM model\n```\n\n### 🔧 Optional (Enable Features as Needed)\n\n| Subsystem | Key Variables | When to Enable |\n| ----------- | -------------- | --------------- |\n| **YouTube** | `YOUTUBE_DOWNLOAD_ENABLED=true`\u003cbr\u003e`YOUTUBE_PREFERRED_QUALITY=1080p`\u003cbr\u003e`YOUTUBE_STORAGE_PATH=/data/videos` | Summarize YouTube videos |\n| **Web Search** | `WEB_SEARCH_ENABLED=false`\u003cbr\u003e`WEB_SEARCH_MAX_QUERIES=3` | Add real-time context to summaries |\n| **Redis** | `REDIS_ENABLED=true`\u003cbr\u003e`REDIS_URL` or `REDIS_HOST`/`REDIS_PORT` | Cache responses, speed up bot |\n| **Draft Streaming** | `SUMMARY_STREAMING_ENABLED=true`\u003cbr\u003e`SUMMARY_STREAMING_MODE=section`\u003cbr\u003e`TELEGRAM_DRAFT_STREAMING_ENABLED=true` | Live section previews during OpenRouter summaries |\n| **Scraper Chain** | `SCRAPER_ENABLED=true`\u003cbr\u003e`SCRAPER_PROFILE=balanced`\u003cbr\u003e`SCRAPER_BROWSER_ENABLED=true`\u003cbr\u003e`SCRAPER_PROVIDER_ORDER=[...]` | Control article extraction fallback behavior and tuning |\n| **ChromaDB** | `CHROMA_HOST=http://localhost:8000`\u003cbr\u003e`CHROMA_AUTH_TOKEN` | Semantic search |\n| **Embeddings** | `EMBEDDING_PROVIDER=local`\u003cbr\u003e`GEMINI_API_KEY`\u003cbr\u003e`GEMINI_EMBEDDING_DIMENSIONS=768` | Switch embedding provider (local/Gemini) |\n| **MCP Server** | `MCP_ENABLED=false`\u003cbr\u003e`MCP_TRANSPORT=stdio`\u003cbr\u003e`MCP_PORT=8200` | AI agent integration (Claude Desktop / optional 
Docker `mcp` profile) |\n| **Mobile API** | `JWT_SECRET_KEY`\u003cbr\u003e`ALLOWED_CLIENT_IDS`\u003cbr\u003e`API_RATE_LIMIT_*` | Build mobile clients |\n| **Karakeep** | `KARAKEEP_ENABLED=false`\u003cbr\u003e`KARAKEEP_API_URL`\u003cbr\u003e`KARAKEEP_API_KEY` | Bookmark sync |\n| **Channel Digest** | `DIGEST_ENABLED=true`\u003cbr\u003e`API_BASE_URL=http://localhost:8000` | Scheduled channel digests |\n\n### ⚙️ Advanced (Fine-Tuning)\n\n| Category | Key Variables | Purpose |\n| ---------- | -------------- | --------- |\n| **Runtime** | `DB_PATH=/data/app.db`\u003cbr\u003e`LOG_LEVEL=INFO`\u003cbr\u003e`DEBUG_PAYLOADS=0`\u003cbr\u003e`MAX_CONCURRENT_CALLS=4` | Performance tuning |\n| **LLM Providers** | `LLM_PROVIDER=openrouter`\u003cbr\u003e`OPENAI_API_KEY`\u003cbr\u003e`ANTHROPIC_API_KEY` | Switch LLM providers |\n| **Fallbacks** | `OPENROUTER_FALLBACK_MODELS=...`\u003cbr\u003e`OPENAI_FALLBACK_MODELS=...` | Model fallback chains |\n\n**📖 Full Reference**: [environment_variables.md](docs/environment_variables.md) (250+ variables documented)\n\n**❓ Configuration Help**: [FAQ § Configuration](docs/FAQ.md#configuration) | [TROUBLESHOOTING § Configuration](docs/TROUBLESHOOTING.md#configuration-issues)\n\n**⚠️ Breaking Rename**: scraper legacy variables `SCRAPLING_*` and `SCRAPER_DIRECT_HTTP_ENABLED` are no longer accepted; startup fails fast with replacement hints.\n\n---\n\n## Performance Tips\n\n**Speed up summarization**:\n\n- ⚡ **Use faster models**: `qwen/qwen3-max` (faster than DeepSeek), `google/gemini-2.0-flash-001:free` (free)\n- 🔄 **Enable Redis caching**: Cache repeated URLs, reduce API calls\n- 📦 **Increase concurrency**: `MAX_CONCURRENT_CALLS=5` (default: 4)\n- 🎯 **Disable optional features**: Set `WEB_SEARCH_ENABLED=false`, `SUMMARY_TWO_PASS_ENABLED=false`\n\n**Reduce costs**:\n\n- 💰 **Use free models**: `google/gemini-2.0-flash-001:free`, `deepseek/deepseek-r1:free` (via OpenRouter)\n- 🔄 **Enable caching**: Avoid re-processing same URLs\n- 🎛 **Adjust token 
limits**: `MAX_CONTENT_LENGTH_TOKENS=30000` (default: 50000)\n- 📊 **Monitor usage**: Track costs at [OpenRouter Dashboard](https://openrouter.ai/account)\n\n**Optimize storage**:\n\n- 🧹 **Auto-cleanup YouTube**: `YOUTUBE_CLEANUP_AFTER_DAYS=7` (delete old videos)\n- 📏 **Set storage limits**: `YOUTUBE_MAX_STORAGE_GB=10`\n- 💾 **Database maintenance**: Periodic `VACUUM` and index rebuilding\n\n**See detailed optimization guide**: [How to Optimize Performance](docs/how-to/optimize-performance.md) | [FAQ § Performance](docs/FAQ.md#performance)\n\n---\n\n## Repository layout\n\n```\napp/\n  adapters/\n    content/     -- Multi-provider scraper chain, content chunking, LLM summarization, web search context\n      scraper/   -- Protocol, chain, factory, providers (Scrapling, Firecrawl, Playwright, Crawlee, direct HTML)\n    youtube/     -- YouTube video download and transcript extraction\n    external/    -- Response formatting helpers shared by adapters\n    karakeep/    -- Karakeep bookmark sync\n    llm/         -- Provider-agnostic LLM abstraction\n    openrouter/  -- OpenRouter client, payload shaping, error handling\n    telegram/    -- Telegram client, message routing, access control, persistence, command_handlers/\n  agents/        -- Multi-agent system (extraction, summarization, validation, web search)\n  api/           -- Mobile API (FastAPI, JWT auth, sync endpoints)\n    models/      -- Pydantic request/response models\n    routers/     -- Route handlers (auth, summaries, sync, collections, health, system)\n    services/    -- API business logic\n  application/   -- Application layer (DTOs, use cases)\n  config/        -- Configuration modules\n  core/          -- URL normalization, JSON contract, logging, language helpers\n  db/            -- SQLite schema, migrations, audit logging helpers\n  di/            -- Dependency injection\n  domain/        -- Domain models and services (DDD patterns)\n  infrastructure/ -- Persistence layer, event bus, vector store\n  
  cache/       -- Cache layer (Redis)\n    messaging/   -- Messaging infrastructure\n  mcp/           -- MCP server for AI agent access\n  models/        -- Pydantic-style models (Telegram entities, LLM config)\n  observability/ -- Metrics, tracing, telemetry\n  prompts/       -- LLM prompt templates (en/ru, including web search analysis)\n  security/      -- Security utilities\n  types/         -- Type definitions\n  utils/         -- Validation and helper utilities\nclients/\n  cli/           -- Standalone CLI client package\n  browser-extension/ -- Chrome/Firefox browser extension\n  web/           -- Carbon web interface (React + TypeScript + Vite)\nintegrations/\n  openclaw-skill/ -- OpenClaw MCP skill bundle\nops/\n  config/        -- Versioned example config assets\n  docker/        -- Dockerfiles and compose definitions\n  monitoring/    -- Prometheus/Grafana/Loki/Promtail assets\ntools/\n  scripts/       -- Development and maintenance scripts\ntests/           -- Pytest suites and helper utilities\ndocs/            -- Specs, tutorials, guides, ADRs, and reports\nbot.py           -- Entrypoint wiring config, DB, and Telegram bot\ndocs/SPEC.md     -- Full technical specification\n```\n\n## YouTube video support\n\nThe bot automatically detects YouTube URLs and processes them differently from regular web articles.\n\n**Supported URL formats:** Standard watch, short (`youtu.be`), shorts, live, embed, mobile (`m.youtube.com`), YouTube Music, legacy `/v/`.\n\n**Processing workflow:**\n\n1. Extract video ID from URL (handles query parameters in any order)\n2. Extract transcript via `youtube-transcript-api` (prefers manual, falls back to auto-generated)\n3. Download video in configured quality (default 1080p) via `yt-dlp`\n4. Download subtitles, metadata (JSON), and thumbnail\n5. Generate summary from transcript using LLM\n6. 
Store video metadata, file paths, and transcript in database\n\n**Storage management:** Videos stored in `/data/videos`, auto-cleanup of old videos, size limits per-video and total, deduplication via URL hash.\n\n**Requirements:** `ffmpeg` (included in Docker image), `yt-dlp`, `youtube-transcript-api`.\n\n## Web search enrichment (optional)\n\nWhen `WEB_SEARCH_ENABLED=true`, the bot enriches article summaries with current web context:\n\n1. LLM analyzes content to identify knowledge gaps (unfamiliar entities, recent events, claims needing verification)\n2. If search would help, LLM extracts targeted search queries (max 3)\n3. Firecrawl Search API retrieves relevant web results\n4. Search context is injected into the summarization prompt\n5. Final summary benefits from up-to-date information beyond LLM training cutoff\n\nOnly ~30-40% of articles trigger search (self-contained content is skipped). Adds 1 extra LLM call for analysis plus 1-3 Firecrawl search calls when triggered. Feature is opt-in to control costs.\n\n## Mobile API\n\nFastAPI-based REST API for mobile clients with Telegram-based JWT authentication, summary retrieval, and sync endpoints. 
See `docs/MOBILE_API_SPEC.md` for details.\n\n## Carbon Web Interface (V1)\n\nStandalone React + IBM Carbon web UI is available in `clients/web/` and served by FastAPI on:\n\n- `/web`\n- `/web/*` (SPA routes)\n\nStatic assets are published under `/static/web/*`.\n\nCore routes:\n\n- `/web/library`\n- `/web/library/:id`\n- `/web/articles`\n- `/web/search`\n- `/web/submit`\n- `/web/collections`\n- `/web/collections/:id`\n- `/web/digest`\n- `/web/preferences`\n\n### Local development\n\n```bash\ncd clients/web\nnpm install\nnpm run dev\nnpm run check:static\n```\n\nOptional web env vars:\n\n- `VITE_API_BASE_URL` (default: same-origin API)\n- `VITE_TELEGRAM_BOT_USERNAME` (required for Telegram Login Widget in JWT mode)\n- `VITE_ROUTER_BASENAME` (default: `/web`)\n\nFrontend architecture and auth details: [Frontend Web Guide](docs/reference/frontend-web.md).\n\n## MCP Server\n\nModel Context Protocol server that exposes articles and search to external AI agents (OpenClaw, Claude Desktop). Provides 17 tools and 13 resources for searching, retrieving, and exploring stored summaries. Runs as a dedicated Docker container with SSE transport or standalone via stdio. See `docs/mcp_server.md`.\n\n## Redis caching\n\nOptional caching layer for Firecrawl and LLM responses, API rate limiting, sync locks, and background task distributed locking. Degrades gracefully when unavailable. Set `REDIS_ENABLED=true`.\n\n## Karakeep integration\n\nSyncs bookmarks from Karakeep (self-hosted bookmark manager) into the summarization pipeline. Use `/sync_karakeep` to trigger manually or enable `KARAKEEP_AUTO_SYNC_ENABLED=true` for periodic sync.\n\n## Local CLI summary runner\n\n- With the same environment variables exported (Firecrawl + OpenRouter keys, DB path, etc.), run `python -m app.cli.summary --url https://example.com/article`.\n- Pass full message text instead of `--url` to mimic Telegram input, e.g. 
`python -m app.cli.summary \"/summary https://example.com\"`.\n- The CLI loads environment variables from `.env` in your current directory (or project root) automatically; override with `--env-file path/to/.env` if needed.\n- Add `--accept-multiple` to auto-confirm when multiple URLs are supplied, `--json-path summary.json` to write the final JSON to disk, and `--log-level DEBUG` for verbose traces.\n- The CLI generates stub Telegram credentials automatically, so no real bot token is required for local runs.\n\n## Errors and correlation IDs\n\nAll user-visible errors include `Error ID: \u003ccid\u003e` to correlate with logs and DB `requests.correlation_id`.\n\n## Dev tooling\n\n- Install dev deps: `pip install -r requirements.txt -r requirements-dev.txt`\n- Format: `make format` (ruff format + isort)\n- Lint: `make lint` (ruff)\n- Type-check: `make type` (mypy)\n- Web static checks: `cd clients/web \u0026\u0026 npm run check:static`\n- Web unit tests: `cd clients/web \u0026\u0026 npm run test`\n- Pre-commit: `pre-commit install` then commits will auto-run hooks\n- Optional: `pip install loguru` to enable Loguru-based JSON logging with stdlib bridging\n\n## Pre-commit hooks\n\nHooks run in this order to minimize churn: Ruff (check with `--fix`, format), isort (profile=black), mypy, plus standard hooks. 
If a first run modifies files, stage the changes and run again.\n\n## Local environment\n\n- Create venv: `make venv` (or run `tools/scripts/create_venv.sh`)\n- Activate: `source .venv/bin/activate`\n- Install deps: `pip install -r requirements.txt -r requirements-dev.txt`\n\n## Dependency management\n\n- Source of truth: `pyproject.toml` ([project] deps + [project.optional-dependencies].dev).\n- Locked requirements are generated to `requirements.txt` and `requirements-dev.txt`.\n- With uv (recommended):\n  - Install: `curl -Ls https://astral.sh/uv/install.sh | sh`\n  - Lock: `make lock-uv`\n- Regenerate locks after changing dependencies in `pyproject.toml`.\n\n## CI\n\nGitHub Actions workflow `.github/workflows/ci.yml` enforces:\n\n- Lockfile freshness (rebuilds from `pyproject.toml` and checks diff)\n- Lint (ruff), format check (ruff format, isort), type check (mypy)\n- Unit tests with coverage (pytest, 80% threshold)\n- Frontend jobs: `frontend-build`, `web-build`, `web-test`, `web-static-check`\n- Docker image build on every push/PR; optional push to GHCR when `PUBLISH_DOCKER` repository variable is set to `true` (non-PR events)\n- OpenAPI spec validation, code complexity (radon)\n- Codecov coverage reporting\n- Integration tests\n- Security checks: Bandit (SAST), pip-audit + Safety (dependency vulns)\n- Secrets scanning: Gitleaks on workspace and full history (history only on push)\n- PR summary automation\n\n## Docker publishing (optional)\n\n- Enable publishing to GitHub Container Registry (GHCR):\n  - In repository settings -\u003e Variables, add `PUBLISH_DOCKER=true`.\n  - Ensure workflow permissions include `packages: write` (already configured).\n  - Images are tagged as:\n    - `ghcr.io/\u003cowner\u003e/\u003crepo\u003e:latest` (on main)\n    - `ghcr.io/\u003cowner\u003e/\u003crepo\u003e:\u003cgit-sha\u003e`\n\n## Automated lockfile PRs\n\n- Workflow `.github/workflows/update-locks.yml` watches `pyproject.toml` and opens a PR to refresh 
`requirements*.txt` using uv.\n- Auto-merge is enabled for that PR; once CI passes, GitHub will automatically merge it.\n- You can also trigger it manually from the Actions tab.\n\n## Documentation\n\n**📚 Documentation Hub**: [docs/README.md](docs/README.md) - All docs organized by audience and task\n\n### Essential Guides\n\n| Document | Description | Audience |\n| -------- | ----------- | -------- |\n| [Quickstart Tutorial](docs/tutorials/quickstart.md) | Get first summary in 5 minutes | **Users** |\n| [FAQ](docs/FAQ.md) | Frequently asked questions | **All** |\n| [TROUBLESHOOTING.md](docs/TROUBLESHOOTING.md) | Debugging guide with correlation IDs | **All** |\n| [DEPLOYMENT.md](docs/DEPLOYMENT.md) | Setup and deployment guide | **Operators** |\n| [environment_variables.md](docs/environment_variables.md) | Complete config reference (250+ vars) | **All** |\n\n### Technical Documentation\n\n| Document | Description | Audience |\n| -------- | ----------- | -------- |\n| [docs/SPEC.md](docs/SPEC.md) | Full technical specification (canonical) | **Developers** |\n| [CLAUDE.md](CLAUDE.md) | AI assistant codebase guide | **AI Assistants, Developers** |\n| [HEXAGONAL_ARCHITECTURE_QUICKSTART.md](docs/HEXAGONAL_ARCHITECTURE_QUICKSTART.md) | Architecture patterns | **Developers** |\n| [multi_agent_architecture.md](docs/multi_agent_architecture.md) | Multi-agent LLM pipeline | **Developers** |\n| [ADRs](docs/adr/README.md) | Architecture decision records | **Developers** |\n\n### Integration Guides\n\n| Document | Description | Audience |\n| -------- | ----------- | -------- |\n| [MOBILE_API_SPEC.md](docs/MOBILE_API_SPEC.md) | REST API specification | **Integrators** |\n| [Frontend Web Guide](docs/reference/frontend-web.md) | Carbon web architecture and workflows | **Frontend Developers, Integrators** |\n| [mcp_server.md](docs/mcp_server.md) | MCP server (AI agents) | **Integrators** |\n| [claude_code_hooks.md](docs/claude_code_hooks.md) | Development safety hooks | 
**Developers** |\n\n### Version History\n\n| Document | Description |\n| -------- | ----------- |\n| [CHANGELOG.md](CHANGELOG.md) | Version history and release notes |\n\n## Notes\n\n- Dependencies include Pyrogram; if using PyroTGFork, align installation accordingly.\n- Bot commands are registered on startup for private chats.\n- Python 3.13+ is required; dependencies include scikit-learn for text processing and, optionally, uvloop for async performance.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpo4yka%2Fbite-size-reader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpo4yka%2Fbite-size-reader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpo4yka%2Fbite-size-reader/lists"}