{"id":48262259,"url":"https://github.com/yapit-tts/yapit","last_synced_at":"2026-04-04T21:37:52.527Z","repository":{"id":337555346,"uuid":"969143707","full_name":"yapit-tts/yapit","owner":"yapit-tts","description":"Listen to anything. TTS for documents, papers, and web pages.","archived":false,"fork":false,"pushed_at":"2026-03-30T22:47:21.000Z","size":7097,"stargazers_count":10,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-31T00:32:24.106Z","etag":null,"topics":["document-ai","document-reader","fastapi","gemini","gemini-api","markdown-converter","markdown-viewer","pdf-document-processor","react","self-hosted","text-to-speech","tts","tts-gui","yolo"],"latest_commit_sha":null,"homepage":"https://yapit.md","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yapit-tts.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-04-19T13:41:58.000Z","updated_at":"2026-03-30T15:21:24.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/yapit-tts/yapit","commit_stats":null,"previous_names":["yapit-tts/yapit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/yapit-tts/yapit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yapit-tts%2Fyapit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yapit-tts%2Fyapit/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/yapit-tts%2Fyapit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yapit-tts%2Fyapit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yapit-tts","download_url":"https://codeload.github.com/yapit-tts/yapit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yapit-tts%2Fyapit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31415113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T20:09:54.854Z","status":"ssl_error","status_checked_at":"2026-04-04T20:09:44.350Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["document-ai","document-reader","fastapi","gemini","gemini-api","markdown-converter","markdown-viewer","pdf-document-processor","react","self-hosted","text-to-speech","tts","tts-gui","yolo"],"created_at":"2026-04-04T21:37:49.153Z","updated_at":"2026-04-04T21:37:52.510Z","avatar_url":"https://github.com/yapit-tts.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"frontend/public/favicon.svg\" width=\"80\" height=\"80\"\u003e\n\n**yapit**: Listen to anything. 
Open-source TTS for documents, web pages, and text.\n\n\u003ch3\u003e\n\n[Website](https://yapit.md) | [CLI](https://github.com/yapit-tts/yapit-cli) | [Self-Host](#self-hosting) | [Architecture](docs/architecture.md)\n\n\u003c/h3\u003e\n\n[![GitHub Repo stars](https://img.shields.io/github/stars/yapit-tts/yapit)](https://github.com/yapit-tts/yapit/stargazers)\n[![CI/CD](https://github.com/yapit-tts/yapit/actions/workflows/deploy.yml/badge.svg)](https://github.com/yapit-tts/yapit/actions/workflows/deploy.yml)\n[![License: AGPL-3.0](https://img.shields.io/badge/license-AGPL--3.0-blue)](LICENSE)\n\n\u003c/div\u003e\n\n\u003cimg width=\"3840\" height=\"2880\" alt=\"image\" src=\"https://github.com/user-attachments/assets/706fcf0d-896b-4bae-b826-0d1e49262383\" /\u003e\n\n---\n\nPaste a URL or upload a PDF. Yapit renders the document and reads it aloud.\n\n- Handles the documents other TTS tools can't: academic papers with math, citations, figures, tables, messy formatting. Equations get spoken descriptions, citations become prose, page noise is skipped. The original content displays faithfully.\n- 170+ voices across 15 languages. Premium voices or free local synthesis that runs entirely in your browser, no account needed.\n- Vim-style keyboard shortcuts, document outliner, media key support, adjustable speed, dark mode, share by link.\n- Markdown export: append `/md` to any document URL to get clean markdown via curl. 
`/md-annotated` includes TTS annotations.\n\nPowered by [Gemini](https://ai.google.dev/gemini-api), [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M), [Inworld TTS](https://inworld.ai), [DocLayout-YOLO](https://huggingface.co/juliozhao/DocLayout-YOLO-DocStructBench), [defuddle](https://github.com/kepano/defuddle).\n\n## Self-hosting\n\n```bash\ngit clone --depth 1 https://github.com/yapit-tts/yapit.git \u0026\u0026 cd yapit\ncp .env.selfhost.example .env.selfhost # edit to enable optional features (AI-extraction, custom TTS models)\nmake self-host\n```\n\nOpen [http://localhost](http://localhost). Data persists across restarts.\nTo stop: `make self-host-down`.\n\n### Multi-user mode\n\nBy default, yapit runs in **single-user mode** — no login required, all features unlocked. `.env.selfhost` is self-documenting — see the comments for optional features (AI extraction, custom TTS models).\n\nIf you want user accounts with login (e.g., for a family or small team), set `AUTH_ENABLED=true` in `.env.selfhost`, uncomment the Stack Auth section below it, and use `make self-host-auth` instead. This adds Stack Auth and ClickHouse containers. Note: in single-user mode, all requests share one user — everyone on the network sees the same document library.\n\n### Custom TTS voices\n\nUse any server implementing the OpenAI `/v1/audio/speech` API ([vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI), [AllTalk](https://github.com/erew123/alltalk_tts), [Chatterbox TTS](https://github.com/devnen/Chatterbox-TTS-Server), etc.).\n\nAdd to `.env.selfhost`:\n\n```env\nOPENAI_TTS_BASE_URL=http://your-tts-server:8091/v1\nOPENAI_TTS_API_KEY=your-key-or-empty\nOPENAI_TTS_MODEL=your-model-name\n```\n\nVoices are auto-discovered if the server supports `GET /v1/audio/voices`. 
Otherwise set `OPENAI_TTS_VOICES=voice1,voice2,...`.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eExample: OpenAI TTS\u003c/strong\u003e\u003c/summary\u003e\n\nOpenAI doesn't support voice auto-discovery, so `OPENAI_TTS_VOICES` is required.\n\n```env\nOPENAI_TTS_BASE_URL=https://api.openai.com/v1\nOPENAI_TTS_API_KEY=sk-...\nOPENAI_TTS_MODEL=tts-1\nOPENAI_TTS_VOICES=alloy,echo,fable,nova,onyx,shimmer\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eExample: Qwen3-TTS via vLLM-Omni\u003c/strong\u003e\u003c/summary\u003e\n\nRequires GPU. The default stage config assumes \u003e=16GB VRAM. For 8GB cards (e.g., RTX 3070 Ti), create a custom config with lower sequence lengths and memory utilization — see the [stage config reference](https://docs.vllm.ai/projects/vllm-omni/en/stable/configuration/stage_configs/).\n\n```bash\npip install vllm-omni\nvllm-omni serve Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice \\\n    --omni --port 8091 --trust-remote-code --enforce-eager \\\n    --stage-configs-path /path/to/stage_configs.yaml # if you have low VRAM. 
`max_model_len: 1024` should work on 8GB\n```\n\nThen configure yapit:\n\n```env\nOPENAI_TTS_BASE_URL=http://your-gpu-host:8091/v1\nOPENAI_TTS_API_KEY=EMPTY\nOPENAI_TTS_MODEL=Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice\n```\n\nVoices are auto-discovered from the server (9 built-in speakers for CustomVoice models).\n\n\u003c/details\u003e\n\n### AI document extraction\n\nVision-based PDF/image processing works with any OpenAI-compatible API.\n\nAdd to `.env.selfhost`:\n\n```env\nAI_PROCESSOR=openai\nAI_PROCESSOR_BASE_URL=https://openrouter.ai/api/v1  # or your vLLM/Ollama endpoint\nAI_PROCESSOR_API_KEY=your-key\nAI_PROCESSOR_MODEL=qwen/qwen3-vl-235b-a22b-instruct  # any vision-capable model\n```\n\nOr use Google Gemini directly (with batch-mode support): `AI_PROCESSOR=gemini` + `GOOGLE_API_KEY=your-key`.\n\n### GPU workers for Kokoro TTS \u0026 YOLO figure detection\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eSetup\u003c/strong\u003e\u003c/summary\u003e\n\nKokoro and YOLO run as pull-based workers — any machine with Redis access can join. Connect from the local network or via Tailscale. GPU and CPU workers run side-by-side; faster workers naturally pull more jobs. 
Scale by running more containers on any machine that can reach Redis.\n\nPrereq: Docker 25+, [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) with [CDI enabled](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html), network access to the Redis instance.\n\n```bash\n# One-time GPU setup: generate CDI spec + enable CDI in Docker\nsudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml\n# Add {\"features\": {\"cdi\": true}} to /etc/docker/daemon.json, then:\nsudo systemctl restart docker\n\ngit clone --depth 1 https://github.com/yapit-tts/yapit.git \u0026\u0026 cd yapit\n\n# Pull only the images you need\ndocker compose -f docker-compose.worker.yml pull kokoro-gpu yolo-gpu\n\n# Start 2 Kokoro + 1 YOLO worker\nREDIS_URL=redis://\u003chost\u003e:6379/0 docker compose -f docker-compose.worker.yml up -d \\\n  --scale kokoro-gpu=2 --scale yolo-gpu=1 kokoro-gpu yolo-gpu\n```\n\nAdjust `--scale` to your GPU. A 4GB card fits 2 Kokoro + 1 YOLO comfortably.\n\n\u003cdetails\u003e\n\u003csummary\u003eNVIDIA MPS (recommended for multiple workers per GPU)\u003c/summary\u003e\n\n[MPS](https://docs.nvidia.com/deploy/mps/) lets multiple workers share one GPU context — less VRAM overhead, no context switching. Without MPS, each worker gets its own CUDA context (~300MB each). 
The compose file mounts the MPS pipe automatically; just start the daemon.\n\n```bash\nsudo tee /etc/systemd/system/nvidia-mps.service \u003e /dev/null \u003c\u003c'EOF'\n[Unit]\nDescription=NVIDIA Multi-Process Service (MPS)\nAfter=nvidia-persistenced.service\n\n[Service]\nType=forking\nExecStart=/usr/bin/nvidia-cuda-mps-control -d\nExecStop=/bin/sh -c 'echo quit | /usr/bin/nvidia-cuda-mps-control'\nRestart=on-failure\n\n[Install]\nWantedBy=multi-user.target\nEOF\nsudo systemctl daemon-reload\nsudo systemctl enable --now nvidia-mps\n```\n\n\u003c/details\u003e\n\n\u003c/details\u003e\n\n## Roadmap\n\nNext:\n- Support exporting audio as MP3.\n- Support word-level highlighting for Kokoro English.\n\nLater:\n- Support the thinking parameter for Gemini.\n- Support the temperature parameter for Inworld.\n- Support AI-transform for websites.\n\n## Development\n\n```bash\nuv sync                              # install Python dependencies\nnpm install --prefix frontend        # install frontend dependencies\nmake dev-env 2\u003e/dev/null || touch .env  # decrypt secrets, or create empty .env\nmake dev-cpu                         # start backend services (Docker Compose)\ncd frontend \u0026\u0026 npm run dev           # start frontend\nmake test-local                      # run tests\n```\n\nSee [agent/knowledge/dev-setup.md](agent/knowledge/dev-setup.md) for full setup instructions.\n\nThe `agent/knowledge/` directory is the project's in-depth knowledge base, maintained jointly with Claude during development.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyapit-tts%2Fyapit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyapit-tts%2Fyapit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyapit-tts%2Fyapit/lists"}