{"id":50425462,"url":"https://github.com/aman179102/podvoice","last_synced_at":"2026-05-31T10:03:44.317Z","repository":{"id":334539505,"uuid":"1141704404","full_name":"aman179102/podvoice","owner":"aman179102","description":"Local-first CLI that turns Markdown scripts into multi-speaker podcast-style audio using Coqui XTTS v2.","archived":false,"fork":false,"pushed_at":"2026-03-29T08:14:56.000Z","size":169,"stargazers_count":25,"open_issues_count":1,"forks_count":10,"subscribers_count":7,"default_branch":"main","last_synced_at":"2026-03-29T10:12:10.736Z","etag":null,"topics":["ai-audio","automation","cli","content-creation","coqui-tts","developer-tools","local-first","local-first-ai","markdown-to-audio","offline-ai","open-source-cli","opensource","podcast","python","text-to-speech","tts","xtts"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aman179102.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-25T09:22:47.000Z","updated_at":"2026-03-29T07:52:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/aman179102/podvoice","commit_stats":null,"previous_names":["aman179102/podvoice"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/aman179102/podvoice","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Fpodvoice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Fpodvoice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Fpodvoice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Fpodvoice/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aman179102","download_url":"https://codeload.github.com/aman179102/podvoice/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Fpodvoice/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33726719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-audio","automation","cli","content-creation","coqui-tts","developer-tools","local-first","local-first-ai","markdown-to-audio","offline-ai","open-source-cli","opensource","podcast","python","text-to-speech","tts","xtts"],"created_at":"2026-05-31T10:03:43.764Z","updated_at":"2026-05-31T10:03:44.309Z","avatar_url":"https://github.com/aman179102.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\n\n# 🧠 Podvoice\n\nPodvoice is a local-first AI podcast generator that converts simple Markdown scripts into **multi-speaker audio**.\n\nOriginally built as a CLI tool, Podvoice now includes **PodVoice Studio** — a modern web-based GUI for creating, previewing, and generating AI audio visually.\n\nNo cloud APIs. No subscriptions. Fully offline.\n\nRuns on **Linux, Windows, macOS, and FreeBSD**.\n\n---\n## Why Podvoice?\n\nMost AI audio tools:\n- Require paid APIs\n- Depend on cloud services\n\nPodvoice is:\n- Local-first\n- Fully offline\n- Developer-friendly\n- Now with a visual GUI (PodVoice Studio)\n\n---\n\n## Features\n\n* **Markdown-based scripts**\n* **Multiple logical speakers**\n* **Deterministic voice assignment**\n* **Single stitched output file**\n* **WAV or MP3 export**\n* **Local-only inference**\n* **CPU-first (GPU optional)**\n* **Cross-platform support**\n* **🎙️ Studio Web UI** — Modern single-page interface for voice selection, preview, and generation\n* **🔊 Built-in multi-speaker models** — VCTK vits and others with cached voice demos\n* **⚡ AJAX-based generation** — No page reloads, instant audio playback\n* **🎨 Modern dark theme** — Clean sidebar layout with zero scrolling\n* **📁 Profile management** — YAML-based speaker profiles with reference audio support\n* **🔄 Multi-reference audio** — Concatenate multiple clips for better voice conditioning\n\n---\n\n## Supported platforms\n\n| Platform | Status            | Notes                  |\n| -------- | ----------------- | ---------------------- |\n| Linux    | ✅ Fully supported | Primary dev platform   |\n| macOS    | ✅ Fully supported | Intel + Apple Silicon  |\n| Windows  | ✅ Fully supported | PowerShell             |\n| FreeBSD  | ✅ Supported       | Requires ffmpeg        |\n| WSL2     | ✅ Supported       | Recommended on Windows |\n\n---\n\n## Input format\n\nPodvoice consumes Markdown files with speaker blocks:\n\n```markdown\n[Host | calm]\nWelcome to the show.\n\n[Guest | warm]\nIf this sounds useful, try writing your own script\nand see how easily Markdown becomes audio.\n```\n\nRules:\n\n* Speaker name is **required**\n* Emotion tag is **optional**\n* Text continues until the next speaker block\n* Blank lines are allowed\n\n\n---\n\n\n## ▶️ Demo Video of Podvoice Studio (GUI USAGE)\n\n\u003cdiv align=\"center\"\u003e\n  \n\n\nhttps://github.com/user-attachments/assets/54970066-93d0-45f7-8ca0-e971b38b4c15\n\n\n\n\n\n\n\u003c/div\u003e\n\n---\n\n\n## 🎧 Demo Audio\n\n\u003cdiv align=\"center\"\u003e\n  \n\n\nhttps://github.com/user-attachments/assets/6f468a4f-c4c9-446c-a6b9-b365c3e7f131\n\n\n\n\n\n\n\u003c/div\u003e\n\n\n\n## ▶️ Demo Video of Podvoice (CLI USAGE)\n\n\u003cdiv align=\"center\"\u003e\n  \n\n\nhttps://github.com/user-attachments/assets/c9e9c5f0-ce03-4d71-952f-927cab55bd83\n\n\n\n\u003c/div\u003e\n---\n\n\n\n\n## Quick start (ALL operating systems)\n\n### 1️⃣ System requirements (common)\n\nRequired everywhere:\n\n* **Python 3.10.x**\n* **ffmpeg**\n* **espeak** or **espeak-ng** (required for Studio with built-in multi-speaker models)\n* Internet access **only for first run**\n* ~5–8 GB free disk space (model cache)\n\n---\n\n### 2️⃣ Install system dependencies\n\n#### 🐧 Linux (Ubuntu / Debian)\n\n```bash\nsudo apt update\nsudo apt install -y python3.10 python3.10-venv ffmpeg git espeak\n```\n\n---\n\n#### 🍎 macOS (Homebrew)\n\n```bash\nbrew install python@3.10 ffmpeg git\n```\n\n---\n\n#### 🪟 Windows (PowerShell)\n\n```powershell\nwinget install Python.Python.3.10\nwinget install ffmpeg\nwinget install Git.Git\n```\n\nRestart the terminal after installing Python.\n\n---\n\n#### 🐡 FreeBSD\n\n```sh\npkg install python310 ffmpeg git\n```\n\n---\n\n### 3️⃣ Clone the repository\n\n```bash\ngit clone https://github.com/aman179102/podvoice.git\ncd podvoice\n```\n\n---\n\n## Setup (recommended path)\n\n### 🐧 Linux / 🍎 macOS / 🐡 FreeBSD\n\n```bash\nchmod +x bootstrap.sh\n./bootstrap.sh\n```\n\nThis script will:\n\n* Verify Python 3.10\n* Create a local `.venv`\n* Install fully pinned dependencies from `requirements.lock`\n* Install `podvoice` in editable mode\n\n---\n\n### 🪟 Windows (PowerShell)\n\n#### One-time: allow local scripts\n\n```powershell\nSet-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned\n```\n\n#### Run bootstrap\n\n```powershell\n.\\bootstrap.ps1\n```\n\n---\n\n### Activate the environment\n\n#### Linux / macOS / FreeBSD\n\n```bash\nsource .venv/bin/activate\n```\n\n#### Windows\n\n```powershell\n.venv\\Scripts\\Activate.ps1\n```\n\n---\n\n## Run the demo\n\n```bash\npodvoice examples/demo.md --out demo.wav\n```\n\nOr export MP3:\n\n```bash\npodvoice examples/demo.md --out demo.mp3\n```\n\nOn first run, Coqui XTTS v2 model weights will be downloaded and cached locally.\nSubsequent runs reuse the cache.\n\n---\n\n## 🎙️ Studio Web UI\n\nPodvoice includes a modern, single-page web interface for interactive voice generation.\n\n### Launch Studio\n\n```bash\npodvoice studio --host 127.0.0.1 --port 8000\n```\n\nThen open: `http://127.0.0.1:8000`\n\n### Studio Features\n\n| Feature | Description |\n|---------|-------------|\n| **Sidebar Voice Gallery** | All built-in speakers displayed with human-friendly labels |\n| **Instant Preview** | Click any voice to hear a demo instantly (cached after first play) |\n| **Single TTS** | Type text, select voice, generate audio — no page reloads |\n| **Multi TTS (Podcast)** | Paste Markdown scripts with speaker mapping |\n| **AJAX Generation** | Audio generates and plays without leaving the page |\n| **Modern Dark Theme** | Clean aesthetic with CSS variables, no scrolling |\n\n### Studio Endpoints\n\n- `/` or `/single` — Single TTS page\n- `/multi` — Multi-speaker podcast page\n- `/demo_wav?voice=p240` — Get cached demo audio for a voice\n- `/health` — Health check endpoint\n\n### Using a Different Model\n\nStudio defaults to `tts_models/en/vctk/vits` (built-in multi-speaker). To use XTTS v2 instead:\n\n```bash\npodvoice studio --model-name tts_models/multilingual/multi-dataset/xtts_v2\n```\n\n---\n\n## CLI usage\n\n```bash\npodvoice SCRIPT.md --out OUTPUT\n```\n\nExamples:\n\n```bash\npodvoice examples/demo.md --out output.wav\n```\n\n```bash\npodvoice examples/demo.md --out podcast.mp3 --language en --device cpu\n```\n\n### Options\n\n| Option             | Description               |\n| ------------------ | ------------------------- |\n| `SCRIPT`           | Input Markdown file       |\n| `--out`, `-o`      | Output `.wav` or `.mp3`   |\n| `--language`, `-l` | XTTS language code        |\n| `--device`, `-d`   | `cpu` (default) or `cuda` |\n\n---\n\n## GPU usage (optional)\n\nIf you have a compatible NVIDIA GPU:\n\n```bash\npodvoice examples/demo.md --device cuda\n```\n\nIf CUDA is unavailable, Podvoice safely falls back to CPU.\n\n---\n\n## 📁 Profile Management\n\nPodvoice supports YAML-based speaker profiles for advanced use cases.\n\n### Profile Directory\n\nDefault: `./podvoice_profiles/profiles.yaml`\n\n### Profile Format\n\n```yaml\nprofiles:\n  my_custom_voice:\n    builtin_speaker: p240\n  cloned_voice:\n    reference_audio: ./samples/voice.wav\n  multi_sample_voice:\n    reference_audios:\n      - ./samples/clip1.wav\n      - ./samples/clip2.wav\n      - ./samples/clip3.wav\n```\n\n### Using Profiles\n\nProfiles are automatically loaded and can be referenced in your Markdown scripts by speaker name.\n\n---\n\n## Performance notes\n\nYou may see warnings like:\n\n```\nCould not initialize NNPACK! Reason: Unsupported hardware.\n```\n\n✔️ These are **harmless**\n✔️ Audio generation will still complete\n❌ No action required\n\n---\n\n## How voices are assigned\n\nPodvoice does **not** train voices.\n\nInstead:\n\n* Uses built-in XTTS v2 speakers\n* Hashes speaker names deterministically\n* Maps each logical speaker to a stable voice\n\nImplications:\n\n* Same speaker name → same voice\n* Rename speaker → possibly different voice\n* XTTS update → mapping may change\n\nFallback: default XTTS voice.\n\n---\n\n## Project structure\n\n```text\npodvoice/\n├── podvoice/\n│   ├── cli.py            # CLI entrypoint\n│   ├── parser.py         # Markdown parser\n│   ├── tts.py            # XTTS inference\n│   ├── audio.py          # Audio stitching\n│   ├── studio.py         # FastAPI web UI\n│   ├── profiles.py       # YAML profile management\n│   ├── preprocessing.py  # Audio preprocessing\n│   └── utils.py\n│\n├── examples/\n│   └── demo.md\n│\n├── podvoice_profiles/    # Voice profiles directory\n│\n├── bootstrap.sh\n├── bootstrap.ps1\n├── pyproject.toml\n└── README.md\n```\n\n---\n\n## Responsible use\n\nPodvoice generates natural-sounding speech.\n\nDo **not**:\n\n* Impersonate real people without consent\n* Use generated audio for fraud or deception\n\nAlways disclose synthesized content where appropriate.\n\nYou are responsible for compliance with all applicable laws and licenses,\nincluding those of Coqui XTTS v2.\n\n---\n\n## Contributing\n\nPodvoice is intentionally simple.\n\nGood contributions:\n\n* Bug reports with minimal reproduction scripts\n* CLI UX improvements\n* Documentation clarity\n* Cross-platform fixes\n\nNon-goals:\n\n* Cloud dependencies\n* Training pipelines\n* Over-engineering\n\n**Goal:** local, boring, reliable software.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faman179102%2Fpodvoice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faman179102%2Fpodvoice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faman179102%2Fpodvoice/lists"}