{"id":24983934,"url":"https://github.com/alexykn/torchts","last_synced_at":"2025-04-19T16:40:25.765Z","repository":{"id":275335792,"uuid":"925783612","full_name":"alexykn/TorchTS","owner":"alexykn","description":"A modern text to speech frontend for Kokoro-82M","archived":false,"fork":false,"pushed_at":"2025-02-09T12:54:12.000Z","size":4121,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T06:45:46.331Z","etag":null,"topics":["docker","docker-compose","fastapi","full-stack","javascript","kokoro","kokoro-tts","local-ai","python","pytorch","rest-api","speech-synthesis","speech-synthesis-api","text-to-speech","tts","tts-api","vue","vuetify","web-application","webapp"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alexykn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-01T18:30:49.000Z","updated_at":"2025-02-23T21:39:21.000Z","dependencies_parsed_at":"2025-02-01T19:30:28.529Z","dependency_job_id":"43a97c88-d4e3-42e3-b9d8-17569142e98a","html_url":"https://github.com/alexykn/TorchTS","commit_stats":null,"previous_names":["alexykn/torchts"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexykn%2FTorchTS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexykn%2FTorchTS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexykn%2FTorchTS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexykn%2FTorchTS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alexykn","download_url":"https://codeload.github.com/alexykn/TorchTS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249740034,"owners_count":21318674,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","fastapi","full-stack","javascript","kokoro","kokoro-tts","local-ai","python","pytorch","rest-api","speech-synthesis","speech-synthesis-api","text-to-speech","tts","tts-api","vue","vuetify","web-application","webapp"],"created_at":"2025-02-04T09:40:41.633Z","updated_at":"2025-04-19T16:40:25.744Z","avatar_url":"https://github.com/alexykn.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TorchTS Project\n\n## Overview\n\nTorchTS is a text-to-speech application built with Python and Vue.js. It provides an interface for converting text from various document formats into speech using the Kokoro TTS model. The project combines a FastAPI backend with a Vue.js frontend to create a practical tool for text-to-speech conversion.\n\n![Dark Mode](img/dark_mode.png)\n\n![Light Mode](img/light_mode.png)\n\n## Features\n\n- **Text Processing:** Text handling and chunking utilities\n- **Document Support:** Parse and extract text from PDF, DOCX, ODT, and markdown files\n- **Audio Generation:** Text-to-speech conversion using Kokoro TTS\n- **Multi-Speaker Support:** Generate audio with different voices for different speakers in dialogues\n- **Profile Management:** Create and manage profiles with customizable voice and volume settings\n- **File Management:** Upload, store, and organize files within profiles\n- **RESTful API:** FastAPI backend endpoints for file processing and audio generation\n- **Modern Interface:** Vue.js frontend with Vuetify components for a responsive design\n\n## Project Structure\n\n```\ntorchts/\n├── requirements.txt           # Python dependencies\n├── src/\n│   ├── backend/              # Python backend\n│   │   ├── api/             # API endpoints and routing\n│   │   ├── storage/         # Database models and storage\n│   │   ├── processing/      # Text and audio processing\n│   │   └── main.py         # Main entry point\n│   └── frontend/            # Frontend applications\n│       └── templates/\n│           └── vue/        # Vue.js application\n```\n\n## Installation\n\n### Quick Start (Docker)\n\n1. Clone the repository\n\n2. Run the application:\n   ```bash\n   docker compose up -d\n   ```\n\n   To run with CUDA use:\n   ```bash\n   docker compose -f docker-compose.cuda.yml up -d\n   ```\n\n3. Access the web interface at `http://localhost:5173`\n\nThat's it! Docker will automatically set up everything needed.\n\n### Development Setup\n\n#### Prerequisites\n- Python 3.11+\n- Node.js 18+\n- npm 9+\n- espeak-ng (macOS only)\n\nFor local development on macOS, install espeak-ng:\n```bash\nbrew install espeak-ng\n```\n\n#### Backend Setup (Python)\n\n1. Create and activate a virtual environment (recommended):\n   ```bash\n   python -m venv .venv\n   source .venv/bin/activate  # On Unix/macOS\n   # or\n   .venv\\Scripts\\activate     # On Windows\n   ```\n2. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n3. Start the backend server:\n   ```bash\n   python src/backend/main.py\n   ```\n\n#### Frontend Setup (Vue.js)\n\n1. Navigate to the Vue directory:\n   ```bash\n   cd src/frontend/templates/vue\n   ```\n2. Install dependencies:\n   ```bash\n   npm install\n   ```\n3. Start development server:\n   ```bash\n   npm run dev\n   ```\n\n## Usage\n\n1. Access the web interface at `http://localhost:5173` after starting both the backend and frontend servers.\n2. Create a profile by clicking \"Create New Profile\" and setting your preferred voice and volume settings.\n3. Upload text or documents to your profile using the file upload area.\n4. Click on any uploaded file to load its content into the text editor.\n5. Choose between Single Speaker or Multi Speaker mode:\n   - **Single Speaker:** Select one voice for the entire text\n   - **Multi Speaker:** Use multiple voices for different speakers in dialogues\n6. Adjust voice settings if needed and click \"Convert to Speech\" to generate audio.\n7. Use the profile settings (cogwheel icon) to manage your files and profile.\n\n### Multi-Speaker Mode\n\nIn Multi-Speaker mode, you can assign different voices to different speakers in your text. Use the following format:\n\n```\n\u003e\u003e\u003e 1 Hello everyone! This is the first speaker.\n\u003e\u003e\u003e 2 And I'm the second speaker!\n\u003e\u003e\u003e 1 We can have a conversation like this.\n```\n\nEach speaker is identified by a number (\u003e\u003e\u003e 1, \u003e\u003e\u003e 2, etc.) and can be assigned a different voice using the voice selection dropdown menus.\n\n### Keyboard Controls\n\n- **Space**: Play/Pause audio\n- **←/→**: Seek backward/forward 5 seconds\n- **↑/↓**: Increase/decrease volume by 5%\n\n### Profile Management\n\n- **Create Profile:** Set up profiles with custom voice presets and volume settings\n- **Upload Files:** Each profile maintains its own collection of uploaded files\n- **File Organization:** Files are stored per profile for better organization\n- **Profile Settings:** Access profile settings via the cogwheel icon to:\n  - Delete all files in the profile\n  - Delete the entire profile and its associated files\n\n## Contributing\n\nFeel free to open issues or submit pull requests if you'd like to contribute to the project.\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Acknowledgments\n\n- This project relies heavily on the [Kokoro-82M](https://github.com/hexgrad/kokoro) text-to-speech model created by [hexgrad](https://huggingface.co/hexgrad/Kokoro-82M). Their work on developing this high-quality TTS model made this project possible.\n- Built with FastAPI, Vue.js, and Vuetify\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexykn%2Ftorchts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexykn%2Ftorchts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexykn%2Ftorchts/lists"}