{"id":30139553,"url":"https://github.com/dfanso/gemini-tts","last_synced_at":"2025-08-11T02:15:27.789Z","repository":{"id":300879018,"uuid":"1002260004","full_name":"DFanso/gemini-tts","owner":"DFanso","description":"Text-to-Speech API using Google Gemini with Background Job Processing","archived":false,"fork":false,"pushed_at":"2025-06-24T03:28:45.000Z","size":1074,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-24T04:30:33.563Z","etag":null,"topics":["gemini","gemini-tts","tts-api"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DFanso.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-15T04:22:46.000Z","updated_at":"2025-06-24T03:28:49.000Z","dependencies_parsed_at":"2025-06-24T04:30:56.893Z","dependency_job_id":"64e84821-75b2-4afa-9d35-bfeb6481975e","html_url":"https://github.com/DFanso/gemini-tts","commit_stats":null,"previous_names":["dfanso/gemini-tts"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DFanso/gemini-tts","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DFanso%2Fgemini-tts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DFanso%2Fgemini-tts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DFanso%2Fgemini-tts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DFanso%2Fgemini-tts/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DFanso","download_url":"https://codeload.github.com/DFanso/gemini-tts/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DFanso%2Fgemini-tts/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269819032,"owners_count":24480087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-11T02:00:10.019Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gemini","gemini-tts","tts-api"],"created_at":"2025-08-11T02:15:02.413Z","updated_at":"2025-08-11T02:15:27.766Z","avatar_url":"https://github.com/DFanso.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gemini TTS - Text-to-Speech with Google Gemini API\n\nA TypeScript application that converts text to speech using Google's Gemini API with native TTS capabilities.\n\n## Features\n\n- 🎤 Convert text files to high-quality speech audio\n- 🎵 30 different voice options available\n- 🌍 Supports multiple languages (including Sinhala)\n- 📁 Easy file-based input\n- 🔧 TypeScript for better development experience\n- ⚡ Uses the latest Gemini 2.5 TTS models\n\n## Prerequisites\n\n- Node.js 18 or later\n- pnpm (recommended) or npm\n- A Google Gemini API key\n\n## Installation\n\n1. Clone this repository or download the files\n2. 
Install dependencies:\n\n```bash\npnpm install\n```\n\nOr if you prefer npm:\n```bash\nnpm install\n```\n\n3. Set up your environment variables:\n\nCreate a `.env` file in the root directory:\n\n```env\nGEMINI_API_KEY=your_gemini_api_key_here\n```\n\n**Get your API key from:** https://aistudio.google.com/app/apikey\n\n## Usage\n\n### Quick Start\n\n1. Make sure your text is in the `text.txt` file (it's already there with Sinhala text)\n2. Run the application:\n\n```bash\n# Development mode (with ts-node)\npnpm dev\n\n# Or build and run\npnpm build\npnpm start\n```\n\n3. The audio file will be generated as `sinhala_text_audio.wav`\n\n### Available Voices\n\nThe application supports 30 different voices:\n\n- **Bright**: Zephyr, Autonoe\n- **Upbeat**: Puck, Laomedeia  \n- **Firm**: Kore, Orus, Alnilam\n- **Informative**: Charon, Rasalgethi\n- **Excitable**: Fenrir\n- **Youthful**: Leda\n- **Easy-going**: Umbriel, Callirrhoe\n- **Clear**: Erinome, Iapetus\n- **Breezy**: Aoede\n- **Breathy**: Enceladus\n- **Smooth**: Algieba, Despina\n- **Gravelly**: Algenib\n- **Soft**: Achernar\n- **Mature**: Gacrux\n- **Casual**: Zubenelgenubi\n- **Forward**: Pulcherrima\n- **Even**: Schedar\n- **Friendly**: Achird\n- **Lively**: Sadachbia\n- **Knowledgeable**: Sadaltager\n- **Gentle**: Vindemiatrix\n- **Warm**: Sulafat\n\n### Customizing Voice and Output\n\nYou can modify the voice and output filename in `src/index.ts`:\n\n```typescript\nawait tts.convertFileToSpeech('text.txt', {\n  voiceName: 'Puck', // Change to any available voice\n  outputFile: 'my_custom_audio.wav'\n});\n```\n\n## Supported Languages\n\nThe Gemini TTS API supports 24 languages including:\n\n- English (US, India)\n- Arabic (Egyptian)\n- German, Spanish, French\n- Hindi, Indonesian, Italian\n- Japanese, Korean\n- Portuguese (Brazil)\n- Russian, Dutch, Polish\n- Thai, Turkish, Vietnamese\n- Romanian, Ukrainian\n- Bengali, Marathi, Tamil, Telugu\n\n*Note: While Sinhala isn't officially listed, the API may 
auto-detect and process it.*\n\n## Project Structure\n\n```\ngemini-tts/\n├── src/\n│   └── index.ts          # Main TypeScript application\n├── dist/                 # Compiled JavaScript (after build)\n├── text.txt             # Input text file (Sinhala content)\n├── package.json         # Dependencies and scripts\n├── tsconfig.json        # TypeScript configuration\n└── README.md           # This file\n```\n\n## Scripts\n\n- `pnpm dev` - Run in development mode with ts-node\n- `pnpm build` - Compile TypeScript to JavaScript\n- `pnpm start` - Run the compiled JavaScript\n- `pnpm clean` - Clean the dist directory\n\n## API Usage\n\nThe application uses the Gemini 2.5 Flash Preview TTS model:\n\n```typescript\nconst response = await this.ai.models.generateContent({\n  model: \"gemini-2.5-flash-preview-tts\",\n  contents: [{ \n    role: \"user\", \n    parts: [{ text: `Please read this text aloud: ${text}` }] \n  }],\n  config: {\n    responseModalities: ['AUDIO'],\n    speechConfig: {\n      voiceConfig: {\n        prebuiltVoiceConfig: { \n          voiceName: 'Kore' \n        }\n      }\n    }\n  }\n});\n```\n\n## Error Handling\n\nThe application includes comprehensive error handling for:\n\n- Missing API keys\n- File reading errors\n- API response errors\n- Audio file saving errors\n\n## Limitations\n\n- TTS models accept text-only inputs\n- Context window limit of 32k tokens\n- Audio output is in WAV format at 24kHz\n- Preview feature - may have usage limits\n\n## Troubleshooting\n\n### Common Issues\n\n1. **\"GEMINI_API_KEY is required\"**\n   - Make sure you've created a `.env` file with your API key\n   - Verify the API key is correct\n\n2. **\"Failed to read file text.txt\"**\n   - Ensure the `text.txt` file exists in the root directory\n   - Check file permissions\n\n3. 
**\"No audio data received from Gemini API\"**\n   - Check that your API key has TTS permissions\n   - Verify that the text does not exceed the 32k-token context window\n\n## Contributing\n\nFeel free to submit issues and enhancement requests!\n\n## License\n\nMIT License - see LICENSE file for details.\n\n## References\n\n- [Gemini API Documentation](https://ai.google.dev/gemini-api/docs/speech-generation)\n- [Google AI Studio](https://aistudio.google.com/)\n- [TypeScript Documentation](https://www.typescriptlang.org/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfanso%2Fgemini-tts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdfanso%2Fgemini-tts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfanso%2Fgemini-tts/lists"}