{"id":26041297,"url":"https://github.com/muhtasham/text-to-speech-gemini","last_synced_at":"2026-02-20T19:33:44.063Z","repository":{"id":280784187,"uuid":"942332115","full_name":"Muhtasham/text-to-speech-gemini","owner":"Muhtasham","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-04T00:14:29.000Z","size":1496,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-12T23:20:35.660Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Muhtasham.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-04T00:12:10.000Z","updated_at":"2025-03-05T07:00:40.000Z","dependencies_parsed_at":"2025-03-05T09:45:43.660Z","dependency_job_id":"e58efe3b-7778-4399-bdde-b85f7af74b4e","html_url":"https://github.com/Muhtasham/text-to-speech-gemini","commit_stats":null,"previous_names":["muhtasham/text-to-speech-gemini"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Muhtasham/text-to-speech-gemini","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muhtasham%2Ftext-to-speech-gemini","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muhtasham%2Ftext-to-speech-gemini/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muhtasham%2Ftext-to-speech-gemini/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muhtasham%2Ftext-to-speech-gemini/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Muhtasham","download_url":"https://codeload.github.com/Muhtasham/text-to-speech-gemini/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muhtasham%2Ftext-to-speech-gemini/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29661615,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T16:33:43.953Z","status":"ssl_error","status_checked_at":"2026-02-20T16:33:43.598Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-07T13:51:55.035Z","updated_at":"2026-02-20T19:33:44.048Z","avatar_url":"https://github.com/Muhtasham.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text-to-Speech Implementation using Multimodal Live API\n\nThis project extends the [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console) to implement text-to-speech functionality using Gemini's audio capabilities. 
The implementation is primarily in `src/components/text-to-speech/TextToSpeech.tsx`.\n\n## Features\n\n- **Text-to-Speech Conversion**: Convert text to natural-sounding speech\n- **Multiple Voice Options**: Choose from five voices:\n  - Puck\n  - Charon\n  - Kore\n  - Fenrir\n  - Aoede\n- **Customizable Prompts**: Edit the instruction prompt that is sent to Gemini along with your text\n- **Real-time Audio Streaming**: Hear the speech as it is being generated\n- **Audio Download**: Save generated speech as WAV files\n- **Error Handling**: Robust error handling with retry mechanisms\n\n## Quick Start\n\n1. Get your [Gemini API key](https://aistudio.google.com/apikey)\n2. Set up the project:\n```bash\n# Clone the repository\ngit clone https://github.com/Muhtasham/text-to-speech-gemini.git\ncd text-to-speech-gemini\n\n# Install dependencies\nnpm install\n\n# Create a .env file containing your API key\necho \"GEMINI_API_KEY=your_api_key_here\" \u003e .env\n\n# Start the development server\nnpm start\n```\n\n## Using the Text-to-Speech Component\n\n1. Open http://localhost:3000 in your browser\n2. Click \"Show Settings\" to access:\n   - Voice selection dropdown\n   - Custom prompt configuration\n3. Enter your text in the main textarea\n4. Click \"Speak\" to generate and play the audio\n5. Click \"Download Audio\" to save the audio as a WAV file\n\n## Implementation Details\n\nThe text-to-speech functionality is implemented in `TextToSpeech.tsx`. Its key components are:\n\n- `AudioStreamer` for real-time audio playback\n- Voice selection from the available options\n- A customizable prompt, which defaults to:\n  `Please convert this text to speech and recite it verbatim do not start with sure here it is etc:`\n- WAV file generation for downloads\n\n## Original Project Attribution\n\nThis project is based on the [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console) by Google. 
The original project provides modules for streaming audio playback, recording user media, and a unified log view.\n\n## Development\n\nBuilt with:\n- React + TypeScript\n- Web Audio API\n- Gemini's Multimodal Live API\n- SCSS for styling\n\n## License\n\nThis project retains the Apache License 2.0 of the base project.\n\n---\n\n_This is an extension of an experiment showcasing the Multimodal Live API, not an official Google product. The original disclaimer and terms apply. See [Google's policy](https://developers.google.com/terms/site-policies) for more information._\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuhtasham%2Ftext-to-speech-gemini","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmuhtasham%2Ftext-to-speech-gemini","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuhtasham%2Ftext-to-speech-gemini/lists"}