https://github.com/dfanso/gemini-tts
Text-to-Speech API using Google Gemini with Background Job Processing
gemini gemini-tts tts-api
- Host: GitHub
- URL: https://github.com/dfanso/gemini-tts
- Owner: DFanso
- License: mit
- Created: 2025-06-15T04:22:46.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-24T03:28:45.000Z (about 1 year ago)
- Last Synced: 2025-06-24T04:30:33.563Z (about 1 year ago)
- Topics: gemini, gemini-tts, tts-api
- Language: TypeScript
- Homepage:
- Size: 1.02 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Gemini TTS - Text-to-Speech with Google Gemini API
A TypeScript application that converts text to speech using Google's Gemini API with native TTS capabilities.
## Features
- 🎤 Convert text files to high-quality speech audio
- 🎵 30 different voice options available
- 🌍 Supports multiple languages (including Sinhala)
- 📁 Easy file-based input
- 🔧 TypeScript for better development experience
- ⚡ Uses the latest Gemini 2.5 TTS models
## Prerequisites
- Node.js 18 or later
- pnpm (recommended) or npm
- A Google Gemini API key
## Installation
1. Clone this repository or download the files
2. Install dependencies:
```bash
pnpm install
```
Or if you prefer npm:
```bash
npm install
```
3. Set up your environment variables:
Create a `.env` file in the root directory:
```env
GEMINI_API_KEY=your_gemini_api_key_here
```
**Get your API key from:** https://aistudio.google.com/app/apikey
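The key check can be sketched as a small helper. This is a minimal illustration, not the repository's actual code: `requireApiKey` and the `Env` type are hypothetical names, and the error text mirrors the message listed under Troubleshooting.

```typescript
// Hypothetical helper (not taken from the repository's source).
// Accepts any env-like object so it can be exercised without a real .env file.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  // Treat a missing, empty, or whitespace-only value as "not set".
  const raw = env["GEMINI_API_KEY"];
  const key = raw === undefined ? "" : raw.trim();
  if (key.length === 0) {
    throw new Error("GEMINI_API_KEY is required");
  }
  return key;
}

// Typical call site, assuming the .env file has already been loaded
// into the process environment (for example by a dotenv-style loader):
//   const apiKey = requireApiKey(process.env);
```

Failing fast at startup keeps a missing key from surfacing later as a less obvious API error.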
## Usage
### Quick Start
1. Put the text you want to convert in `text.txt` (the repository ships with sample Sinhala text)
2. Run the application:
```bash
# Development mode (with ts-node)
pnpm dev
# Or build and run
pnpm build
pnpm start
```
3. The audio file will be generated as `sinhala_text_audio.wav`
### Available Voices
The application supports 30 different voices:
- **Bright**: Zephyr, Autonoe
- **Upbeat**: Puck, Laomedeia
- **Firm**: Kore, Orus, Alnilam
- **Informative**: Charon, Rasalgethi
- **Excitable**: Fenrir
- **Youthful**: Leda
- **Easy-going**: Umbriel, Callirrhoe
- **Clear**: Erinome, Iapetus
- **Breezy**: Aoede
- **Breathy**: Enceladus
- **Smooth**: Algieba, Despina
- **Gravelly**: Algenib
- **Soft**: Achernar
- **Mature**: Gacrux
- **Casual**: Zubenelgenubi
- **Forward**: Pulcherrima
- **Even**: Schedar
- **Friendly**: Achird
- **Lively**: Sadachbia
- **Knowledgeable**: Sadaltager
- **Gentle**: Vindemiatrix
- **Warm**: Sulafat
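Because voice names are plain strings, a typo only shows up when the API call fails. One way to catch it earlier is to keep the list above as a typed constant. This is a sketch: `VOICES`, `VoiceName`, and `isVoiceName` are hypothetical names, not identifiers from the repository.

```typescript
// The 30 voice names listed above, kept as a readonly tuple so that
// TypeScript can derive a union type from it.
const VOICES = [
  "Zephyr", "Autonoe", "Puck", "Laomedeia", "Kore", "Orus", "Alnilam",
  "Charon", "Rasalgethi", "Fenrir", "Leda", "Umbriel", "Callirrhoe",
  "Erinome", "Iapetus", "Aoede", "Enceladus", "Algieba", "Despina",
  "Algenib", "Achernar", "Gacrux", "Zubenelgenubi", "Pulcherrima",
  "Schedar", "Achird", "Sadachbia", "Sadaltager", "Vindemiatrix", "Sulafat",
] as const;

type VoiceName = (typeof VOICES)[number];

// Runtime guard for values that arrive as untyped strings (CLI args, config).
// Matching is case-sensitive, so "puck" is rejected.
function isVoiceName(name: string): name is VoiceName {
  return (VOICES as readonly string[]).indexOf(name) !== -1;
}
```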
### Customizing Voice and Output
You can modify the voice and output filename in `src/index.ts`:
```typescript
await tts.convertFileToSpeech('text.txt', {
  voiceName: 'Puck', // Change to any available voice
  outputFile: 'my_custom_audio.wav'
});
```
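If both options are optional, the call above could fall back to defaults along these lines. This is a sketch under assumptions: `SpeechOptions`, `DEFAULT_OPTIONS`, and `resolveOptions` are hypothetical names, and the default values are guesses based on the voice in the API Usage example and the file name in Quick Start.

```typescript
// Hypothetical option shape; only voiceName and outputFile appear in this README.
interface SpeechOptions {
  voiceName?: string;
  outputFile?: string;
}

// Assumed defaults, not confirmed against src/index.ts.
const DEFAULT_OPTIONS = {
  voiceName: "Kore",
  outputFile: "sinhala_text_audio.wav",
};

// Fill in whichever fields the caller left out.
function resolveOptions(options: SpeechOptions = {}): { voiceName: string; outputFile: string } {
  return {
    voiceName: options.voiceName ?? DEFAULT_OPTIONS.voiceName,
    outputFile: options.outputFile ?? DEFAULT_OPTIONS.outputFile,
  };
}
```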
## Supported Languages
The Gemini TTS API supports 24 languages including:
- English (US, India)
- Arabic (Egyptian)
- German, Spanish, French
- Hindi, Indonesian, Italian
- Japanese, Korean
- Portuguese (Brazil)
- Russian, Dutch, Polish
- Thai, Turkish, Vietnamese
- Romanian, Ukrainian
- Bengali, Marathi, Tamil, Telugu
*Note: While Sinhala isn't officially listed, the API may auto-detect and process it.*
## Project Structure
```
gemini-tts/
├── src/
│ └── index.ts # Main TypeScript application
├── dist/ # Compiled JavaScript (after build)
├── text.txt # Input text file (Sinhala content)
├── package.json # Dependencies and scripts
├── tsconfig.json # TypeScript configuration
└── README.md # This file
```
## Scripts
- `pnpm dev` - Run in development mode with ts-node
- `pnpm build` - Compile TypeScript to JavaScript
- `pnpm start` - Run the compiled JavaScript
- `pnpm clean` - Clean the dist directory
## API Usage
The application uses the Gemini 2.5 Flash Preview TTS model:
```typescript
const response = await this.ai.models.generateContent({
  model: "gemini-2.5-flash-preview-tts",
  contents: [{
    role: "user",
    parts: [{ text: `Please read this text aloud: ${text}` }]
  }],
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: {
          voiceName: 'Kore'
        }
      }
    }
  }
});
```
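The audio comes back as base64 text in the first part of the first candidate. The sketch below shows one way to pull it out, assuming that response shape. `AudioResponse` and `extractAudioBase64` are hypothetical names, and the interface covers only the fields being read rather than the SDK's full response type.

```typescript
// Minimal structural type for the fields read below (an assumption,
// not the SDK's own response type).
interface AudioResponse {
  candidates?: Array<{
    content?: {
      parts?: Array<{ inlineData?: { data?: string; mimeType?: string } }>;
    };
  }>;
}

function extractAudioBase64(response: AudioResponse): string {
  // Every level is optional, so chain safely instead of assuming presence.
  const data = response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
  if (!data) {
    // Same message as the Troubleshooting entry for an empty response.
    throw new Error("No audio data received from Gemini API");
  }
  return data;
}

// In Node.js the returned string can then be decoded with
// Buffer.from(data, "base64") before it is written to disk.
```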
## Error Handling
The application handles the following failure cases:
- Missing API keys
- File reading errors
- API response errors
- Audio file saving errors
## Limitations
- TTS models accept text-only inputs
- Context window limit of 32k tokens
- Audio output is saved as a WAV file at 24 kHz
- The TTS models are a preview feature and may have usage limits
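To illustrate the WAV point: if the API returns raw PCM samples, they need a 44-byte RIFF header before players will accept the file. The sketch below assumes 16-bit mono PCM at 24 kHz. `pcmToWav` is a hypothetical helper, and the repository may use a library for this step instead.

```typescript
// Wrap raw PCM bytes in a canonical 44-byte WAV (RIFF) header.
// Defaults are assumptions: 24 kHz, mono, 16-bit little-endian samples.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  // Write a four-character ASCII chunk tag at the given offset.
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk length for PCM
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the sample data
  out.set(pcm, 44);                         // samples follow the header
  return out;
}
```

The result can be written to disk as-is, for example with `fs.writeFileSync('out.wav', pcmToWav(pcmBytes))` in Node.js.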
## Troubleshooting
### Common Issues
1. **"GEMINI_API_KEY is required"**
- Make sure you've created a `.env` file with your API key
- Verify the API key is correct
2. **"Failed to read file text.txt"**
- Ensure the `text.txt` file exists in the root directory
- Check file permissions
3. **"No audio data received from Gemini API"**
- Check that your API key has access to the TTS models
- Verify the text isn't too long (32k token limit)
## Contributing
Feel free to submit issues and enhancement requests!
## License
MIT License - see LICENSE file for details.
## References
- [Gemini API Documentation](https://ai.google.dev/gemini-api/docs/speech-generation)
- [Google AI Studio](https://aistudio.google.com/)
- [TypeScript Documentation](https://www.typescriptlang.org/)