https://github.com/loglux/speakitai
Convert text to speech using Microsoft Azure Neural Text-to-Speech (TTS) and a simple Gradio web interface.
- Host: GitHub
- URL: https://github.com/loglux/speakitai
- Owner: loglux
- Created: 2025-05-26T18:26:34.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-06-14T17:00:18.000Z (4 months ago)
- Last Synced: 2025-06-14T18:19:01.408Z (4 months ago)
- Topics: ai, ai-tools, azure, text-to-speech, tts, tts-engines
- Language: Python
- Homepage:
- Size: 256 KB
- Stars: 43
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# SpeakItAI: Neural Text-to-Speech with Azure & Gradio

Convert text to speech using Microsoft Azure Neural Text-to-Speech (TTS) and a simple Gradio web interface.

---
> **Note:**
> This is the classic Gradio version of SpeakItAI.
> For Docker support, a FastAPI backend, and built-in user authentication, check out the [`fastapi-auth-docker`](https://github.com/loglux/SpeakItAI/tree/fastapi-auth-docker) branch.
>
> Quick start, multi-user support, browser-based registration, and NAS/server deployment:
> [`fastapi-auth-docker`](https://github.com/loglux/SpeakItAI/tree/fastapi-auth-docker)
---


*A simple, interactive interface for converting your text to realistic speech.*

---

## Features
- Neural TTS with dynamic voice selection across 140+ Azure-supported languages
- Adjustable **speaking style**, **rate**, and **pitch**
- Input via **textbox** or upload a `.txt` file
- Output as **.wav** file, played directly in the browser
- Dropdowns now **auto-populate** with default language, voice, and style
- Human-readable **language names** in the UI (e.g., "English (UK)" instead of `en-GB`), falling back to the locale code if no name is defined
- **Manage display names and visible languages directly in the UI**, with no need to edit files or restart the app
- Modular architecture, ready for expansion
- **Audio Library** tab: browse, play, and delete generated audio files directly in the web interface
- Configurable audio output folder via `.env` variable `AUDIO_OUTPUT_DIR`
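
Style, rate, and pitch are normally passed to Azure Neural TTS as SSML. The following is a minimal sketch of how such a request body could be assembled; the helper name `build_ssml` and its defaults are illustrative assumptions, not the code in `core.py`, and the accepted `rate` and `pitch` values should be checked against Azure's SSML documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, locale, style="default", rate="+0%", pitch="+0%"):
    """Build an SSML document for one utterance (hypothetical helper)."""
    # Escape the user text so characters such as & and < stay valid XML.
    body = (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody>"
    )
    # Wrap in express-as only when a real speaking style is requested.
    if style and style != "default":
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


example = build_ssml(
    "Tom & Jerry", "en-GB-SoniaNeural", "en-GB",
    style="cheerful", rate="+10%", pitch="-5%",
)
```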
---

## Azure Free Tier
Microsoft Azure offers **500,000 characters per month free** for **Neural Text-to-Speech** on the **F0 (free) pricing tier**.
- Billing is per character
- Free quota resets monthly
- No credit card required to start

More info: [Azure Speech Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/)
---
## Language and Voice Support
Azure Neural TTS supports **140+ languages and dialects**, with many realistic male and female voices, including:
- British English
- American English
- French
- German
- Russian
- Chinese
- Spanish
- Hindi
- And more

Full voice list and styles:
[Azure Language & Voice Support](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support)

---
## Setup Instructions
### 1. Clone the Repository
```bash
git clone https://github.com/loglux/SpeakItAI.git
cd SpeakItAI
```
### Azure Setup (Required)

Before running this app, you need an active Azure Speech resource.
1. Go to the [Azure Portal](https://portal.azure.com/)
2. Create a **Speech** resource (Free F0 tier available)
3. Copy the **Key** and **Region** from the resource's "Keys and Endpoint" section
4. Paste them into a `.env` file as shown below.

### 2. Create `.env` File
```bash
cp .env.example .env
```

Then fill in your Azure credentials:
```env
AZURE_KEY=your_azure_key
AZURE_REGION=your_azure_region
```

> Example regions: `ukwest`, `eastus`, `westeurope`, etc.
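
A quick sanity check of these settings can catch a missing key or an unedited placeholder before the first request is sent. This is only a sketch: `load_settings` is a hypothetical helper, it reads variables that are already in the environment (how the app itself loads `.env` is not shown here), and the `audio_outputs` default mirrors the one documented for `AUDIO_OUTPUT_DIR`.

```python
import os

# Placeholder values from .env.example that must be replaced by real ones.
PLACEHOLDERS = {"your_azure_key", "your_azure_region"}


def load_settings(env=None):
    """Read and validate the Azure settings from a mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {
        "key": env.get("AZURE_KEY", "").strip(),
        "region": env.get("AZURE_REGION", "").strip(),
        # Fall back to the documented default folder when the variable is unset.
        "audio_output_dir": env.get("AUDIO_OUTPUT_DIR", "").strip() or "audio_outputs",
    }
    problems = [
        name
        for name, field in (("AZURE_KEY", "key"), ("AZURE_REGION", "region"))
        if not settings[field] or settings[field] in PLACEHOLDERS
    ]
    if problems:
        raise RuntimeError("Missing or placeholder setting(s): " + ", ".join(problems))
    return settings
```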
---
### 3. Install Dependencies
Using a virtual environment (recommended):
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

---
### 4. (Optional) Update Voice List from Azure
To fetch the latest voices and update `config.json`, run:
```bash
python tts/azure/update_config.py
```
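
For orientation, a voice refresh of this kind could look roughly like the sketch below. It assumes the Speech service's public `voices/list` REST endpoint and its `Locale`, `ShortName`, `Gender`, and `StyleList` fields; the function names and the grouped output layout are hypothetical and are not the actual format of `config.json` or the logic of `update_config.py`.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_voices(key, region):
    """Download the raw voice list for a region (performs a network call)."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    request = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def group_voices(voices):
    """Group raw voice entries by locale (hypothetical output layout)."""
    grouped = defaultdict(dict)
    for voice in voices:
        grouped[voice["Locale"]][voice["ShortName"]] = {
            "gender": voice.get("Gender", ""),
            # Voices without styles simply have no StyleList field.
            "styles": voice.get("StyleList", []),
        }
    # Sort locales so the generated file is stable between runs.
    return {locale: grouped[locale] for locale in sorted(grouped)}
```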
---
### 5. Run the App

```bash
python app.py
```

---
## Usage Notes
- If both textbox and file are provided, the file takes priority.
- Only `.txt` files are accepted for upload.
- The output is saved and played as a `.wav` file.
- If a voice does not support styles, "default" will be used automatically.
- All generated audio files are stored in the folder specified by `AUDIO_OUTPUT_DIR` in your `.env` file (default: `audio_outputs`).
- Use the **Audio Library** tab to play back or delete any generated audio file directly from the browser.

---
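
The input and style rules from the usage notes can be expressed in a few lines. This is a sketch of the described behaviour only; `resolve_input_text` and `resolve_style` are hypothetical names, and the app's own handling may differ in detail (for example in how it reports errors).

```python
from pathlib import Path


def resolve_input_text(textbox_text, uploaded_path=None):
    """Pick the text to synthesize; an uploaded file wins over the textbox."""
    if uploaded_path:
        path = Path(uploaded_path)
        # Only plain-text uploads are accepted.
        if path.suffix.lower() != ".txt":
            raise ValueError("Only .txt files are accepted for upload")
        return path.read_text(encoding="utf-8").strip()
    text = (textbox_text or "").strip()
    if not text:
        raise ValueError("No text provided")
    return text


def resolve_style(requested_style, supported_styles):
    """Fall back to "default" when the voice does not offer the requested style."""
    if requested_style in supported_styles:
        return requested_style
    return "default"
```

For example, `resolve_style("cheerful", [])` returns `"default"`, which matches the note about voices that do not support styles.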
## Project Structure
```
SpeakItAI/
├── app.py
├── .env.example
├── requirements.txt
├── README.md
├── audio_outputs/                # automatically created on runtime
│
├── screenshots/
│   └── interface.png             # UI preview
│
└── tts/
    ├── __init__.py
    ├── base.py                   # optional: abstract provider interface
    └── azure/
        ├── __init__.py
        ├── core.py               # AzureTTS implementation
        ├── config.py             # language label loading and utilities
        ├── config.json           # auto-generated voice config from Azure
        ├── language_labels.json  # editable language name mappings (UI dropdowns)
        └── update_config.py      # fetch and build config.json
```
---

## Configuration Notes
### Voice Configuration (`config.json`)
- Run `tts/azure/update_config.py` to fetch the latest Azure voice data.
- This generates a new `tts/azure/config.json` with all supported languages, voices, genders, and available styles.
- The app reads from `config.json` at runtime to populate the voice and style options.

### Language Filter (`config.py`)
### Language Labels (`language_labels.json`)
- The list of **display names for languages** is stored in a separate file:
`tts/azure/language_labels.json`.
- You can **add, rename, or delete** languages using the **"Edit Languages"** tab in the UI, with no need to edit files manually or restart the app.
- This list controls which languages appear in the dropdown, and how they are named (e.g., `"English (UK)"` instead of `"en-GB"`).

#### Example structure:
```json
{
  "en-GB": "English (UK)",
  "ru-RU": "Russian",
  "ar-KW": "Arabic (Kuwait)"
}
```

#### Behaviour if the file is empty or missing:
- The app will display **all available languages** found in `config.json`.
- These will be shown using their **locale codes only** (e.g., `"fr-FR"`, `"hi-IN"`, `"de-DE"`), without readable labels.

> Use the "Edit Languages" tab to selectively re-add friendly names for only the languages you want to appear.
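
The filtering and fallback rules above can be summarised in code. This is a sketch under stated assumptions: `load_labels` and `visible_languages` are hypothetical names rather than the functions in `config.py`, and the sketch assumes that a labelled locale missing from `config.json` is simply not shown.

```python
import json
from pathlib import Path


def load_labels(path):
    """Return the label mapping, or {} when the file is missing, empty, or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def visible_languages(config_locales, labels):
    """Map each locale shown in the dropdown to its display name."""
    if not labels:
        # No labels at all: show every locale, named by its code.
        return {code: code for code in config_locales}
    # Otherwise show only labelled locales; an empty name falls back to the code.
    return {code: labels[code] or code for code in config_locales if code in labels}
```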
---
## Architecture Note
The codebase is modular and ready for extension:
- Add new languages or accents in `tts/config.py`
- Replace the interface with FastAPI or Flask without touching `core.py`
- Support alternative providers like ElevenLabs, Bark, or Google Cloud TTS later

---
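
One common way to keep providers interchangeable is an abstract base class that each backend implements. The sketch below illustrates the idea only; `TTSProvider`, its method names, and the toy `SilentProvider` are assumptions and do not describe the actual contents of `tts/base.py`.

```python
import wave
from abc import ABC, abstractmethod


class TTSProvider(ABC):
    """Hypothetical interface that every speech backend would implement."""

    @abstractmethod
    def list_voices(self, locale):
        """Return the voice names available for a locale."""

    @abstractmethod
    def synthesize(self, text, voice, output_path):
        """Write speech for `text` to `output_path` and return that path."""


class SilentProvider(TTSProvider):
    """Toy backend that writes one second of silence, useful for UI tests."""

    def list_voices(self, locale):
        return [f"{locale}-Silent"]

    def synthesize(self, text, voice, output_path):
        with wave.open(str(output_path), "wb") as wav:
            wav.setnchannels(1)      # mono
            wav.setsampwidth(2)      # 16-bit samples
            wav.setframerate(16000)  # 16 kHz
            wav.writeframes(b"\x00\x00" * 16000)
        return output_path
```

With an interface like this, the web layer only depends on `TTSProvider`, so swapping Azure for another service would not require changes to the UI code.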
## License
This project is licensed under the MIT License.