https://github.com/ericc-ch/edge-tts
Use Microsoft Edge's online text-to-speech service from JS code directly!
- Host: GitHub
- URL: https://github.com/ericc-ch/edge-tts
- Owner: ericc-ch
- License: mpl-2.0
- Created: 2024-08-23T09:04:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-07T14:00:01.000Z (about 1 year ago)
- Last Synced: 2025-04-09T09:51:43.028Z (12 months ago)
- Topics: reverse-engineering, tts
- Language: TypeScript
- Homepage: https://npm.im/@echristian/edge-tts
- Size: 231 KB
- Stars: 12
- Watchers: 1
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Edge TTS
> A TypeScript library for generating speech using Microsoft Edge's text-to-speech API
Generate speech from text using Microsoft Edge's text-to-speech service. The library exposes Edge's TTS voices, generates subtitles alongside the audio, and lets you customize the voice, rate, pitch, volume, and output format.
## Installation
```bash
npm install @echristian/edge-tts
```
## CLI Usage
```bash
# List all available voices grouped by locale
npx @echristian/edge-tts voices
# Generate audio from text
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --voice en-US-AvaNeural
# Generate audio with subtitles
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --subtitle output.srt --voice en-US-AvaNeural
```
## API Usage
```typescript
import { synthesize, synthesizeStream, getVoices } from "@echristian/edge-tts";
// Get available voices
const voices = await getVoices();
console.log(voices); // Array of available voice options
// Basic usage with synthesize()
const { audio, subtitle } = await synthesize({
text: "Hello, world!",
});
// Stream processing usage
const generator = synthesizeStream({ text: "Hello world" });
for await (const chunk of generator) {
// chunk is a Uint8Array of raw audio data
// Process or save each chunk as needed
}
// Collecting all streamed chunks
const chunks: Uint8Array[] = [];
for await (const chunk of synthesizeStream({ text: "Hello world" })) {
chunks.push(chunk);
}
```
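The collecting loop above leaves the audio as separate chunks. Joining them into one buffer is plain byte copying. The sketch below assumes only that the source is an async iterable of `Uint8Array`, which is what the generator from `synthesizeStream()` is documented to yield. `collectAudio` is a hypothetical helper, not an export of the package.

```typescript
// Hypothetical helper (not exported by @echristian/edge-tts): drains an async
// iterable of byte chunks and joins them into one contiguous Uint8Array.
async function collectAudio(
  source: AsyncIterable<Uint8Array>,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of source) {
    chunks.push(chunk);
    total += chunk.byteLength;
  }
  // Allocate once, then copy each chunk at its running offset.
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return result;
}
```

With the package installed, `await collectAudio(synthesizeStream({ text: "Hello world" }))` would return the whole clip as a single `Uint8Array`.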
## API
### `getVoices(): Promise<Array<Voice>>`
Returns an array of available voices with their properties.
#### Voice Object
| Property | Type | Description |
| ------------ | ------ | ------------------------------ |
| Name | string | Full name of the voice |
| ShortName | string | Short identifier for the voice |
| Gender | string | Voice gender (Male/Female) |
| Locale       | string | Language and region code (e.g. en-US) |
| FriendlyName | string | Display name for the voice |
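Because `getVoices()` returns plain objects with the fields above, picking a voice is an ordinary array filter. The sketch below declares a local `Voice` interface that mirrors the table (an assumption; the package's own exported type may be named or shaped differently) and a hypothetical `findVoices` helper. The sample entries are placeholders, not real service output.

```typescript
// Local mirror of the Voice Object table; the package's exported type may differ.
interface Voice {
  Name: string;
  ShortName: string;
  Gender: string;
  Locale: string;
  FriendlyName: string;
}

// Hypothetical helper: keeps voices whose Locale matches (case-insensitively)
// and, when `gender` is given, whose Gender matches exactly.
function findVoices(voices: Voice[], locale: string, gender?: string): Voice[] {
  const wanted = locale.toLowerCase();
  return voices.filter(
    (v) =>
      v.Locale.toLowerCase() === wanted &&
      (gender === undefined || v.Gender === gender),
  );
}

// Placeholder data; in practice pass the array from `await getVoices()`.
const sample: Voice[] = [
  { Name: "placeholder-1", ShortName: "en-US-AvaNeural", Gender: "Female", Locale: "en-US", FriendlyName: "placeholder" },
  { Name: "placeholder-2", ShortName: "xx-XX-ExampleNeural", Gender: "Male", Locale: "xx-XX", FriendlyName: "placeholder" },
];
console.log(findVoices(sample, "en-us", "Female").map((v) => v.ShortName));
```

The resulting `ShortName` is the kind of identifier the `voice` option and the CLI's `--voice` flag take.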
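### `synthesize(options): Promise<GenerateResult>`
Generates speech from text and resolves to a `GenerateResult` containing the complete audio and its subtitles. `options` takes the fields listed under GenerateOptions.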
### `synthesizeStream(options): AsyncGenerator<Uint8Array>`
Returns an async generator that yields the audio in chunks. Each chunk is a `Uint8Array` of raw audio data with the service's metadata headers already removed.
Uses the same options as `synthesize()`, but without subtitle support:
| Option | Type | Default | Description |
| ------------ | ------ | --------------------------------- | ------------------------- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |
For detailed configuration options, refer to Microsoft's documentation:
- [Available voices and language support](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts)
- [Audio output formats](https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisoutputformat?view=azure-dotnet)
- [Pitch, rate, and volume](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice)
Note: Edge's online service may not accept every voice, output format, or prosody value listed in the Azure documentation.
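The `rate`, `pitch`, and `volume` options are strings. Assuming the library passes them through to the SSML `<prosody>` element described in the Microsoft documentation linked above (not confirmed here), signed relative values such as `+10%` or `-2Hz` are one way to express them. `relativeProsody` and `ProsodyUnit` are hypothetical names used only for this sketch.

```typescript
// Units this sketch allows for relative prosody values (an assumption based on
// the SSML documentation; Edge's service may accept fewer).
type ProsodyUnit = "%" | "Hz";

// Hypothetical helper: formats a signed relative change, e.g. 15 -> "+15%".
function relativeProsody(delta: number, unit: ProsodyUnit = "%"): string {
  if (!Number.isFinite(delta)) {
    throw new RangeError(`delta must be a finite number, got ${delta}`);
  }
  // Relative values carry an explicit sign; zero is written as "+0".
  const sign = delta < 0 ? "-" : "+";
  return `${sign}${Math.abs(delta)}${unit}`;
}

// Options object in the GenerateOptions shape, ready to pass to synthesize().
const options = {
  text: "Hello, world!",
  voice: "en-US-AvaNeural",
  rate: relativeProsody(15), // "+15%"
  pitch: relativeProsody(-2, "Hz"), // "-2Hz"
  volume: relativeProsody(-10), // "-10%"
};
console.log(options);
```

Leaving an option at `"default"` keeps the voice's normal prosody, so only the values you want to change need to be set.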
#### GenerateOptions
| Option | Type | Default | Description |
| ------------ | --------------- | ------------------------------------ | ------------------------- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |
| subtitle | SubtitleOptions | { splitBy: "word", wordsPerCue: 10 } | Subtitle options |
#### SubtitleOptions
| Option         | Type                 | Default | Description                                     |
| -------------- | -------------------- | ------- | ----------------------------------------------- |
| splitBy        | "word" \| "duration" | "word"  | How subtitles are split into cues               |
| wordsPerCue    | number               | 10      | Words per cue when `splitBy` is "word"          |
| durationPerCue | number               | 5000    | Cue duration (ms) when `splitBy` is "duration"  |
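The two `splitBy` modes can be pictured as grouping timed words into cues. The sketch below is an illustration under assumptions, not the package's implementation: `WordBoundary`, `Cue`, `SplitOptions`, and `groupIntoCues` are hypothetical names, and it assumes word timings in milliseconds, the same unit `SubtitleResult` uses.

```typescript
// Hypothetical timed word (ms). The package's internal data may look different.
interface WordBoundary {
  text: string;
  start: number;
  end: number;
}

// Same fields as the SubtitleResult table.
interface Cue {
  text: string;
  start: number;
  end: number;
  duration: number;
}

// Mirrors the two SubtitleOptions modes.
type SplitOptions =
  | { splitBy: "word"; wordsPerCue: number }
  | { splitBy: "duration"; durationPerCue: number };

// Groups consecutive words into cues. A new cue starts once the current one
// holds `wordsPerCue` words, or once adding the next word would stretch it
// past `durationPerCue` milliseconds.
function groupIntoCues(words: WordBoundary[], options: SplitOptions): Cue[] {
  const cues: Cue[] = [];
  let current: WordBoundary[] = [];

  const flush = (): void => {
    if (current.length === 0) return;
    const start = current[0].start;
    const end = current[current.length - 1].end;
    const text = current.map((w) => w.text).join(" ");
    cues.push({ text, start, end, duration: end - start });
    current = [];
  };

  for (const word of words) {
    if (current.length > 0) {
      const full =
        options.splitBy === "word"
          ? current.length >= options.wordsPerCue
          : word.end - current[0].start > options.durationPerCue;
      if (full) flush();
    }
    current.push(word);
  }
  flush();
  return cues;
}
```

Splitting by word count gives cues of predictable length in text; splitting by duration keeps each cue on screen for a bounded time regardless of speaking rate.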
#### GenerateResult
| Property | Type                    | Description             |
| -------- | ----------------------- | ----------------------- |
| audio    | Blob                    | Generated audio data    |
| subtitle | `Array<SubtitleResult>` | Generated subtitle cues |
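`audio` is a `Blob`, which has no direct file-writing API in Node.js. A minimal sketch, assuming Node.js 18 or later (where `Blob` is a global), copies its bytes out with `arrayBuffer()` and writes them with `fs/promises`. `saveAudio` is a hypothetical helper, not part of the package.

```typescript
import { writeFile } from "node:fs/promises";

// Hypothetical helper: writes the `audio` Blob from a GenerateResult to disk.
// A Blob cannot be written directly, so its bytes are read into memory first.
async function saveAudio(audio: Blob, path: string): Promise<void> {
  const bytes = new Uint8Array(await audio.arrayBuffer());
  await writeFile(path, bytes);
}

// Usage with the package:
// const { audio } = await synthesize({ text: "Hello, world!" });
// await saveAudio(audio, "output.mp3");
```

The file extension should match the chosen `outputFormat`; the default `audio-24khz-96kbitrate-mono-mp3` produces MP3 data.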
#### SubtitleResult
| Property | Type | Description |
| -------- | ------ | --------------- |
| text | string | Subtitle text |
| start | number | Start time (ms) |
| end | number | End time (ms) |
| duration | number | Duration (ms) |
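The CLI can write subtitles to an `.srt` file; when using the API, the `subtitle` array can be serialized the same way. A minimal sketch, assuming times are in milliseconds as documented above. `srtTime` and `toSrt` are hypothetical helpers, and the CLI's own output may be formatted differently.

```typescript
// Local mirror of the SubtitleResult table (times in milliseconds).
interface SubtitleResult {
  text: string;
  start: number;
  end: number;
  duration: number;
}

// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Builds numbered SRT blocks (index, time range, text) separated by blank lines.
function toSrt(cues: SubtitleResult[]): string {
  return cues
    .map((cue, i) => `${i + 1}\n${srtTime(cue.start)} --> ${srtTime(cue.end)}\n${cue.text}\n`)
    .join("\n");
}
```

Only `start` and `end` are needed for SRT; `duration` is redundant there since it equals `end - start`.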