An open API service indexing awesome lists of open source software.

https://github.com/ericc-ch/edge-tts

Use Microsoft Edge's online text-to-speech service from JS code directly!
https://github.com/ericc-ch/edge-tts

reverse-engineering tts

Last synced: 7 months ago
JSON representation

Use Microsoft Edge's online text-to-speech service from JS code directly!

Awesome Lists containing this project

README

          

# Edge TTS

> A TypeScript library for generating speech using Microsoft Edge's text-to-speech API

Generate speech from text using Microsoft Edge's text-to-speech service. This library provides access to Edge's TTS capabilities with subtitle generation support and voice customization options.

## Installation

```bash
npm install @echristian/edge-tts
```

## CLI Usage

```bash
# List all available voices grouped by locale
npx @echristian/edge-tts voices

# Generate audio from text
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --voice en-US-AvaNeural

# Generate audio with subtitles
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --subtitle output.srt --voice en-US-AvaNeural
```

## API Usage

```typescript
import { synthesize, synthesizeStream, getVoices } from "@echristian/edge-tts";

// Get available voices
const voices = await getVoices();
console.log(voices); // Array of available voice options

// Basic usage with synthesize()
const { audio, subtitle } = await synthesize({
text: "Hello, world!",
});

// Stream processing usage
const generator = synthesizeStream({ text: "Hello world" });
for await (const chunk of generator) {
// chunk is a Uint8Array of raw audio data
// Process or save each chunk as needed
}

// Collecting all streamed chunks
const chunks: Uint8Array[] = [];
for await (const chunk of synthesizeStream({ text: "Hello world" })) {
chunks.push(chunk);
}
```

## API

### getVoices(): Promise>

Returns an array of available voices with their properties.

#### Voice Object

| Property | Type | Description |
| ------------ | ------ | ------------------------------ |
| Name | string | Full name of the voice |
| ShortName | string | Short identifier for the voice |
| Gender | string | Voice gender (Male/Female) |
| Locale | string | Language code and region |
| FriendlyName | string | Display name for the voice |

### synthesize(options): Promise

Main function to generate speech from text.

### synthesizeStream(options): AsyncGenerator

Creates an async generator that yields chunks of processed audio data. Each chunk has metadata headers automatically removed.

Uses the same options as `synthesize()`, but without subtitle support:

| Option | Type | Default | Description |
| ------------ | ------ | --------------------------------- | ------------------------- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |

For detailed configuration options, refer to Microsoft's documentation:

- [Available voices and language support](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts)
- [Audio output formats](https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisoutputformat?view=azure-dotnet)
- [Pitch, rate, and volumes](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice)

Note: Some options may be limited by Microsoft Edge's service capabilities.

#### GenerateOptions

| Option | Type | Default | Description |
| ------------ | --------------- | ------------------------------------ | ------------------------- |
| text | string | (required) | Text to convert to speech |
| voice | string | "en-US-AvaNeural" | Voice ID to use |
| language | string | "en-US" | Language code |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format |
| rate | string | "default" | Speaking rate |
| pitch | string | "default" | Voice pitch |
| volume | string | "default" | Audio volume |
| subtitle | SubtitleOptions | { splitBy: "word", wordsPerCue: 10 } | Subtitle options |

#### SubtitleOptions

| Option | Type | Default | Description |
| -------------- | -------------------- | ------- | ------------------------------------ |
| splitBy | "word" \| "duration" | "word" | How to split subtitles |
| wordsPerCue | number | 10 | Words per subtitle when using 'word' |
| durationPerCue | number | 5000 | Duration (ms) when using 'duration' |

#### GenerateResult

| Property | Type | Description |
| -------- | --------------------- | -------------------- |
| audio | Blob | Generated audio data |
| subtitle | Array | Generated subtitles |

#### SubtitleResult

| Property | Type | Description |
| -------- | ------ | --------------- |
| text | string | Subtitle text |
| start | number | Start time (ms) |
| end | number | End time (ms) |
| duration | number | Duration (ms) |