https://github.com/ark018/multi-voice-sdk

A universal Text-to-Speech (TTS) SDK. Easily generate and manage audio content with a unified API.

deepgram gemini npm-package openai tts tts-api


Host: GitHub
URL: https://github.com/ark018/multi-voice-sdk
Owner: ARK018
License: isc
Created: 2025-06-06T10:55:11.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-21T11:51:35.000Z (about 1 year ago)
Last Synced: 2025-06-21T12:31:56.377Z (about 1 year ago)
Topics: deepgram, gemini, npm-package, openai, tts, tts-api
Language: JavaScript
Homepage: https://www.npmjs.com/package/multi-voice-sdk
Size: 44.9 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Multi-Voice SDK

A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK that supports multiple providers including Google Gemini, Deepgram, OpenAI, Groq PlayAI, Cartesia, and AssemblyAI. Easily generate audio content, transcribe speech, and manage audio files with a unified API.

## Features

- 🎵 **Multi-Provider TTS**: Gemini, Deepgram, OpenAI, Groq PlayAI, and Cartesia TTS

- 🎙️ **Speech-to-Text**: Deepgram and AssemblyAI STT with advanced features

- 🔧 **Audio Merging**: Combine multiple audio files seamlessly

- 🎯 **Simple API**: Easy-to-use functions with consistent interface

- 📦 **ESM Ready**: Modern ES modules support

## Installation

```bash
npm install multi-voice-sdk
```

## Quick Start

```javascript
import { tts, stt, merge } from "multi-voice-sdk";

// Generate speech with OpenAI
tts({
  provider: "openai",
  apiKey: "your-api-key",
  text: "Hello, world!",
  voice: "nova",
  outputFile: "output.mp3",
});

// Transcribe audio with Deepgram
stt({
  provider: "deepgram", // Required: "deepgram" or "assemblyai"
  apiKey: "your-deepgram-key",
  audioFile: "https://example.com/audio.wav", // Can be a URL or a local file path
});

// Merge multiple audio files
merge({
  inputFiles: ["file1.mp3", "file2.mp3"],
  outputFile: "combined.mp3",
});
```

## API Reference

### `tts(options)`

Generate speech from text using various TTS providers.

#### Parameters

| Parameter    | Type     | Required | Description                                                                   |
| ------------ | -------- | -------- | ----------------------------------------------------------------------------- |
| `provider`   | `string` | ✅       | TTS provider: `"gemini"`, `"deepgram"`, `"openai"`, `"groq"`, or `"cartesia"` |
| `apiKey`     | `string` | ✅       | API key for the chosen provider                                               |
| `text`       | `string` | ✅       | Text to convert to speech                                                     |
| `voice`      | `string` | ✅       | Voice identifier (provider-specific, for Cartesia use voice ID)               |
| `outputFile` | `string` | optional | Output file path (default: `"output.mp3"`)                                    |
| `model`      | `string` | optional | Model to use (provider-specific)                                              |
| `prompt`     | `string` | optional | Additional instructions for speech generation                                 |
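
The required fields in the table above can be checked before a request is sent. The following is a minimal sketch rather than part of the SDK: `validateTtsOptions`, `TTS_PROVIDERS`, and `REQUIRED_TTS_FIELDS` are hypothetical names, and the SDK may well perform its own validation.

```javascript
// Hypothetical helper (not part of multi-voice-sdk): checks the documented
// required fields before the options are handed to tts().
const TTS_PROVIDERS = ["gemini", "deepgram", "openai", "groq", "cartesia"];
const REQUIRED_TTS_FIELDS = ["provider", "apiKey", "text", "voice"];

function validateTtsOptions(options = {}) {
  const errors = [];
  for (const field of REQUIRED_TTS_FIELDS) {
    const value = options[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`"${field}" is required and must be a non-empty string`);
    }
  }
  // Only report an unknown provider when one was actually supplied.
  const provider = options.provider;
  if (typeof provider === "string" && provider.trim() !== "" && !TTS_PROVIDERS.includes(provider)) {
    errors.push(`unknown provider "${provider}"`);
  }
  return errors;
}

// `apiKey` and `voice` are missing here, so two problems are reported.
const problems = validateTtsOptions({ provider: "openai", text: "Hello" });
console.log(problems.length); // 2
```

Returning a list of messages instead of throwing on the first problem lets a caller report every missing field at once.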

#### Examples

**OpenAI TTS**

```javascript
tts({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini-tts",
  text: "Hello from OpenAI!",
  voice: "nova",
  prompt: "Speak in a cheerful tone",
  outputFile: "openai_output.mp3",
});
```

**Google Gemini TTS**

```javascript
tts({
  provider: "gemini",
  apiKey: process.env.GEMINI_API_KEY,
  text: "Hello from Gemini!",
  voice: "iapetus",
  prompt: "In a pleasant and calm tone",
  outputFile: "gemini_output.mp3",
});
```

**Deepgram TTS**

```javascript
tts({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  text: "Hello from Deepgram!",
  voice: "aura-2-luna-en",
  outputFile: "deepgram_output.mp3",
});
```

**Groq PlayAI TTS**

```javascript
tts({
  provider: "groq",
  apiKey: process.env.GROQ_API_KEY,
  text: "Hello from Groq PlayAI!",
  voice: "Arista-PlayAI",
  outputFile: "groq_output.wav",
});
```

**Cartesia TTS**

```javascript
tts({
  provider: "cartesia",
  apiKey: process.env.CARTESIA_API_KEY,
  text: "Hello from Cartesia!",
  voice: "694f9389-aac1-45b6-b726-9d9369183238", // Voice ID
  outputFile: "cartesia_output.mp3",
});
```
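
Because every provider accepts the same option shape, a caller can try several providers in turn. The sketch below assumes that `tts` signals failure by throwing or by returning a rejected promise, which this README does not state. `ttsWithFallback` is a hypothetical helper that receives the SDK function as an argument, so it can be exercised with a stand-in function and no network access.

```javascript
// Hypothetical helper: tries each candidate in order and reports the first
// provider that succeeds. `ttsFn` is expected to be the SDK's tts().
// Assumption: tts() throws or rejects when a request fails.
async function ttsWithFallback(ttsFn, candidates, shared) {
  const failures = [];
  for (const candidate of candidates) {
    try {
      // Per-provider settings (provider, apiKey, voice) override shared ones.
      await ttsFn({ ...shared, ...candidate });
      return { provider: candidate.provider, failures };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      failures.push({ provider: candidate.provider, message });
    }
  }
  const tried = failures.map((failure) => failure.provider).join(", ");
  throw new Error(`all providers failed: ${tried}`);
}

// Usage with the real SDK (needs valid API keys, so it is left commented out):
// import { tts } from "multi-voice-sdk";
// await ttsWithFallback(
//   tts,
//   [
//     { provider: "openai", apiKey: process.env.OPENAI_API_KEY, voice: "nova" },
//     { provider: "deepgram", apiKey: process.env.DEEPGRAM_API_KEY, voice: "aura-2-luna-en" },
//   ],
//   { text: "Hello!", outputFile: "hello.mp3" }
// );
```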

### `stt(options)`

Transcribe audio to text using Speech-to-Text providers.

#### Parameters

| Parameter         | Type      | Required | Description                                                               |
| ----------------- | --------- | -------- | ------------------------------------------------------------------------- |
| `provider`        | `string`  | ✅       | STT provider: `"deepgram"` or `"assemblyai"`                              |
| `apiKey`          | `string`  | ✅       | API key for the chosen provider                                           |
| `audioFile`       | `string`  | ✅       | Path to local audio file or URL of remote audio file to transcribe        |
| `outputFile`      | `string`  | optional | Output file path for results (default: `"transcription.json"`)            |
| `model`           | `string`  | optional | Model to use (default: `"nova-3"`)                                        |
| `smartFormat`     | `boolean` | optional | Enable smart formatting (default: `true`)                                 |
| `detect_language` | `boolean` | optional | Automatic language detection (default: `true`)                            |
| `punctuate`       | `boolean` | optional | Enable punctuation (default: `true`)                                      |
| `diarize`         | `boolean` | optional | Enable speaker diarization (default: `false`)                             |
| `channels`        | `number`  | optional | Number of audio channels (default: `1`)                                   |
| `fullResponse`    | `boolean` | optional | Return full response object instead of just transcript (default: `false`) |

#### Returns

- **Default**: Returns transcript as a string

- **With `fullResponse: true`**: Returns object with transcript, confidence, words, and metadata
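
Code that must cope with both return shapes can normalize them first. This is a minimal sketch: `toTranscriptText` is a hypothetical helper, and it assumes the full response object exposes the text under a `transcript` property, as the list above suggests. The exact field names are not documented here.

```javascript
// Hypothetical helper: accepts either return shape described above and
// always yields the transcript text.
// Assumption: with fullResponse: true, the text lives under `transcript`.
function toTranscriptText(result) {
  if (typeof result === "string") return result;
  if (result && typeof result.transcript === "string") return result.transcript;
  throw new TypeError("unrecognized stt() result shape");
}

console.log(toTranscriptText("hello world")); // hello world
console.log(toTranscriptText({ transcript: "hello world", confidence: 0.98 })); // hello world
```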

#### Examples

**Deepgram: Basic Transcription (Remote URL)**

```javascript
stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
});
```

**Deepgram: Local File Transcription**

```javascript
stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
});
```

**AssemblyAI: Basic Transcription (Remote URL)**

```javascript
stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
  outputFile: "transcription.json",
});
```

**AssemblyAI: Local File Transcription**

```javascript
stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
  fullResponse: true, // Get detailed response
});
```

### `merge(options)`

Merge multiple audio files into a single file.

#### Parameters

| Parameter    | Type       | Required | Description               |
| ------------ | ---------- | -------- | ------------------------- |
| `inputFiles` | `string[]` | ✅       | Array of input file paths |
| `outputFile` | `string`   | ✅       | Output file path          |

#### Example

```javascript
merge({
  inputFiles: ["intro.mp3", "main.mp3", "outro.mp3"],
  outputFile: "complete_audio.mp3",
});
```
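
`merge` pairs naturally with `tts` when a longer script is generated in pieces. The sketch below assumes that both functions can be awaited and settle once their files are written, which this README does not state. `buildSegmentPlan` and `renderScript` are hypothetical helpers, and the `segment_NN.mp3` naming is only an illustration.

```javascript
// Hypothetical helper: turns a list of text segments into one tts() options
// object per segment, plus the matching merge() options.
function buildSegmentPlan(segments, base, outputFile) {
  const ttsJobs = segments.map((text, index) => ({
    ...base,
    text,
    outputFile: `segment_${String(index + 1).padStart(2, "0")}.mp3`,
  }));
  return {
    ttsJobs,
    mergeJob: { inputFiles: ttsJobs.map((job) => job.outputFile), outputFile },
  };
}

// Runs the plan with the SDK functions passed in by the caller.
// Assumption: tts() and merge() can be awaited.
async function renderScript(ttsFn, mergeFn, plan) {
  for (const job of plan.ttsJobs) {
    await ttsFn(job); // one request at a time, in script order
  }
  await mergeFn(plan.mergeJob);
  return plan.mergeJob.outputFile;
}

const plan = buildSegmentPlan(
  ["Welcome to the show.", "Thanks for listening."],
  { provider: "openai", apiKey: "your-api-key", voice: "nova" },
  "complete_audio.mp3"
);
console.log(plan.mergeJob.inputFiles); // [ 'segment_01.mp3', 'segment_02.mp3' ]

// With the real SDK: await renderScript(tts, merge, plan);
```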

## Supported Voices

### OpenAI

- `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, `verse`

### Gemini

- `zephyr` (Bright), `puck` (Upbeat), `charon` (Informative), `kore` (Firm), `fenrir` (Excitable), `leda` (Youthful), `orus` (Firm), `aoede` (Breezy), `autonoe` (Bright), `enceladus` (Breathy), `iapetus` (Clear)

For a complete list of available Gemini voices, see: [Gemini Speech Generation Documentation](https://ai.google.dev/gemini-api/docs/speech-generation#voices)

### Deepgram

- `aura-2-luna-en`, `aura-2-stella-en`, `aura-2-arcas-en`, and more

For a complete list of available Deepgram voices, see: [Deepgram TTS Models Documentation](https://developers.deepgram.com/docs/tts-models#featured-voices)

### Groq PlayAI

- `Atlas-PlayAI`, `Arista-PlayAI`, `Basil-PlayAI`, `Briggs-PlayAI`, and more

For a complete list of available Groq PlayAI voices, see: [Groq TTS Documentation](https://console.groq.com/docs/text-to-speech)

### Cartesia

Cartesia uses voice IDs instead of voice names. Example voice IDs:

- `694f9389-aac1-45b6-b726-9d9369183238` (Default voice)

- Use the Cartesia console to find available voice IDs for your account

For more information about Cartesia voices, see: [Cartesia Console](https://play.cartesia.ai/voices)
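
Since voice identifiers differ per provider, it can help to keep one known-good voice per provider in a single place. The following is a minimal sketch using the voices from the examples in this README; `EXAMPLE_VOICES` and `withExampleVoice` are hypothetical names, not SDK exports.

```javascript
// Example voices taken from the provider examples in this README; each
// provider offers more voices than the single one listed here.
const EXAMPLE_VOICES = {
  openai: "nova",
  gemini: "iapetus",
  deepgram: "aura-2-luna-en",
  groq: "Arista-PlayAI",
  cartesia: "694f9389-aac1-45b6-b726-9d9369183238", // voice ID, not a name
};

// Hypothetical helper: fills in `voice` only when the caller left it out.
function withExampleVoice(options) {
  if (options.voice) return options;
  const voice = EXAMPLE_VOICES[options.provider];
  if (!voice) {
    throw new Error(`no example voice for provider "${options.provider}"`);
  }
  return { ...options, voice };
}

console.log(withExampleVoice({ provider: "groq", text: "Hi" }).voice); // Arista-PlayAI
```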

## Environment Variables

Create a `.env` file in your project root:

```env
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
GROQ_API_KEY=your_groq_api_key
CARTESIA_API_KEY=your_cartesia_api_key
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
```
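
Node.js does not read `.env` files on its own. The variables have to be loaded first, for example with the `dotenv` package or with `node --env-file=.env` on Node.js 20.6 or newer, unless the SDK loads them itself, which this README does not say. Once they are in `process.env`, a small lookup keeps key handling in one place. This is a sketch: `API_KEY_VARS` and `apiKeyFor` are hypothetical names, not SDK exports.

```javascript
// Hypothetical mapping from provider name to the environment variable names
// used in this README (ASSEMBLYAI_API_KEY comes from the stt() examples).
const API_KEY_VARS = {
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
  groq: "GROQ_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  assemblyai: "ASSEMBLYAI_API_KEY",
};

// Returns the key for a provider, failing early with a readable message
// instead of sending a request with `apiKey: undefined`.
function apiKeyFor(provider, env = process.env) {
  const name = API_KEY_VARS[provider];
  if (!name) throw new Error(`unknown provider "${provider}"`);
  const value = env[name];
  if (!value) throw new Error(`${name} is not set`);
  return value;
}

// Passing an explicit object instead of process.env keeps the example
// independent of the machine it runs on.
console.log(apiKeyFor("deepgram", { DEEPGRAM_API_KEY: "dg-key" })); // dg-key
```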

## Requirements

- Node.js 16.x or higher

## License

ISC
