https://github.com/osteele/speech-provider
A unified TypeScript interface for browser speech synthesis and Eleven Labs TTS voices
browser eleven-labs speech-synthesis text-to-speech tts typescript voice
- Host: GitHub
- URL: https://github.com/osteele/speech-provider
- Owner: osteele
- License: mit
- Created: 2025-03-27T19:46:10.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-11-08T07:22:59.000Z (8 months ago)
- Last Synced: 2025-11-08T08:25:26.554Z (8 months ago)
- Topics: browser, eleven-labs, speech-synthesis, text-to-speech, tts, typescript, voice
- Language: TypeScript
- Homepage: https://osteele.github.io/speech-provider/
- Size: 95.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# speech-provider
A unified interface for browser speech synthesis and Eleven Labs voices.
## Installation
```bash
# Using npm
npm install speech-provider
# Using yarn
yarn add speech-provider
# Using bun
bun add speech-provider
```
## Documentation
Full API documentation is available at [https://osteele.github.io/speech-provider/](https://osteele.github.io/speech-provider/).
## Usage
```typescript
import { getVoiceProvider } from 'speech-provider';

// Use browser voices only
const browserProvider = getVoiceProvider({});

// Use Eleven Labs voices if API key is available
const elevenLabsProvider = getVoiceProvider({ elevenLabsApiKey: 'your-api-key' });

// Use Eleven Labs with custom cache duration
const provider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 86400 // Cache for 1 day
});

// Get voices for a specific language
const voices = await provider.getVoices({ lang: 'en-US', minVoices: 1 });

// Get default voice for a language
const defaultVoice = await provider.getDefaultVoice({ lang: 'en-US' });

// Create and play an utterance
if (defaultVoice) {
  const utterance = defaultVoice.createUtterance('Hello, world!');
  utterance.onstart = () => console.log('Started speaking');
  utterance.onend = () => console.log('Finished speaking');
  utterance.start();
}
```
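When several sentences need to play back to back, it can help to wrap the `onend` callback in a promise. The following is a minimal sketch, not part of the package: `speak` and `speakAll` are hypothetical helpers, and `VoiceLike` and `UtteranceLike` are simplified local stand-ins for the `Voice` and `Utterance` interfaces so that the block compiles on its own.

```typescript
// Simplified local shapes mirroring the README's Voice and Utterance
// interfaces (assumption: plain callback properties instead of setters).
interface UtteranceLike {
  start(): void;
  stop(): void;
  onstart: () => void;
  onend: () => void;
}

interface VoiceLike {
  name: string;
  createUtterance(text: string): UtteranceLike;
}

// Hypothetical helper: resolves once the utterance reports that it ended.
function speak(voice: VoiceLike, text: string): Promise<void> {
  return new Promise<void>((resolve) => {
    const utterance = voice.createUtterance(text);
    utterance.onend = () => resolve();
    utterance.start();
  });
}

// Hypothetical helper: speaks each sentence in order, one after another.
async function speakAll(voice: VoiceLike, sentences: string[]): Promise<void> {
  for (const sentence of sentences) {
    await speak(voice, sentence);
  }
}
```

With a voice from `getDefaultVoice`, this would be called as `await speakAll(defaultVoice, ['First sentence.', 'Second sentence.'])`. Note that the promise only settles when `onend` fires; the documented interface exposes no error callback, so a production version may want a timeout.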
## Features
- Unified interface for both browser speech synthesis and Eleven Labs voices
- Automatic fallback to browser voices when Eleven Labs API key is not provided
- Type-safe API with TypeScript support
- Simple voice selection by language
- Event listeners for speech start and end events
- Automatic caching of Eleven Labs API responses to reduce API calls
- Configurable cache duration for Eleven Labs responses
## Used In
This package is used in [Mandarin Sentence
Practice](https://mandarin-sentence-practice.osteele.com), a web application for
practicing Mandarin Chinese with listening and translation exercises. The app
uses this package to provide high-quality text-to-speech for Mandarin sentences,
with automatic fallback to browser voices when Eleven Labs is not available.
## API
### `getVoiceProvider(options)`
Creates a voice provider based on the available API keys. Falls back to browser speech synthesis if no API keys are provided.
```typescript
function getVoiceProvider(options: {
  elevenLabsApiKey?: string | null;
  cacheMaxAge?: number; // Cache duration in seconds (default: 1 hour)
}): VoiceProvider;
```
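Because `elevenLabsApiKey` accepts `null`, a key read from user input or configuration can be normalized before it is passed in. This is a sketch under assumptions: `toProviderOptions` is a hypothetical helper, `VoiceProviderOptions` is a local copy of the options shape above, and treating a blank string as "no key" is a choice made here, not documented library behavior.

```typescript
// Local copy of the options shape from the getVoiceProvider signature.
interface VoiceProviderOptions {
  elevenLabsApiKey?: string | null;
  cacheMaxAge?: number;
}

// Hypothetical helper: turns a raw, possibly blank key into provider options.
// A missing or whitespace-only key becomes null, which selects browser voices.
function toProviderOptions(
  rawKey: string | null | undefined,
  cacheMaxAge?: number
): VoiceProviderOptions {
  const key = rawKey?.trim() ?? '';
  const options: VoiceProviderOptions = {
    elevenLabsApiKey: key.length > 0 ? key : null,
  };
  if (cacheMaxAge !== undefined) {
    options.cacheMaxAge = cacheMaxAge;
  }
  return options;
}
```

The result can then be handed to the library, for example `getVoiceProvider(toProviderOptions(userSuppliedKey))`, where `userSuppliedKey` is whatever the application collected.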
### `createElevenLabsVoiceProvider(apiKey, options?)`
Creates an Eleven Labs voice provider with optional configuration.
```typescript
function createElevenLabsVoiceProvider(
  apiKey: string,
  options?: {
    validateResponses?: boolean;
    printVoiceProperties?: boolean;
    cacheMaxAge?: number; // Cache duration in seconds (default: 1 hour)
  }
): VoiceProvider;
```
### Caching
The library caches Eleven Labs API responses automatically; browser voices rely on the browser's own caching:
- Browser voices are cached automatically by the browser's speech synthesis engine
- Eleven Labs responses are cached using IndexedDB with a default duration of 1 hour
- Cache duration can be configured when creating the provider
- Cached responses are automatically invalidated after the specified duration
- Cache can be disabled by setting `cacheMaxAge: null` in the provider options
Examples of cache configuration:
```typescript
// Use default 1-hour cache
const defaultCacheProvider = getVoiceProvider({ elevenLabsApiKey: 'your-api-key' });

// Cache for 1 day
const dailyCacheProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 86400 // 24 hours in seconds
});

// Cache for 1 week
const weeklyCacheProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 604800 // 7 days in seconds
});

// Disable caching (preferred approach)
const uncachedProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: null
});

// Alternative way to disable caching
const zeroCacheProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 0
});
```
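To make the expiry rule concrete, here is a small in-memory illustration of max-age invalidation. It is not the library's implementation (which, per the list above, stores responses in IndexedDB); `MaxAgeCache` and its injectable clock are hypothetical names used only for this sketch. A `null` or `0` max age disables caching, matching the two options shown above.

```typescript
// Illustrative max-age cache. Entries older than maxAgeSeconds are treated
// as missing; a null or non-positive max age turns caching off entirely.
class MaxAgeCache<T> {
  private entries = new Map<string, { value: T; storedAt: number }>();
  private maxAgeSeconds: number | null;
  private now: () => number;

  // `now` returns milliseconds and is injectable so tests can control time.
  constructor(maxAgeSeconds: number | null, now: () => number = () => Date.now()) {
    this.maxAgeSeconds = maxAgeSeconds;
    this.now = now;
  }

  get(key: string): T | undefined {
    const maxAge = this.maxAgeSeconds;
    if (maxAge === null || maxAge <= 0) return undefined; // caching disabled
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    const ageSeconds = (this.now() - entry.storedAt) / 1000;
    if (ageSeconds > maxAge) {
      this.entries.delete(key); // expired: invalidate and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    const maxAge = this.maxAgeSeconds;
    if (maxAge === null || maxAge <= 0) return; // caching disabled
    this.entries.set(key, { value, storedAt: this.now() });
  }
}
```

A caller would check `cache.get(key)` before making a request and call `cache.set(key, response)` afterwards, so a repeated lookup within the configured window avoids a second API call.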
### `VoiceProvider` Interface
```typescript
interface VoiceProvider {
  name: string;
  getVoices({ lang, minVoices }: { lang: string; minVoices: number }): Promise<Voice[]>;
  getDefaultVoice({ lang }: { lang: string }): Promise<Voice | null>;
}
```
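Since every provider exposes the same two methods, application code can layer its own lookup strategy on top. The sketch below tries a full language tag first and then its base language, across a list of providers. It assumes `getDefaultVoice` resolves to a voice or `null` (as the `if (defaultVoice)` check in the usage example suggests); `languageCandidates` and `findDefaultVoice` are hypothetical helpers, and whether the library already performs similar matching internally is not stated in this README.

```typescript
// Simplified local shapes so the sketch compiles without the package.
interface VoiceLike {
  name: string;
  lang: string;
}

interface VoiceProviderLike {
  name: string;
  getDefaultVoice(args: { lang: string }): Promise<VoiceLike | null>;
}

// Hypothetical helper: 'zh-CN' yields ['zh-CN', 'zh']; 'zh' yields ['zh'].
function languageCandidates(lang: string): string[] {
  const base = lang.split('-')[0];
  return base === lang ? [lang] : [lang, base];
}

// Hypothetical helper: returns the first voice any provider offers,
// preferring the exact tag over the base language.
async function findDefaultVoice(
  providers: VoiceProviderLike[],
  lang: string
): Promise<VoiceLike | null> {
  for (const candidate of languageCandidates(lang)) {
    for (const provider of providers) {
      const voice = await provider.getDefaultVoice({ lang: candidate });
      if (voice) return voice;
    }
  }
  return null;
}
```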
### `Voice` Interface
```typescript
interface Voice {
  name: string;
  id: string;
  lang: string;
  provider: VoiceProvider;
  description: string | null;
  createUtterance(text: string): Utterance;
}
```
### `Utterance` Interface
```typescript
interface Utterance {
  start(): void;
  stop(): void;
  set onstart(callback: () => void);
  set onend(callback: () => void);
}
```
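The `stop()` method makes it possible to bound how long playback may run. The following sketch is one way to do that, under assumptions: `playWithTimeout` is a hypothetical helper, `UtteranceLike` is a simplified local stand-in that uses plain callback properties, and the README does not say whether `stop()` itself triggers `onend`, so the helper detaches the callback before stopping.

```typescript
// Simplified local shape mirroring the Utterance interface.
interface UtteranceLike {
  start(): void;
  stop(): void;
  onstart: () => void;
  onend: () => void;
}

type SpeechOutcome = 'ended' | 'timed-out';

// Hypothetical helper: starts the utterance and stops it if it has not
// finished within timeoutMs milliseconds.
function playWithTimeout(
  utterance: UtteranceLike,
  timeoutMs: number
): Promise<SpeechOutcome> {
  return new Promise<SpeechOutcome>((resolve) => {
    const timer = setTimeout(() => {
      utterance.onend = () => {}; // ignore any onend caused by stop()
      utterance.stop();
      resolve('timed-out');
    }, timeoutMs);
    utterance.onend = () => {
      clearTimeout(timer);
      resolve('ended');
    };
    utterance.start();
  });
}
```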
## Browser Compatibility
### Browser Speech Synthesis
The browser speech synthesis provider (`BrowserVoiceProvider`) is supported in all modern browsers:
- **Chrome/Edge**: Full support (voices load asynchronously)
- **Firefox**: Full support
- **Safari**: Full support (iOS and macOS)
- **Opera**: Full support
**Note**: Voice availability and quality vary by browser and operating system. Chrome and Edge typically offer the best selection of voices.
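The "voices load asynchronously" remark refers to the Web Speech API behavior where `speechSynthesis.getVoices()` can return an empty list until the `voiceschanged` event fires. For readers calling the browser API directly, a minimal sketch of waiting for that event follows. `waitForVoices` and `SynthLike` are hypothetical names; `SynthLike` is a narrow structural type that `window.speechSynthesis` is expected to satisfy, and how this package handles the delay internally is not described here.

```typescript
interface BrowserVoiceInfo {
  name: string;
  lang: string;
}

// Narrow structural view of the parts of SpeechSynthesis used below.
interface SynthLike {
  getVoices(): BrowserVoiceInfo[];
  addEventListener(type: 'voiceschanged', listener: () => void): void;
  removeEventListener(type: 'voiceschanged', listener: () => void): void;
}

// Hypothetical helper: resolves with the voice list as soon as it is
// non-empty, or with whatever is available once timeoutMs has elapsed.
function waitForVoices(
  synth: SynthLike,
  timeoutMs = 2000
): Promise<BrowserVoiceInfo[]> {
  return new Promise<BrowserVoiceInfo[]>((resolve) => {
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }
    const finish = () => {
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', finish);
      resolve(synth.getVoices());
    };
    const timer = setTimeout(finish, timeoutMs);
    synth.addEventListener('voiceschanged', finish);
  });
}
```

In a browser this would be called as `await waitForVoices(window.speechSynthesis)`.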
### ElevenLabs Provider
The ElevenLabs provider (`ElevenLabsVoiceProvider`) requires:
- **IndexedDB**: For caching API responses (supported in all modern browsers)
- **Fetch API**: For making API requests (supported in all modern browsers)
- **Audio API**: For playing synthesized speech (supported in all modern browsers)
### Minimum Requirements
- Modern browser with ES2022 support
- IndexedDB support (for ElevenLabs caching)
- No Internet Explorer support
### Server-Side Rendering (SSR)
The library is designed for client-side use. When used in SSR environments:
- Browser voice provider gracefully handles the absence of `window.speechSynthesis`
- Returns empty arrays when browser APIs are unavailable
- Safe to import in SSR frameworks (Next.js, Nuxt, etc.) but should only be used client-side
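In application code, the "client-side only" rule is often enforced with a small feature check before any speech call. The sketch below is illustrative: `hasSpeechSynthesis` is a hypothetical helper, not an export of this package, and it takes the global scope as a parameter so it can be exercised outside a browser.

```typescript
// Hypothetical guard: true only when the given scope (globalThis by default)
// carries a non-null speechSynthesis object, as browsers do on window.
function hasSpeechSynthesis(scope: object = globalThis): boolean {
  if (!('speechSynthesis' in scope)) return false;
  return (scope as { speechSynthesis?: unknown }).speechSynthesis != null;
}
```

During server rendering the check returns `false`, so speech-related work can be skipped or deferred until the code runs in the browser (for example inside a client-only lifecycle hook).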
## Contributing
Contributions are welcome! Please read the [CONTRIBUTING.md](CONTRIBUTING.md) guide for details on our code of conduct and the process for submitting pull requests.
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for a list of changes and version history.
## License
Copyright 2025 by Oliver Steele
Available under the MIT License