# Dictate Button
[![NPM Version](https://img.shields.io/npm/v/dictate-button)](https://www.npmjs.com/package/dictate-button)
[![Tests](https://github.com/dictate-button/dictate-button/actions/workflows/test.yml/badge.svg)](https://github.com/dictate-button/dictate-button/actions/workflows/test.yml)

A customizable web component that adds speech-to-text dictation capabilities to any text input, textarea field, or contenteditable element on your website.

Developed for [dictate-button.io](https://dictate-button.io).

## Features

- Easy integration with any website
- Compatible with any framework (or no framework)
- Automatic injection into text fields with the `data-dictate-button-on` attribute (exclusive mode) or without the `data-dictate-button-off` attribute (inclusive mode)
- Simple speech-to-text functionality with clean UI
- Customizable size and API endpoint
- Dark and light theme support
- Event-based API for interaction with your application
- Built with SolidJS for optimal performance
- Accessible: ARIA attributes, high-contrast mode support, and clear keyboard focus states

## Supported tags (by our inject scripts)

- `textarea`
- `input[type="text"]`
- `input[type="search"]`
- `input` (without a `type`; defaults to text)
- `[contenteditable]` elements

## Usage

### Auto-inject modes

Choose the auto-inject mode that best suits your needs:

| Mode | Description | Scripts |
|---|---|---|
| Exclusive | Enables the button only on text fields that have the `data-dictate-button-on` attribute. | `inject-exclusive.js` |
| Inclusive | Enables the button on all text fields that do not have the `data-dictate-button-off` attribute. | `inject-inclusive.js` |
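
To make the difference between the two modes concrete, here is a minimal sketch of how the matching selectors could be built. `buildSelector` and the `SUPPORTED` list are hypothetical names used for illustration; the selectors the real inject scripts use may differ.

```javascript
// Tags from the "Supported tags" list. The exact selectors used by the
// real inject scripts are an assumption.
const SUPPORTED = [
  'textarea',
  'input[type="text"]',
  'input[type="search"]',
  'input:not([type])',
  '[contenteditable]',
];

// Hypothetical helper: build one selector list for an auto-inject mode.
// Exclusive mode requires the opt-in attribute; inclusive mode excludes
// elements that carry the opt-out attribute.
function buildSelector(mode) {
  const suffix =
    mode === 'exclusive'
      ? '[data-dictate-button-on]'
      : ':not([data-dictate-button-off])';
  return SUPPORTED.map((selector) => selector + suffix).join(', ');
}
```

A selector built this way could be passed as the first argument to `injectDictateButtonOnLoad` (see the advanced usage section) if you want the same matching rules with your own options.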

Both auto-inject modes:
- Automatically run on DOMContentLoaded (or immediately if the DOM is already loaded).
- Watch for DOM changes to apply the dictate button to newly added elements.
- Set the button’s language from `document.documentElement.lang` (if present). Long codes like `en-GB` are normalized to `en`.
- Position the button in the top right-hand corner of the text field, respecting the field's padding, with a 4px fallback if the padding is not set (0).
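
The language normalization and the padding fallback described above can be sketched as two small functions. Both `normalizeLanguage` and `buttonInset` are hypothetical names, and the logic is an assumption about the behavior, not the library's actual code.

```javascript
// Reduce a language tag such as "en-GB" to its primary subtag "en".
// Returns undefined for empty or non-string input so the caller can
// fall back to a default.
function normalizeLanguage(tag) {
  if (typeof tag !== 'string') return undefined;
  const primary = tag.trim().split(/[-_]/)[0].toLowerCase();
  return primary || undefined;
}

// Use the field's own padding (e.g. "8px") as the button inset, or 4px
// when the padding is 0 or cannot be parsed.
function buttonInset(paddingValue) {
  const px = Number.parseFloat(paddingValue);
  return px > 0 ? px : 4;
}
```

In a browser, the inputs would typically come from `document.documentElement.lang` and from `getComputedStyle(field).paddingTop` or `paddingRight`.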

### From CDN

#### Option 1: Using the exclusive auto-inject script

In your HTML `<head>` tag, add the following script tag:

```html

```

Add the `data-dictate-button-on` attribute to any `textarea`, `input[type="text"]`, `input[type="search"]`, `input` without a `type` attribute, or element with the `contenteditable` attribute:

```html
<textarea data-dictate-button-on></textarea>
<input type="text" data-dictate-button-on />
```

#### Option 2: Using the inclusive auto-inject script

In your HTML `<head>` tag, add the following script tag:

```html

```

All `textarea`, `input[type="text"]`, `input[type="search"]`, `input` elements without a `type` attribute, and elements with the `contenteditable` attribute that lack `data-dictate-button-off` will be automatically enhanced by default.

To opt a specific field out, add the `data-dictate-button-off` attribute to it:

```html
<textarea data-dictate-button-off></textarea>
<input type="text" data-dictate-button-off />
```

#### Option 3: Manual integration

Import the component and use it directly in your code:

```html

```

### From NPM

Import once for your app:

```js
// For selected text fields (with data-dictate-button-on attribute):
import 'dictate-button/inject-exclusive'
// or for all text fields (except those with data-dictate-button-off attribute):
import 'dictate-button/inject-inclusive'
```

To choose between **exclusive** and **inclusive** auto-inject modes, see the [Auto-inject modes](#auto-inject-modes) section.

### Advanced usage with library functions

If you need more control over when and how the dictate buttons are injected, you can call the library functions directly.

Tip: if your bundler resolves package subpath exports, you can also import from subpaths (e.g., `'dictate-button/libs/injectDictateButton'`) for smaller bundles.

```js
import 'dictate-button' // Required when using library functions directly
import { injectDictateButton, injectDictateButtonOnLoad } from 'dictate-button/libs'

// Inject dictate buttons immediately to matching elements
injectDictateButton(
  'textarea.custom-selector', // CSS selector for target elements
  {
    buttonSize: 30, // Button size in pixels (optional; default: 30)
    verbose: false, // Log events to console (optional; default: false)
    apiEndpoint: 'wss://api.example.com/transcribe' // Optional custom API endpoint
  }
)

// Inject on DOM load with mutation observer to catch dynamically added elements
injectDictateButtonOnLoad(
  'input.custom-selector', // CSS selector for target elements
  {
    buttonSize: 30, // Button size in pixels (optional; default: 30)
    verbose: false, // Log events to console (optional; default: false)
    apiEndpoint: 'wss://api.example.com/transcribe', // Optional custom API endpoint
    watchDomChanges: true // Watch for DOM changes (optional; default: false)
  }
)
```

Note: the injector mirrors the target field’s display/margins into the wrapper,
sets wrapper width to 100% for block-level fields, and adds padding to avoid the button overlapping text.
The wrapper also has the `dictate-button-wrapper` class for easy styling.

## Events

The dictate-button component emits the following events:

- `dictate-start`: Fired when transcription starts (after microphone access is granted and WebSocket connection is established).
- `dictate-text`: Fired during transcription when text is available. This includes both interim (partial) transcripts that may change and final transcripts. The event detail contains the current transcribed text.
- `dictate-end`: Fired when transcription ends. The event detail contains the final transcribed text.
- `dictate-error`: Fired when an error occurs (microphone access denied, WebSocket connection failure, server error, etc.). The event detail contains the error message.

The typical flow is:

> dictate-start -> dictate-text (multiple times) -> dictate-end

In case of an error, the `dictate-error` event is fired.

Example event handling:

```javascript
const dictateButton = document.querySelector('dictate-button');

dictateButton.addEventListener('dictate-start', () => {
  console.log('Transcription started');
});

dictateButton.addEventListener('dictate-text', (event) => {
  const currentText = event.detail;
  console.log('Current text:', currentText);
  // Update UI with interim/partial transcription
});

dictateButton.addEventListener('dictate-end', (event) => {
  const finalText = event.detail;
  console.log('Final transcribed text:', finalText);

  // Add the final text to your input field
  document.querySelector('#my-input').value += finalText;
});

dictateButton.addEventListener('dictate-error', (event) => {
  const error = event.detail;
  console.error('Transcription error:', error);
});
```
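
If you also want a live preview of interim text in the field, one option is to remember the field's value at `dictate-start`, show that value plus the interim text on each `dictate-text`, and commit on `dictate-end`. The sketch below is for manual integration only. `bindDictation` is a hypothetical helper, not part of the library, and it assumes `event.detail` carries the full text of the current dictation rather than a delta, which this README does not state explicitly.

```javascript
// Hypothetical helper: preview interim text while dictating and commit
// the final text on dictate-end.
// `button` is any EventTarget that emits the dictate-* events;
// `field` is any object with a string `value` property.
function bindDictation(button, field) {
  let base = field.value; // text present before the current dictation

  const textOf = (event) => String(event.detail ?? '');
  const onStart = () => { base = field.value; };
  const onText = (event) => { field.value = base + textOf(event); };
  const onEnd = (event) => {
    field.value = base + textOf(event);
    base = field.value; // the final text is now part of the field
  };
  const onError = () => { field.value = base; }; // drop the uncommitted preview

  button.addEventListener('dictate-start', onStart);
  button.addEventListener('dictate-text', onText);
  button.addEventListener('dictate-end', onEnd);
  button.addEventListener('dictate-error', onError);

  // Return a cleanup function that detaches all listeners.
  return () => {
    button.removeEventListener('dictate-start', onStart);
    button.removeEventListener('dictate-text', onText);
    button.removeEventListener('dictate-end', onEnd);
    button.removeEventListener('dictate-error', onError);
  };
}
```

In a page, this could be called as `bindDictation(document.querySelector('dictate-button'), document.querySelector('#my-input'))`.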

## Attributes

| Attribute | Type | Default | Description |
|---|---|---|---|
| `size` | number | `30` | Size of the button in pixels |
| `apiEndpoint` | string | `wss://api.dictate-button.io/v2/transcribe` | WebSocket API endpoint of the transcription service |
| `language` | string | `en` | Optional [language](https://github.com/dictate-button/dictate-button/wiki/Supported-Languages-and-Dialects) code (e.g., 'fr', 'de') |
| `theme` | string | (inherits from page) | 'light' or 'dark' |
| `class` | string | | Custom CSS class |
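
When creating the element from script, the attributes above can be set with `setAttribute`. The following is a minimal sketch; `createDictateButton` is a hypothetical helper, and it assumes the component has already been registered (for example via `import 'dictate-button'`). The document object is passed in so the function can be exercised outside a browser.

```javascript
// Attribute names taken from the table above.
const ATTRIBUTES = ['size', 'apiEndpoint', 'language', 'theme', 'class'];

// Hypothetical helper: create a <dictate-button> element and copy the
// provided options onto it as attributes. Options that are undefined or
// null are skipped so the component's defaults apply.
function createDictateButton(doc, options = {}) {
  const element = doc.createElement('dictate-button');
  for (const name of ATTRIBUTES) {
    const value = options[name];
    if (value !== undefined && value !== null) {
      element.setAttribute(name, String(value));
    }
  }
  return element;
}
```

In a page, `createDictateButton(document, { size: 24, language: 'fr', theme: 'dark' })` would return an element ready to be appended next to a text field.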

## Styling

You can customize the appearance of the dictate button using CSS parts:

```css
/* Style the button container */
dictate-button::part(container) {
  /* Custom styles */
}

/* Style the button itself */
dictate-button::part(button) {
  /* Custom styles */
}

/* Style the button icons */
dictate-button::part(icon) {
  /* Custom styles */
}
```

## API Endpoint

By default, dictate-button uses the `wss://api.dictate-button.io/v2/transcribe` endpoint for real-time speech-to-text streaming.
You can specify your own endpoint by setting the `apiEndpoint` attribute.

The API uses WebSocket for real-time transcription:
- **Protocol**: WebSocket (wss://)
- **Connection**: Opens WebSocket connection with optional language query parameter (e.g., `?language=en`)
- **Audio Format**: PCM16 audio data at 16kHz sample rate, sent as binary chunks
- **Messages Sent**:
  - Binary audio data (Int16Array buffers) - Continuous stream of PCM16 audio chunks
  - `{ type: 'close' }` - JSON message to signal end of audio stream and trigger finalization
- **Messages Received**: JSON messages with the following types:
  - `{ type: 'session_opened', sessionId: string, expiresAt: number }` - Session started
  - `{ type: 'interim_transcript', text: string }` - Interim (partial) transcription result that may change as more audio is processed
  - `{ type: 'transcript', text: string, turn_order?: number }` - Final transcription result for the current turn
  - `{ type: 'session_closed', code: number, reason: string }` - Session ended
  - `{ type: 'error', error: string }` - Error occurred
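
If you are writing your own client or testing a custom endpoint, the protocol above can be sketched with three small helpers. The function and handler names (`buildTranscribeUrl`, `floatToPcm16`, `handleServerMessage`, `onInterim`, `onFinal`, `onError`, `onSession`) are hypothetical, and the code assumes the audio has already been captured or resampled at 16 kHz. It is an illustration of the message shapes listed above, not the component's internal implementation.

```javascript
// Append the optional language query parameter to the endpoint URL.
function buildTranscribeUrl(endpoint, language) {
  const url = new URL(endpoint);
  if (language) url.searchParams.set('language', language);
  return url.toString();
}

// Convert Web Audio float samples in [-1, 1] to 16-bit signed PCM.
// Out-of-range samples are clamped before scaling.
function floatToPcm16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Parse one JSON message from the server and route it by its `type`.
// Returns the message type so callers can log or assert on it.
function handleServerMessage(raw, handlers = {}) {
  const message = JSON.parse(raw);
  switch (message.type) {
    case 'interim_transcript':
      handlers.onInterim?.(message.text);
      break;
    case 'transcript':
      handlers.onFinal?.(message.text, message.turn_order);
      break;
    case 'error':
      handlers.onError?.(message.error);
      break;
    case 'session_opened':
    case 'session_closed':
      handlers.onSession?.(message);
      break;
    default:
      break; // ignore message types not listed above
  }
  return message.type;
}

// Browser usage (not executed here):
//   const ws = new WebSocket(buildTranscribeUrl(endpoint, 'en'));
//   ws.onmessage = (event) => handleServerMessage(event.data, handlers);
//   ws.send(floatToPcm16(chunk).buffer);          // audio chunk
//   ws.send(JSON.stringify({ type: 'close' }));   // end of stream
```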

## Browser Compatibility

The dictate-button component requires the following browser features:
- Web Components
- MediaStream API (getUserMedia)
- Web Audio API (AudioContext, AudioWorklet)
- WebSocket API

Works in all modern browsers (Chrome, Firefox, Safari, Edge).
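
To check for these features before showing the button, a feature-detection helper along the following lines could be used. `missingDictationFeatures` is a hypothetical name and the checks are a sketch of the list above, not something the library exports.

```javascript
// Return the names of required features that are missing from `env`
// (defaults to the global object). An empty array means everything
// in the list above is available.
function missingDictationFeatures(env = globalThis) {
  const checks = {
    'Web Components': typeof env.customElements?.define === 'function',
    'getUserMedia': typeof env.navigator?.mediaDevices?.getUserMedia === 'function',
    'AudioContext': typeof env.AudioContext === 'function',
    'AudioWorklet': typeof env.AudioWorkletNode === 'function',
    'WebSocket': typeof env.WebSocket === 'function',
  };
  return Object.keys(checks).filter((name) => !checks[name]);
}
```

In a page, calling `missingDictationFeatures()` with no arguments inspects the browser's own globals. Note that `getUserMedia` is generally only exposed in secure contexts (HTTPS or localhost).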