# react-dictate-button

[![npm version](https://badge.fury.io/js/react-dictate-button.svg)](https://badge.fury.io/js/react-dictate-button) [![Build Status](https://travis-ci.org/compulim/react-dictate-button.svg?branch=master)](https://travis-ci.org/compulim/react-dictate-button)

A button to start speech recognition using [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API), with an easy to understand event lifecycle.

# Breaking changes

## [2.0.0] - 2021-05-15

- Requires [`react@>=16.8.0`](https://npmjs.com/package/react) and [`core-js@3`](https://npmjs.com/package/core-js)
- Modifying props while recognition is ongoing will no longer abort recognition immediately; updated props take effect on the next recognition
- `SpeechGrammarList` is only constructed when the `grammar` prop is present
- If the `speechRecognition` prop is not present, capability detection is now done through `navigator.mediaDevices.getUserMedia`

# Demo

Try out this component at [compulim.github.io/react-dictate-button](https://compulim.github.io/react-dictate-button/).

# Background

Reasons why we need to build our own component, instead of using [existing packages](https://www.npmjs.com/search?q=react%20speech) on NPM:

- Most browsers require speech recognition (or WebRTC) to be triggered by a user event, such as a button click
- Bring your own engine for [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API)
- Enable speech recognition on unsupported browsers by bridging it with [cloud-based service](https://npmjs.com/package/web-speech-cognitive-services)
- Support grammar list thru [JSpeech Grammar Format](https://www.w3.org/TR/jsgf/)
- Ability to interrupt recognition
- Ability to [morph into other elements](#customization-thru-morphing)

# How to use

First, install the production version with `npm install react-dictate-button`, or the development version with `npm install react-dictate-button@master`.

```jsx
import { DictateButton } from 'react-dictate-button';

export default () => (
  <DictateButton
    className="my-dictate-button"
    lang="en-US"
    onDictate={({ result }) => console.log(result)}
  >
    Start/stop
  </DictateButton>
);
```
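
For a fuller example, here is a minimal sketch that displays the recognized transcript from the `onDictate` event. The `Dictation` component name and its state handling are illustrative, not part of the package:

```jsx
import { useCallback, useState } from 'react';
import { DictateButton } from 'react-dictate-button';

// Illustrative component: shows the recognized transcript next to the button.
export default function Dictation() {
  const [transcript, setTranscript] = useState('');

  // "result" may be undefined when nothing was recognized.
  const handleDictate = useCallback(
    ({ result }) => setTranscript(result ? result.transcript : ''),
    []
  );

  return (
    <div>
      <DictateButton lang="en-US" onDictate={handleDictate}>
        Start/stop
      </DictateButton>
      <output>{transcript}</output>
    </div>
  );
}
```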

## Props

| Name | Type | Default | Description |
| ------------------- | ------------------------ | ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `className` | `string` | `undefined` | Class name to apply to the button |
| `continuous` | `boolean` | `false` | `true` to set the Web Speech API to continuous mode, which keeps recognizing until stopped, otherwise, `false` |
| `disabled` | `boolean` | `false` | `true` to abort ongoing recognition and disable the button, otherwise, `false` |
| `extra` | `{ [key: string]: any }` | `{}` | Additional properties to set to [`SpeechRecognition`](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition) before `start`, useful when bringing your own [`SpeechRecognition`](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition) |
| `grammar` | `string` | `undefined` | Grammar list in [JSGF format](https://developer.mozilla.org/en-US/docs/Web/API/SpeechGrammarList/addFromString) |
| `lang` | `string` | `undefined` | Language to recognize, for example, `'en-US'` or [`navigator.language`](https://developer.mozilla.org/en-US/docs/Web/API/NavigatorLanguage/language) |
| `speechGrammarList` | `any` | `window.SpeechGrammarList` (or vendor-prefixed) | Bring your own [`SpeechGrammarList`](https://developer.mozilla.org/en-US/docs/Web/API/SpeechGrammarList) |
| `speechRecognition` | `any` | `window.SpeechRecognition` (or vendor-prefixed) | Bring your own [`SpeechRecognition`](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition) |

> Note: changes to `extra`, `grammar`, `lang`, `speechGrammarList`, and `speechRecognition` will not take effect until the next speech recognition is started.
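
For example, the `extra` prop can set standard `SpeechRecognition` properties before `start()` is called. A minimal sketch, using the standard `maxAlternatives` property:

```jsx
import { DictateButton } from 'react-dictate-button';

// "extra" assigns these properties onto the SpeechRecognition instance
// before start() is called; maxAlternatives is a standard property.
export default () => (
  <DictateButton extra={{ maxAlternatives: 5 }} lang="en-US">
    Start/stop
  </DictateButton>
);
```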

## Events

| Name         | Signature                                                                                                    | Description                                                                                                                 |
| ------------ | ------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------- |
| `onClick`    | `(event: MouseEvent) => void`                                                                                | Emit when the user clicks on the button; calling `preventDefault` will stop recognition from starting                       |
| `onDictate`  | `({ result: { confidence: number, transcript: string }, type: 'dictate' }) => void`                         | Emit when recognition is completed                                                                                          |
| `onError`    | `(event: SpeechRecognitionErrorEvent) => void`                                                               | Emit when an error has occurred or recognition is interrupted, see below                                                    |
| `onProgress` | `({ abortable: boolean, results: [{ confidence: number, transcript: string }], type: 'progress' }) => void` | Emit for interim results; the array contains every segment of recognized text                                               |
| `onRawEvent` | `(event: SpeechRecognitionEvent) => void`                                                                    | Emit for handling raw events from [`SpeechRecognition`](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition) |
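Putting the events together, here is a minimal sketch that renders interim results from `onProgress` and finalizes on `onDictate`. The `InterimDictation` component name is illustrative:

```jsx
import { useState } from 'react';
import { DictateButton } from 'react-dictate-button';

// Illustrative component: streams interim transcripts while recognizing.
export default function InterimDictation() {
  const [segments, setSegments] = useState([]);

  return (
    <div>
      <DictateButton
        lang="en-US"
        onDictate={({ result }) => setSegments(result ? [result.transcript] : [])}
        onError={event => console.warn(event.error)}
        // "results" is absent on the very first progress event, hence the default.
        onProgress={({ results = [] }) => setSegments(results.map(({ transcript }) => transcript))}
      >
        Start/stop
      </DictateButton>
      <p>{segments.join(' ')}</p>
    </div>
  );
}
```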

## Hooks

> Although previous versions exported a React Context, it is recommended to use the hooks interface.

| Name | Signature | Description |
| --------------- | ----------- | --------------------------------------------------------------------------------------------------- |
| `useAbortable`  | `[boolean]` | `true` if the ongoing speech recognition can be aborted via its `abort()` function, otherwise, `false`  |
| `useReadyState` | `[number]` | Returns the current state of recognition, refer to [this section](#function-as-a-child) |
| `useSupported` | `[boolean]` | If speech recognition is supported, `true`, otherwise, `false` |
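
A short sketch of the hooks interface. This assumes the hooks are called from a component rendered as a descendant of `<DictateButton>`, so they can read the component's context:

```jsx
import { DictateButton, useReadyState, useSupported } from 'react-dictate-button';

// Assumption: these hooks read context provided by <DictateButton>,
// so this component must be rendered as its descendant.
function Status() {
  const [readyState] = useReadyState();
  const [supported] = useSupported();

  return supported ? <>Ready state: {readyState}</> : <>Not supported</>;
}

export default () => (
  <DictateButton lang="en-US">
    <Status />
  </DictateButton>
);
```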

### Checking if speech recognition is supported

To determine whether speech recognition is supported in the browser:

- If the `speechRecognition` prop is `undefined`:
  - If both [`window.navigator.mediaDevices`](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices) and [`window.navigator.mediaDevices.getUserMedia`](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia) are falsy, it is not supported
    - The browser is probably not on a secure HTTP connection
  - If both `window.SpeechRecognition` and its vendor-prefixed counterparts are falsy, it is not supported
  - If recognition failed once with the `not-allowed` error code, it is not supported
- Otherwise, it is supported

> Even if the browser is on an insecure HTTP connection, `window.SpeechRecognition` (or its vendor-prefixed counterpart) will continue to be truthy. Instead, `navigator.mediaDevices.getUserMedia` is used for capability detection.
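
The detection logic roughly translates to the following sketch. The `isSupported` helper is illustrative, not an export of this package, and the runtime `not-allowed` failure is tracked separately:

```jsx
// Illustrative helper, not exported by react-dictate-button.
function isSupported() {
  const { mediaDevices } = window.navigator;

  // Insecure pages expose SpeechRecognition but not getUserMedia,
  // so getUserMedia is the more reliable capability signal.
  if (!mediaDevices || !mediaDevices.getUserMedia) {
    return false;
  }

  return !!(window.SpeechRecognition || window.webkitSpeechRecognition);
}
```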

### Event lifecycle

One design goal is to make sure events are easy to understand and deterministic. The first rule of thumb is that `onProgress` always leads to either `onDictate` or `onError`. Here are some samples of event firing sequences (tested on Chrome 67):

- Happy path: speech is recognized
  1. `onStart`
  1. `onProgress({})` (just started, therefore, no `results`)
  1. `onProgress({ results: [] })`
  1. `onDictate({ result: ... })`
  1. `onEnd`
- Happy path: speech is recognized in continuous mode
  1. `onStart`
  1. `onProgress({})` (just started, therefore, no `results`)
  1. `onProgress({ results: [] })`
  1. `onDictate({ result: ... })`
  1. `onProgress({ results: [] })`
  1. `onDictate({ result: ... })`
  1. `onEnd`
- Heard some sound, but nothing can be recognized
  1. `onStart`
  1. `onProgress({})`
  1. `onDictate({})` (nothing is recognized, therefore, no `result`)
  1. `onEnd`
- Nothing is heard (audio device available but muted)
  1. `onStart`
  1. `onProgress({})`
  1. `onError({ error: 'no-speech' })`
  1. `onEnd`
- Recognition aborted
  1. `onStart`
  1. `onProgress({})`
  1. `onProgress({ results: [] })`
  1. While speech is being recognized, set `props.disabled` to `true` to abort recognition
  1. `onError({ error: 'aborted' })`
  1. `onEnd`
- Not authorized to use speech or no audio device is available
  1. `onStart`
  1. `onError({ error: 'not-allowed' })`
  1. `onEnd`
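
Because `onProgress` is guaranteed to end in either `onDictate` or `onError`, a busy indicator can be driven safely by these three events. A minimal sketch; the `BusyDictation` component is illustrative:

```jsx
import { useState } from 'react';
import { DictateButton } from 'react-dictate-button';

// Because onProgress always leads to onDictate or onError,
// a "busy" flag set on progress will always be cleared.
export default function BusyDictation() {
  const [busy, setBusy] = useState(false);

  return (
    <div>
      <DictateButton
        lang="en-US"
        onDictate={() => setBusy(false)}
        onError={() => setBusy(false)}
        onProgress={() => setBusy(true)}
      >
        Start/stop
      </DictateButton>
      {busy && <span>Listening...</span>}
    </div>
  );
}
```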

## Function as a child

Instead of passing child elements, you can pass a function to render different content based on ready state. This is called [function as a child](https://reactjs.org/docs/render-props.html#using-props-other-than-render).

| Ready state | Description |
| ----------- | -------------------------------------------------------------------------- |
| `0` | Not started |
| `1`         | Starting recognition engine, recognition is not ready until it turns to `2` |
| `2` | Recognizing |
| `3` | Stopping |

For example,

```jsx
<DictateButton>
  {({ readyState }) =>
    readyState === 0 ? 'Start' : readyState === 1 ? 'Starting...' : readyState === 2 ? 'Listening...' : 'Stopping...'
  }
</DictateButton>
```

# Customization thru morphing

You can build your own component by copying our layout code, without messing with the [logic code behind the scenes](packages/component/src/Composer.js). For details, please refer to [`DictateButton.js`](packages/component/src/DictateButton.js), [`DictateCheckbox.js`](packages/component/src/DictateCheckbox.js), and [`DictationTextBox.js`](packages/pages/src/DictationTextBox.js).

## Checkbox version

In addition to `<DictateButton>`, we also ship `<DictateCheckbox>` out of the box. The checkbox version is better suited for toggle-button scenarios and web accessibility. You can use the following code for the checkbox version.

```jsx
import { DictateCheckbox } from 'react-dictate-button';

export default () => (
  <DictateCheckbox
    className="my-dictate-checkbox"
    lang="en-US"
    onDictate={({ result }) => console.log(result)}
  >
    Start/stop
  </DictateCheckbox>
);
```

## Text box with dictate button

We also provide a "text box with dictate button" version. But instead of shipping a full-fledged control, we make it a minimally-styled control so you can start copying the code and customize it in your own project. The sample code can be found at [DictationTextBox.js](packages/pages/src/DictationTextBox.js).

# Design considerations

- Hide the complexity of Web Speech events because we only want to focus on the recognition experience
  - Complexity in lifecycle events: `onstart`, `onaudiostart`, `onsoundstart`, `onspeechstart`
  - `onresult` may not fire in some cases, and `onnomatch` is not fired in Chrome
- To reduce complexity, we want to make sure event firing is either:
  - Happy path: `onProgress`, then either `onDictate` or `onError`
  - Otherwise: `onError`
- "Web Speech" could mean speech synthesis, which is out of scope for this package
- "Speech Recognition" could imply exposing the Web Speech API as-is; instead, we want to hide the details and make it straightforward for the recognition scenario

# Roadmap

Please feel free to [file](https://github.com/compulim/react-dictate-button/issues) suggestions.

- While `readyState` is `1` or `3` (transitioning), the underlying speech engine cannot be started or stopped until the state transition is complete
  - Needs rework of the state management
  - Instead of putting all logic inside [`Composer.js`](packages/component/src/Composer.js), how about the following (see the sketch after this list):
    1. Write an adapter to convert `SpeechRecognition` into another object with a simpler event model and `readyState`
    2. Rewrite `Composer.js` to bridge the new `SimpleSpeechRecognition` model and React Context
    3. Expose `SimpleSpeechRecognition` so people not on React can still benefit from the simpler event model
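
A rough sketch of what such an adapter could look like. `SimpleSpeechRecognition` does not exist yet, and every name below is hypothetical:

```jsx
// Hypothetical sketch only: SimpleSpeechRecognition is not shipped by this package.
class SimpleSpeechRecognition {
  constructor(recognition) {
    // 0 = idle, 1 = starting, 2 = recognizing, 3 = stopping
    this.readyState = 0;
    this._recognition = recognition;

    recognition.onstart = () => {
      this.readyState = 2;
      this.onprogress && this.onprogress({ results: [] });
    };

    recognition.onresult = ({ results }) => {
      const alternative = results[0] && results[0][0];

      this.ondictate && this.ondictate({ result: alternative });
    };

    recognition.onerror = event => this.onerror && this.onerror(event);

    recognition.onend = () => {
      this.readyState = 0;
      this.onend && this.onend();
    };
  }

  start() {
    this.readyState = 1;
    this._recognition.start();
  }

  stop() {
    this.readyState = 3;
    this._recognition.stop();
  }
}
```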

# Contributions

Like us? [Star](https://github.com/compulim/react-dictate-button/stargazers) us.

Want to make it better? [File](https://github.com/compulim/react-dictate-button/issues) us an issue.

Don't like something you see? [Submit](https://github.com/compulim/react-dictate-button/pulls) a pull request.