https://github.com/maxbbraun/whisper-edge
OpenAI Whisper for edge devices
- Host: GitHub
- URL: https://github.com/maxbbraun/whisper-edge
- Owner: maxbbraun
- License: mit
- Created: 2023-03-15T12:16:47.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-21T08:27:11.000Z (about 3 years ago)
- Last Synced: 2024-11-06T02:43:55.684Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 3.53 MB
- Stars: 115
- Watchers: 9
- Forks: 19
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# Whisper Edge
Porting [OpenAI Whisper](https://github.com/openai/whisper) speech recognition to edge devices with hardware ML accelerators, enabling always-on live voice transcription. Current work includes [Jetson Nano](#jetson-nano) and [Coral Edge TPU](#coral-edge-tpu).
## Jetson Nano

### Shopping cart
| Part | Price (2023) |
| :- | -: |
| [NVIDIA Jetson Nano Developer Kit (4G)](https://developer.nvidia.com/embedded/jetson-nano-developer-kit) | [$149.00](https://www.amazon.com/NVIDIA-Jetson-Nano-Developer-945-13450-0000-100/dp/B084DSDDLT/) |
| [ChanGeek CGS-M1 USB Microphone](https://www.amazon.com/gp/product/B08M37224H/ref=ppx_yo_dt_b_asin_title_o03_s00) | [$16.99](https://www.amazon.com/gp/product/B08M37224H/ref=ppx_yo_dt_b_asin_title_o03_s00) |
| [Noctua NF-A4x10 5V Fan](https://noctua.at/en/products/fan/nf-a4x10-5v) (or similar, recommended) | [$13.95](https://www.amazon.com/Noctua-Cooling-Bearing-NF-A4X10-FLX-5V/dp/B00NEMGCIA/) |
| [D-Link DWA-181 Wi-Fi Adapter](https://www.dlink.com/en/products/dwa-181-ac1300-mu-mimo-wi-fi-nano-usb-adapter) (or similar, optional) | [$21.94](https://www.amazon.com/D-Link-Wireless-Internet-Supported-DWA-181-US/dp/B07YYL3RYJ/) |
### Model
The [`base.en` version](https://github.com/openai/whisper#available-models-and-languages) of Whisper seems to work best for the Jetson Nano:
- `base` is the largest model size that fits into the 4 GB of memory without modification.
- Inference performance with `base` is ~10x real-time in isolation and ~1x real-time while recording concurrently.
- Using the English-only `.en` version further improves the word error rate (WER) ([<5% on LibriSpeech test-clean](https://cdn.openai.com/papers/whisper.pdf)).
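For reference, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that calculation, not code from this repository: the function name `word_error_rate` is hypothetical, and it only lowercases and splits on whitespace, whereas published benchmarks typically apply a fuller text normalization first.
```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplified normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("benefits all of humanity", "benefit all humanity"))
```
Comparing a known script read aloud against the transcription in this way gives a rough, device-specific sanity check, though it will not be directly comparable to the LibriSpeech figure.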
### Hack
Dilemma:
- Whisper and some of its dependencies require Python 3.8.
- The latest supported version of [JetPack](https://developer.nvidia.com/embedded/jetpack) for Jetson Nano is [4.6.3](https://developer.nvidia.com/jetpack-sdk-463), which is on Python 3.6.
- There is [no easy way](https://github.com/maxbbraun/whisper-edge/issues/2) to update Python to 3.8 without losing CUDA support for PyTorch.
Workaround:
- Fork [whisper](https://github.com/maxbbraun/whisper) and [tiktoken](https://github.com/maxbbraun/tiktoken), downgrading them to Python 3.6.
### Setup
First, follow the [developer kit setup instructions](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit), connect the Wi-Fi adapter and the microphone to USB, and ideally [install a fan](https://noctua.at/en/nf-a4x10-flx/service). (Plugging in an Ethernet cable also speeds up the downloads.) Then, get a shell on the Jetson Nano:
```bash
ssh user@jetson-nano.local
```
We will use [NVIDIA Docker containers](https://hub.docker.com/r/dustynv/jetson-inference/tags) to run inference. Get the source code and build the custom container:
```bash
git clone https://github.com/maxbbraun/whisper-edge.git
bash whisper-edge/build.sh
```
### Run
Launch inference:
```bash
bash whisper-edge/run.sh
```
You should see console output similar to this:
```bash
I0317 00:42:23.979984 547488051216 stream.py:75] Loading model "base.en"...
100%|#######################################| 139M/139M [00:30<00:00, 4.71MiB/s]
I0317 00:43:14.232425 547488051216 stream.py:79] Warming model up...
I0317 00:43:55.164070 547488051216 stream.py:86] Starting stream...
I0317 00:44:19.775566 547488051216 stream.py:51]
I0317 00:44:22.046195 547488051216 stream.py:51] Open AI's mission is to ensure that artificial general intelligence
I0317 00:44:31.353919 547488051216 stream.py:51] benefits all of humanity.
I0317 00:44:49.219501 547488051216 stream.py:51]
```
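If you want only the transcribed text, the log prefix can be stripped from captured console output. The sketch below is an illustration under assumptions, not part of the repository: `extract_transcript` is a hypothetical helper, the regular expression is inferred from the sample output above, and the source line number `51` is taken from that sample and will change if `stream.py` changes.
```python
import re

# Assumed layout, inferred from the sample output: severity letter and MMDD,
# time with microseconds, thread id, "file:line]", then the message.
LOG_LINE = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] ?(?P<message>.*)$"
)


def extract_transcript(console_output: str, source_line: int = 51) -> str:
    """Join the non-empty messages logged from stream.py:<source_line>."""
    parts = []
    for raw in console_output.splitlines():
        match = LOG_LINE.match(raw)
        if match is None:
            continue  # Not a log line, e.g. the download progress bar.
        if match["file"] != "stream.py" or int(match["line"]) != source_line:
            continue  # Status messages such as "Loading model".
        text = match["message"].strip()
        if text:  # Skip chunks that produced no transcription.
            parts.append(text)
    return " ".join(parts)
```
Filtering on the source line is brittle, but it is the only thing in this log format that separates transcriptions from status messages.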
The [`stream.py` script](stream.py) run in the container accepts flags for different configurations:
```bash
bash whisper-edge/run.sh --help
USAGE: stream.py [flags]
flags:
stream.py:
--channel_index: The index of the channel to use for transcription.
(default: '0')
(an integer)
--chunk_seconds: The length in seconds of each recorded chunk of audio.
(default: '10')
(an integer)
--input_device: The input device used to record audio.
(default: 'plughw:2,0')
--language: The language to use or empty to auto-detect.
(default: 'en')
--latency: The latency of the recording stream.
(default: 'low')
--model_name: The version of the OpenAI Whisper model to use.
(default: 'base.en')
--num_channels: The number of channels of the recorded audio.
(default: '1')
(an integer)
--sample_rate: The sample rate of the recorded audio.
(default: '16000')
(an integer)
Try --helpfull to get a list of all flags.
```
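To make the relationship between these flags concrete, the sketch below mirrors the flag names and defaults from the help text using the standard library's `argparse`. It is a stand-in for illustration only and not how `stream.py` itself defines its flags; `build_parser` and `samples_per_chunk` are hypothetical names.
```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the flag names and defaults listed in the help output."""
    parser = argparse.ArgumentParser(prog="stream.py")
    parser.add_argument("--channel_index", type=int, default=0)
    parser.add_argument("--chunk_seconds", type=int, default=10)
    parser.add_argument("--input_device", default="plughw:2,0")
    parser.add_argument("--language", default="en")
    parser.add_argument("--latency", default="low")
    parser.add_argument("--model_name", default="base.en")
    parser.add_argument("--num_channels", type=int, default=1)
    parser.add_argument("--sample_rate", type=int, default=16000)
    return parser


def samples_per_chunk(args: argparse.Namespace) -> int:
    """Number of audio frames (per channel) in one recorded chunk."""
    return args.chunk_seconds * args.sample_rate


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"{samples_per_chunk(args)} samples per chunk from {args.input_device}")
```
With the defaults, each chunk holds 10 s × 16000 Hz = 160000 samples per channel.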
### Troubleshooting
To see if the microphone is working properly, use [`alsa-utils`](https://github.com/alsa-project/alsa-utils):
```bash
sudo apt-get -y install alsa-utils
# Is the USB device connected?
lsusb
# Is the correct recording device selected?
arecord -l
# Is the gain set properly?
alsamixer
# Does a test recording work?
arecord --format=S16_LE --duration=5 --rate=16000 --channels=1 --device=plughw:2,0 test.wav
```
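Once `test.wav` exists, its format and signal level can also be checked programmatically. This is a minimal sketch using only the Python standard library; `inspect_recording` is a hypothetical helper and assumes the 16-bit format (`S16_LE`) requested in the `arecord` command above.
```python
import array
import sys
import wave


def inspect_recording(path: str) -> dict:
    """Report the format and peak level of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.getnframes()
        raw = wav.readframes(frames)
    if sample_width != 2:
        raise ValueError("expected 16-bit samples (S16_LE)")
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian.
    peak = max((abs(sample) for sample in samples), default=0)
    return {
        "channels": channels,
        "sample_rate": rate,
        "duration_seconds": frames / rate,
        "peak": peak / 32768.0,  # 0.0 is silence, 1.0 is full scale.
    }


if __name__ == "__main__":
    print(inspect_recording("test.wav"))
```
A peak close to 0.0 suggests the wrong device or a muted input, while a peak at 1.0 suggests clipping and a gain that is set too high.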
## Coral Edge TPU

See the corresponding [issue](https://github.com/maxbbraun/whisper-edge/issues/1) for a discussion of what support for the [Google Coral Edge TPU](https://coral.ai/products/) might look like.