https://github.com/moonshine-ai/moonshine
Fast and accurate automatic speech recognition (ASR) for edge devices
https://github.com/moonshine-ai/moonshine
Last synced: 3 months ago
JSON representation
Fast and accurate automatic speech recognition (ASR) for edge devices
- Host: GitHub
- URL: https://github.com/moonshine-ai/moonshine
- Owner: moonshine-ai
- Created: 2024-10-04T22:10:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-02-10T01:25:01.000Z (4 months ago)
- Last Synced: 2026-02-10T06:44:24.445Z (4 months ago)
- Language: C
- Homepage:
- Size: 274 MB
- Stars: 3,136
- Watchers: 39
- Forks: 160
- Open Issues: 25
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-privacy - Moonshine - Fast and accurate automatic speech recognition (ASR) for edge devices. (Artificial Intelligence / Android Launcher)
- fucking-awesome-privacy - Moonshine - Fast and accurate automatic speech recognition (ASR) for edge devices. (Artificial Intelligence / Android Launcher)
- AiTreasureBox - moonshine-ai/moonshine - 03-09_7202_43](https://img.shields.io/github/stars/moonshine-ai/moonshine.svg)|Fast and accurate automatic speech recognition (ASR) for edge devices| (Repos)
- awesome - moonshine-ai/moonshine - Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces (C)
- awesome - moonshine-ai/moonshine - Fast and accurate automatic speech recognition (ASR) for edge devices (<a name="Python"></a>Python)
- awesome-rainmana - moonshine-ai/moonshine - Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces (C)
- awesome-github-projects - moonshine - Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces ⭐8,326 `C` 🔥 (🌐 Web Development - Frontend)
README
Moonshine
[[Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/) [[Paper 1]](https://arxiv.org/abs/2410.15608) [[Paper 2]](https://arxiv.org/abs/2509.02523) [[Model Card]](https://github.com/moonshine-ai/moonshine/blob/main/model-card.md) [[Podcast]](https://notebooklm.google.com/notebook/d787d6c2-7d7b-478c-b7d5-a0be4c74ae19/audio) | [[Join our Discord for questions and support](https://discord.gg/27qp9zSRXF)]
Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. It is well-suited to real-time, on-device applications like live transcription and voice command recognition. English Moonshine obtains word-error rates (WER) better than similarly-sized Tiny and Base Whisper on the [OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard), and non-English Moonshine variants [outperform](#supported-languages) Whisper Small and Medium, which are 9x and 28x larger, respectively.
Moonshine processes audio segments between _5x-15x faster_ than Whisper while maintaining the same (or significantly better!) WER/CER. This is because its compute requirements scale with the length of input audio. Shorter input audio is processed faster, unlike Whisper models that process everything as 30-second chunks.
Unquantized Base is 62M parameters (or 400MB), while Tiny is 27M parameters (around 190MB).
## Supported Languages
Moonshine currently supports 8 languages. Below is a performance summary. Arabic, Chinese, Japanese, and Korean are character-error rates (CER); all others are WER.
| Language | Tag | Moonshine Tiny (27M) | Moonshine Base (62M) | Whisper Tiny (39M) | Whisper Base (74M) | Whisper Small (244M) | Whisper Medium (769M) |
| ---------- | ---- | --------- | ------- | ------- | ------- | ------- | ------- |
| Arabic | `ar` | 24.76 | | 52.40 | 48.25 | 32.44 | 25.44 |
| English | | 12.66 | 10.07 | 12.81 | 10.32 | | |
| Chinese | `zh` | 32.77 | | 68.51 | 59.13 | 46.76 | 40.41 |
| Japanese | `ja` | 15.69 | | 96.71 | 72.69 | 40.94 | 27.88 |
| Korean | `ko` | 9.85 | | 23.92 | 15.93 | 9.87 | 7.68 |
| Spanish | `es` | | TBA | | | | |
| Ukrainian | `uk` | 19.70 | | 66.77 | 48.56 | 25.93 | 16.51 |
| Vietnamese | `vi` | 15.92 | | 96.4 | 52.79 | 26.46 | 18.49 |

Read [the paper](https://arxiv.org/abs/2509.02523) for more details on our non-English flavors of Moonshine.
## Supported Backends
With the release of new Moonshine languages, we have deprecated the Keras-based `moonshine` package. We recommend using Hugging Face `transformers` for vibe-checking the models, and using the ONNX runtime via `moonshine-onnx` for on-device applications. This table summarizes support:
| Model | Language | `transformers` | ONNX | Keras (deprecated) |
| ------- | --- | ---- | --- | --- |
| `tiny-ar` | Arabic | ✅ | ✅ | ❌ |
| `tiny-zh` | Chinese | ✅ | ✅ | ❌ |
| `tiny` | English | ✅ | ✅ | ✅ |
| `base` | English | ✅ | ✅ | ✅ |
| `tiny-ja` | Japanese | ✅ | ✅ | ❌ |
| `tiny-ko` | Korean | ✅ | ✅ | ❌ |
| `base-es` | Spanish | ✅ | ✅ | ❌ |
| `tiny-uk` | Ukrainian | ✅ | ✅ | ❌ |
| `tiny-vi` | Vietnamese | ✅ | ✅ | ❌ |
## Table of Contents
- [Installation](#installation)
- [1. Create a virtual environment](#1-create-a-virtual-environment)
- [2. Install `useful-moonshine-onnx`](#2-install-useful-moonshine-onnx)
- [3. Try it out](#3-try-it-out)
- [Examples](#examples)
- [Hugging Face Transformers](#huggingface-transformers)
- [Live Captions](#live-captions)
- [CTranslate2](#ctranslate2)
- [Web Applications](#web-applications)
- [Discord](#discord)
- [License](#license)
- [Citation](#citation)
## Installation
We like `uv` for managing Python environments, so we use it here. If you don't want to use it, simply skip the `uv` installation and leave `uv` off of your shell commands.
### 1. Create a virtual environment
First, [install](https://github.com/astral-sh/uv) `uv` for Python environment management.
Then create and activate a virtual environment:
```shell
uv venv env_moonshine
source env_moonshine/bin/activate
```
### 2. Install `useful-moonshine-onnx`
Using Moonshine with the ONNX runtime is preferable if you want to run the models on SBCs like the Raspberry Pi. To use it, run the following:
```shell
uv pip install useful-moonshine-onnx@git+https://git@github.com/moonshine-ai/moonshine.git#subdirectory=moonshine-onnx
```
### 3. Try it out
You can test Moonshine by transcribing the provided example audio file with the `.transcribe` function:
```shell
python
>>> import moonshine_onnx
>>> moonshine_onnx.transcribe(moonshine_onnx.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
['Ever tried ever failed, no matter try again, fail again, fail better.']
```
The first argument is a path to an audio file and the second is the name of a Moonshine model. `moonshine/tiny` and `moonshine/base` are English-only models. If you wish to use one of the non-English Moonshine models, just append the language [IETF tag](https://en.wikipedia.org/wiki/IETF_language_tag) to the model name, e.g., `moonshine/tiny-ko`. See [the table](#supported-languages) for supported languages and their tags.
## Examples
Moonshine models can be used in many applications, so we've included code samples showing how to use them in different situations. The [`demo`](/demo/) folder in this repository also has more information on them.
### Hugging Face Transformers
Moonshine is supported by the `transformers` library, as follows:
```python
import torch
from transformers import AutoProcessor, MoonshineForConditionalGeneration
from datasets import load_dataset
processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-tiny")
model = MoonshineForConditionalGeneration.from_pretrained("UsefulSensors/moonshine-tiny")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_array = ds[0]["audio"]["array"]
inputs = processor(audio_array, return_tensors="pt")
generated_ids = model.generate(**inputs)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```
If you wish to use one of the non-English Moonshine models, just append the [IETF code](https://en.wikipedia.org/wiki/IETF_language_tag) to the repo ID, e.g., `UsefulSensors/moonshine-tiny-ko`. See [the table](#supported-languages) for supported languages and their tags.
### Live Captions
You can try the Moonshine ONNX models with live input from a microphone with the [live captions demo](/demo/README.md#demo-live-captioning-from-microphone-input).
### CTranslate2
The files for the CTranslate2 versions of Moonshine are available at [huggingface.co/UsefulSensors/moonshine/tree/main/ctranslate2](https://huggingface.co/UsefulSensors/moonshine/tree/main/ctranslate2), but they require [a pull request to be merged](https://github.com/OpenNMT/CTranslate2/pull/1808) before they can be used with the mainline version of the framework. Until then, you should be able to try them with [our branch](https://github.com/njeffrie/CTranslate2/tree/master), with [this example script](https://github.com/OpenNMT/CTranslate2/pull/1808#issuecomment-2439725339).
### Web Applications
Use our [MoonshineJS](https://github.com/moonshine-ai/moonshine-js) library to run Moonshine models in the web browser with a few lines of Javascript.
### Discord
We have [an active Discord server](https://discord.gg/27qp9zSRXF) where we're happy to answer questions, offer support, and generally geek out about voice AI, so please come [join the conversations](https://discord.gg/27qp9zSRXF).
## License
All inference code in this repo is released under the MIT license. The English Moonshine models are also released under the MIT license.
All non-English Moonshine variants are released under the [Moonshine AI Community License](https://www.moonshine.ai/moonshine_community_license.txt) (TLDR: Models are free to use for researchers, developers, small businesses, and creators with less than $1M in annual revenue.).
A copy of both licenses is included in this repository.
## Citation
If you benefit from our work, please cite our paper:
```
@misc{jeffries2024moonshinespeechrecognitionlive,
title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
year={2024},
eprint={2410.15608},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2410.15608},
}
```
Please also cite our paper on non-English Moonshine variants if you find them useful:
```
@misc{king2025flavorsmoonshinetinyspecialized,
title={Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices},
author={Evan King and Adam Sabra and Manjunath Kudlur and James Wang and Pete Warden},
year={2025},
eprint={2509.02523},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.02523},
}
```