# Whispering

[![MIT License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
![Python Versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)

[![CI](https://github.com/shirayu/whispering/actions/workflows/ci.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/ci.yml)
[![CodeQL](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml)
[![Typos](https://github.com/shirayu/whispering/actions/workflows/typos.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/typos.yml)

Streaming transcriber with [whisper](https://github.com/openai/whisper).
Sufficient machine power is needed to transcribe in real time.

## Notice

This repository has been archived.
There are some alternatives.

## Setup

```bash
pip install -U git+https://github.com/shirayu/whispering.git

# If you use a GPU, install the appropriate torch and torchaudio builds
# Check https://pytorch.org/get-started/locally/
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```

If you get ``OSError: PortAudio library not found`` on Linux, install PortAudio.

```bash
sudo apt -y install portaudio19-dev
```
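
After installation, you can confirm that the command-line tool is available; ``--help`` simply prints the full option list described below.

```bash
# Quick sanity check after installation: print the option list
whispering --help
```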

## Example of microphone

```bash
# Run in English
# By default, it waits at least 30 seconds before producing any output
whispering --language en --model tiny
```

- ``--help`` shows full options
- ``--model`` sets the [model name](https://github.com/openai/whisper#available-models-and-languages) to use. Larger models will be more accurate, but may not be able to transcribe in real time.
- ``--language`` sets the language to transcribe. The list of languages is shown with ``whispering -h``
- ``--no-progress`` disables the progress message
- ``-t`` sets temperatures to decode. You can set several like ``-t 0.0 -t 0.1 -t 0.5``, but too many temperatures increase decoding time
- ``--debug`` outputs debug logs
- ``--vad`` sets the VAD (Voice Activity Detection) threshold. The default is ``0.5``. ``0`` disables VAD and forces whisper to analyze periods without voice activity as well. Try ``--vad 0`` if VAD prevents transcription.
- ``--output`` sets the output file (default: standard output)
- ``--frame`` sets the minimum number of frames of mel spectrogram input for Whisper (default: ``3000``, i.e. 30 seconds)
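
For example, several of these options can be combined in a single invocation; the values below are purely illustrative, not recommended settings:

```bash
# English transcription with the base model, two decoding temperatures,
# a lower VAD threshold, and output written to a file
whispering --language en --model base -t 0.0 -t 0.2 --vad 0.3 --output transcript.txt
```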

### Parse interval

By default, whispering performs VAD every 3.75 seconds.
This interval is determined by the value of ``-n``, whose default is ``20``.
When an interval is predicted as silence, it is not passed to whisper.
If you want to disable VAD, set the VAD threshold to 0 by adding ``--vad 0``.

By default, whispering does not perform analysis until the total length of the segments determined by VAD to have speech exceeds 30 seconds.
This is because the original Whisper assumes that inputs are 30-second segments.
However, if silence segments appear 16 times (the default value of ``--max_nospeech_skip``) after speech is detected, analysis is performed anyway.
You can make the segments shorter with the ``--frame`` option (default: 3000), but this sacrifices accuracy because shorter inputs are not what Whisper expects.
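
As a sketch of how these options interact, the run below halves the minimum analysis length and lowers the number of skipped silence segments; the values are illustrative only, and a smaller ``--frame`` trades accuracy for lower latency as noted above.

```bash
# Analyze after about 15 seconds of detected speech (1500 mel frames = 15 s)
# and force analysis after 8 silence segments instead of the default 16
whispering --language en --model tiny --frame 1500 --max_nospeech_skip 8
```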

## Example of web socket

⚠ **There is no security mechanism. Securing the connection is your own responsibility.**

Run with ``--host`` and ``--port``.

### Host

```bash
whispering --language en --model tiny --host 0.0.0.0 --port 8000
```

### Client

```bash
whispering --host ADDRESS_OF_HOST --port 8000 --mode client
```

You can set ``-n`` and other options.
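
For instance, a client invocation that also passes ``-n`` and ``--no-progress`` might look like the following; the host address is a placeholder:

```bash
# Connect to a running whispering host; the address below is an example value
whispering --host 192.168.0.5 --port 8000 --mode client -n 20 --no-progress
```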

## For Developers

1. Install [Python](https://www.python.org/) and [Node.js](https://nodejs.org/)
2. [Install poetry](https://python-poetry.org/docs/) to use ``poetry`` command
3. Clone and install libraries

```console
# Clone
git clone https://github.com/shirayu/whispering.git

# With poetry
poetry config virtualenvs.in-project true
poetry install --all-extras
poetry run pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

# With npm
npm install
```

4. Run the tests and check that no errors occur

```bash
poetry run make -j4
```

5. Make fancy updates
6. Make style

```bash
poetry run make style
```

7. Run the tests again and check that no errors occur

```bash
poetry run make -j4
```

8. Check for typos using [typos](https://github.com/crate-ci/typos). Just run the ``typos`` command in the root directory.

```bash
typos
```

9. Send pull requests!

## License

- [MIT License](LICENSE)
- Some code is ported from the original whisper; its license is also the [MIT License](LICENSE.whisper)