Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/abdeladim-s/pywhispercpp
Python bindings for whisper.cpp
- Host: GitHub
- URL: https://github.com/abdeladim-s/pywhispercpp
- Owner: abdeladim-s
- License: mit
- Created: 2023-03-10T03:04:01.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-24T03:35:01.000Z (about 2 months ago)
- Last Synced: 2024-10-24T20:00:59.985Z (about 2 months ago)
- Topics: openai-whisper, whisper-cpp
- Language: C++
- Homepage: https://abdeladim-s.github.io/pywhispercpp/
- Size: 1.34 MB
- Stars: 165
- Watchers: 6
- Forks: 24
- Open Issues: 28
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-openai-whisper - pywhispercpp - Python bindings for whisper.cpp
README
# pywhispercpp
Python bindings for [whisper.cpp](https://github.com/ggerganov/whisper.cpp) with a simple Pythonic API on top of it.

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Wheels](https://github.com/abdeladim-s/pywhispercpp/actions/workflows/wheels.yml/badge.svg?branch=main&event=push)](https://github.com/abdeladim-s/pywhispercpp/actions/workflows/wheels.yml)
[![PyPi version](https://badgen.net/pypi/v/pywhispercpp)](https://pypi.org/project/pywhispercpp/)
[![Downloads](https://static.pepy.tech/badge/pywhispercpp)](https://pepy.tech/project/pywhispercpp)

whisper.cpp is:
High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:
- Plain C/C++ implementation without dependencies
- Apple silicon first-class citizen - optimized via Arm Neon and Accelerate framework
- AVX intrinsics support for x86 architectures
- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- Low memory usage (Flash Attention)
- Zero memory allocations at runtime
- Runs on the CPU
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)

Supported platforms:
- [x] Mac OS (Intel and Arm)
- [x] [iOS](examples/whisper.objc)
- [x] [Android](examples/whisper.android)
- [x] Linux / [FreeBSD](https://github.com/ggerganov/whisper.cpp/issues/56#issuecomment-1350920264)
- [x] [WebAssembly](examples/whisper.wasm)
- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168))

# Table of contents
* [Installation](#installation)
* [PYPI](#pypi-)
* [From source](#from-source)
* [CoreML support](#coreml-support)
* [Quick start](#quick-start)
* [Examples](#examples)
* [Main](#main)
* [Assistant](#assistant)
* [Recording](#recording-)
* [Live Stream Transcription](#live-stream-transcription)
* [Advanced usage](#advanced-usage)
* [Discussions and contributions](#discussions-and-contributions)
* [License](#license)

# Installation
1. First, install [ffmpeg](https://ffmpeg.org/):
```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```

### PYPI
2. Once ffmpeg is installed, install `pywhispercpp`:
```shell
pip install pywhispercpp
```

If you want to use the examples, you will need to install extra dependencies:
```shell
pip install pywhispercpp[examples]
```

### From source
You can install the latest dev version from GitHub:

```shell
git clone --recursive https://github.com/abdeladim-s/pywhispercpp
cd pywhispercpp
pip install .
```

### NVIDIA GPU support
1. Make sure you have CUDA installed.
2. Install the package with CUDA support:
```shell
git clone --recursive https://github.com/abdeladim-s/pywhispercpp
cd pywhispercpp
WHISPER_CUDA=1 pip install .
```
### CoreML support

Thanks to [@tangm](https://github.com/tangm), using CoreML is now supported.
Clone the repository and install the package with `WHISPER_COREML=1`:
```shell
git clone --recursive https://github.com/abdeladim-s/pywhispercpp
cd pywhispercpp
WHISPER_COREML=1 pip install .
```

### Vulkan support
Thanks to [@thewh1teagle](https://github.com/thewh1teagle), Vulkan is also supported.
Clone the repository and install the package with `GGML_VULKAN=1`:
```shell
git clone --recursive https://github.com/abdeladim-s/pywhispercpp
cd pywhispercpp
GGML_VULKAN=1 pip install .
```

# Quick start
```python
from pywhispercpp.model import Model

model = Model('base.en', n_threads=6)
segments = model.transcribe('file.mp3')
for segment in segments:
print(segment.text)
```

You can also assign a custom `new_segment_callback`:
```python
from pywhispercpp.model import Model

model = Model('base.en', print_realtime=False, print_progress=False)
segments = model.transcribe('file.mp3', new_segment_callback=print)
```

* The `ggml` model will be downloaded automatically.
* You can pass any `whisper.cpp` [parameter](https://abdeladim-s.github.io/pywhispercpp/#pywhispercpp.constants.PARAMS_SCHEMA) as a keyword argument to the `Model` class or to the `transcribe` function (see the sketch after this list).
* The `transcribe` function accepts any media file (audio/video), in any format.
* Check the [Model](https://abdeladim-s.github.io/pywhispercpp/#pywhispercpp.model.Model) class documentation for more details.
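For instance, here is a minimal sketch combining these ideas. It assumes `language` and `translate` are among the parameters in `PARAMS_SCHEMA`, and that `new_segment_callback` is invoked with each new `Segment` exposing `text` and the `t0`/`t1` timestamps; the exact callback payload is an assumption here, so check the linked docs:

```python
from pywhispercpp.model import Model

# Assumed whisper.cpp parameters passed through as keyword arguments:
# `language` selects the source language, `translate` requests English output.
model = Model('base', language='fr', translate=True)

def on_new_segment(segment):
    # Assumption: `segment` exposes .t0/.t1 timestamps and .text.
    print(f"[{segment.t0} -> {segment.t1}] {segment.text}")

segments = model.transcribe('file.mp3', new_segment_callback=on_new_segment)
```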
# Examples

The [examples folder](https://github.com/abdeladim-s/pywhispercpp/tree/main/pywhispercpp/examples) contains several examples inspired by the original [whisper.cpp/examples](https://github.com/ggerganov/whisper.cpp/tree/master/examples).
## Main
Just a straightforward example with a simple Command Line Interface.

Check the source code [here](https://github.com/abdeladim-s/pywhispercpp/blob/main/pywhispercpp/examples/main.py), or use the CLI as follows:
```shell
pwcpp file.wav -m base --output-srt --print_realtime true
```
Run `pwcpp --help` to get the help message:

```shell
usage: pwcpp [-h] [-m MODEL] [--version] [--processors PROCESSORS] [-otxt] [-ovtt] [-osrt] [-ocsv] [--strategy STRATEGY]
[--n_threads N_THREADS] [--n_max_text_ctx N_MAX_TEXT_CTX] [--offset_ms OFFSET_MS] [--duration_ms DURATION_MS]
[--translate TRANSLATE] [--no_context NO_CONTEXT] [--single_segment SINGLE_SEGMENT] [--print_special PRINT_SPECIAL]
[--print_progress PRINT_PROGRESS] [--print_realtime PRINT_REALTIME] [--print_timestamps PRINT_TIMESTAMPS]
[--token_timestamps TOKEN_TIMESTAMPS] [--thold_pt THOLD_PT] [--thold_ptsum THOLD_PTSUM] [--max_len MAX_LEN]
[--split_on_word SPLIT_ON_WORD] [--max_tokens MAX_TOKENS] [--audio_ctx AUDIO_CTX]
[--prompt_tokens PROMPT_TOKENS] [--prompt_n_tokens PROMPT_N_TOKENS] [--language LANGUAGE] [--suppress_blank SUPPRESS_BLANK]
[--suppress_non_speech_tokens SUPPRESS_NON_SPEECH_TOKENS] [--temperature TEMPERATURE] [--max_initial_ts MAX_INITIAL_TS]
[--length_penalty LENGTH_PENALTY] [--temperature_inc TEMPERATURE_INC] [--entropy_thold ENTROPY_THOLD]
[--logprob_thold LOGPROB_THOLD] [--no_speech_thold NO_SPEECH_THOLD] [--greedy GREEDY] [--beam_search BEAM_SEARCH]
             media_file [media_file ...]

positional arguments:
  media_file            The path of the media file, or a list of files separated by spaces

options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Path to the `ggml` model, or just the model name
--version show program's version number and exit
--processors PROCESSORS
number of processors to use during computation
-otxt, --output-txt output result in a text file
-ovtt, --output-vtt output result in a vtt file
-osrt, --output-srt output result in a srt file
-ocsv, --output-csv output result in a CSV file
  --strategy STRATEGY   Available sampling strategies: GreedyDecoder -> 0, BeamSearchDecoder -> 1
--n_threads N_THREADS
                        Number of threads to allocate for the inference, defaults to min(4, available hardware_concurrency)
--n_max_text_ctx N_MAX_TEXT_CTX
max tokens to use from past text as prompt for the decoder
--offset_ms OFFSET_MS
start offset in ms
--duration_ms DURATION_MS
audio duration to process in ms
--translate TRANSLATE
whether to translate the audio to English
--no_context NO_CONTEXT
do not use past transcription (if any) as initial prompt for the decoder
--single_segment SINGLE_SEGMENT
force single segment output (useful for streaming)
--print_special PRINT_SPECIAL
                        print special tokens (e.g. <SOT>, <EOT>, <BEG>, etc.)
--print_progress PRINT_PROGRESS
print progress information
--print_realtime PRINT_REALTIME
print results from within whisper.cpp (avoid it, use callback instead)
--print_timestamps PRINT_TIMESTAMPS
print timestamps for each text segment when printing realtime
--token_timestamps TOKEN_TIMESTAMPS
enable token-level timestamps
--thold_pt THOLD_PT timestamp token probability threshold (~0.01)
--thold_ptsum THOLD_PTSUM
timestamp token sum probability threshold (~0.01)
--max_len MAX_LEN max segment length in characters
--split_on_word SPLIT_ON_WORD
split on word rather than on token (when used with max_len)
--max_tokens MAX_TOKENS
max tokens per segment (0 = no limit)
--audio_ctx AUDIO_CTX
overwrite the audio context size (0 = use default)
--prompt_tokens PROMPT_TOKENS
tokens to provide to the whisper decoder as initial prompt
--prompt_n_tokens PROMPT_N_TOKENS
tokens to provide to the whisper decoder as initial prompt
--language LANGUAGE for auto-detection, set to None, "" or "auto"
--suppress_blank SUPPRESS_BLANK
common decoding parameters
--suppress_non_speech_tokens SUPPRESS_NON_SPEECH_TOKENS
common decoding parameters
--temperature TEMPERATURE
initial decoding temperature
--max_initial_ts MAX_INITIAL_TS
max_initial_ts
--length_penalty LENGTH_PENALTY
length_penalty
--temperature_inc TEMPERATURE_INC
temperature_inc
--entropy_thold ENTROPY_THOLD
similar to OpenAI's "compression_ratio_threshold"
--logprob_thold LOGPROB_THOLD
logprob_thold
--no_speech_thold NO_SPEECH_THOLD
no_speech_thold
--greedy GREEDY greedy
--beam_search BEAM_SEARCH
                        beam_search
```
## Assistant
This is a simple example showcasing the use of `pywhispercpp` as an assistant.
The idea is to use a `VAD` to detect speech (in this example we used webrtcvad), and when some speech is detected,
we run the transcription.
It is inspired by the [whisper.cpp/examples/command](https://github.com/ggerganov/whisper.cpp/tree/master/examples/command) example.

You can check the source code [here](https://github.com/abdeladim-s/pywhispercpp/blob/main/pywhispercpp/examples/assistant.py),
or you can use the class directly to create your own assistant:

```python
from pywhispercpp.examples.assistant import Assistant

my_assistant = Assistant(commands_callback=print, n_threads=8)
my_assistant.start()
```
Here we set the `commands_callback` to a simple `print`, so the commands will just get printed on the screen.
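As a slightly fuller sketch, you could plug in your own callback. This assumes `commands_callback` is called with the transcribed command as a plain string, which is an assumption about the payload rather than documented API:

```python
from pywhispercpp.examples.assistant import Assistant

def on_command(command):
    # Assumption: `command` is the transcribed speech as a plain string.
    text = command.strip().lower()
    if "stop" in text:
        print("Stop command detected!")
    else:
        print(f"Heard: {text}")

my_assistant = Assistant(commands_callback=on_command, n_threads=8)
my_assistant.start()
```

You can run this example from the command line as well: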
```shell
$ pwcpp-assistant --help

usage: pwcpp-assistant [-h] [-m MODEL] [-ind INPUT_DEVICE] [-st SILENCE_THRESHOLD] [-bd BLOCK_DURATION]
options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Whisper.cpp model, default to tiny.en
-ind INPUT_DEVICE, --input_device INPUT_DEVICE
                        Id of the input device (aka microphone)
-st SILENCE_THRESHOLD, --silence_threshold SILENCE_THRESHOLD
                        The duration of silence after which the inference will be running, default to 16
-bd BLOCK_DURATION, --block_duration BLOCK_DURATION
minimum time audio updates in ms, default to 30
```

## Recording
Another simple [example](https://github.com/abdeladim-s/pywhispercpp/blob/main/pywhispercpp/examples/recording.py) to transcribe your own recordings.

You can use it from Python as follows:
```python
from pywhispercpp.examples.recording import Recording

myrec = Recording(5)
myrec.start()
```

Or from the command line:
```shell
$ pwcpp-recording --help

usage: pwcpp-recording [-h] [-m MODEL] duration
positional arguments:
  duration              duration in seconds

options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Whisper.cpp model, default to tiny.en
```
## Live Stream Transcription
This [example](https://github.com/abdeladim-s/pywhispercpp/blob/main/pywhispercpp/examples/livestream.py) is an attempt to transcribe a livestream in realtime, but the results are not quite satisfactory yet: the CPU usage jumps quickly to 100%, and I cannot use huge models on my decent machine.
(Or maybe I am doing something wrong!) :sweat_smile:

If you have a powerful machine, give it a try.
From Python:
```python
from pywhispercpp.examples.livestream import LiveStream

url = ""  # Make sure it is a direct stream URL
ls = LiveStream(url=url, n_threads=4)
ls.start()
```

From the command line:
```shell
$ pwcpp-livestream --help

usage: pwcpp-livestream [-h] [-nt N_THREADS] [-m MODEL] [-od OUTPUT_DEVICE] [-bls BLOCK_SIZE] [-bus BUFFER_SIZE] [-ss SAMPLE_SIZE] url
positional arguments:
  url                   Stream URL

options:
-h, --help show this help message and exit
-nt N_THREADS, --n_threads N_THREADS
number of threads, default to 3
-m MODEL, --model MODEL
Whisper.cpp model, default to tiny.en
-od OUTPUT_DEVICE, --output_device OUTPUT_DEVICE
the output device, aka the speaker, leave it None to take the default
-bls BLOCK_SIZE, --block_size BLOCK_SIZE
block size, default to 1024
-bus BUFFER_SIZE, --buffer_size BUFFER_SIZE
number of blocks used for buffering, default to 20
-ss SAMPLE_SIZE, --sample_size SAMPLE_SIZE
Sample size, default to 4
```

# Advanced usage
* First check the [API documentation](https://abdeladim-s.github.io/pywhispercpp/) for more advanced usage.
* If you are a more experienced user, you can access the [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h) directly; almost all functions from `whisper.h` are exposed through the binding module `_pywhispercpp`.

```python
import _pywhispercpp as pwcpp

ctx = pwcpp.whisper_init_from_file('path/to/ggml/model')
```
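As an illustration, here is a hedged sketch of the low-level flow. It assumes the binding mirrors the `whisper.h` names one-to-one, per the note above; the commented calls show the usual C-style sequence from `whisper.h`, but their exact Python-side signatures are assumptions:

```python
import _pywhispercpp as pwcpp

# Load a ggml model (placeholder path).
ctx = pwcpp.whisper_init_from_file('path/to/ggml/model')

# Assumed to mirror whisper.h: run inference on 16 kHz mono float32 samples,
# then read back the decoded segments.
# pwcpp.whisper_full(ctx, params, samples, len(samples))
# for i in range(pwcpp.whisper_full_n_segments(ctx)):
#     print(pwcpp.whisper_full_get_segment_text(ctx, i))

# Release the context when done (whisper_free is part of whisper.h).
pwcpp.whisper_free(ctx)
```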
# Discussions and contributions

If you find any bug, please open an [issue](https://github.com/abdeladim-s/pywhispercpp/issues).

If you have any feedback, or you want to share how you are using this project, feel free to use the [Discussions](https://github.com/abdeladim-s/pywhispercpp/discussions) and open a new topic.
# License
This project is licensed under the same license as [whisper.cpp](https://github.com/ggerganov/whisper.cpp/blob/master/LICENSE) (MIT [License](./LICENSE)).