Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/appleboy/go-whisper
Speech to Text using docker image with ggerganov/whisper.cpp
golang openai whisper whisper-ai whisper-cpp
Last synced: 2 months ago
Speech to Text using docker image with ggerganov/whisper.cpp
- Host: GitHub
- URL: https://github.com/appleboy/go-whisper
- Owner: appleboy
- License: mit
- Created: 2023-06-10T08:34:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-08T12:42:18.000Z (9 months ago)
- Last Synced: 2024-04-14T09:48:28.329Z (8 months ago)
- Topics: golang, openai, whisper, whisper-ai, whisper-cpp
- Language: Go
- Homepage:
- Size: 444 KB
- Stars: 33
- Watchers: 2
- Forks: 4
- Open Issues: 4
- Metadata Files:
  - Readme: README.md
  - Funding: .github/FUNDING.yml
  - License: LICENSE
Awesome Lists containing this project
README
# go-whisper
Docker Image for Speech-to-Text using [ggerganov/whisper.cpp][1].
This Docker image provides a ready-to-use environment for converting speech to text using the [ggerganov/whisper.cpp][1] library. The whisper.cpp library is an open-source project that enables efficient and accurate speech recognition. By utilizing this Docker image, users can easily set up and run the speech-to-text conversion process without worrying about installing dependencies or configuring the system.
The Docker image includes all necessary components and dependencies, ensuring a seamless experience for users who want to leverage the power of the whisper.cpp library for their speech recognition needs. Simply pull the Docker image, run the container, and start converting your audio files into text with minimal effort.
In summary, this Docker image offers a convenient and efficient way to utilize the [ggerganov/whisper.cpp][1] library for speech-to-text conversion, making it an excellent choice for those looking to implement speech recognition in their projects.
[1]: https://github.com/ggerganov/whisper.cpp
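The image is published to GitHub Container Registry; for example, you can pull it directly before running (this is the same image used in the Usage section below):

```sh
# Pull the prebuilt go-whisper image from GitHub Container Registry
docker pull ghcr.io/appleboy/go-whisper:latest
```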
## OpenAI's Whisper models converted to ggml format
See the [Available models][2].
| Model | Disk | Mem | SHA |
|------------|---------|-----------|----------------------------------------------|
| tiny | 75 MB | ~390 MB | bd577a113a864445d4c299885e0cb97d4ba92b5f |
| tiny.en | 75 MB | ~390 MB | c78c86eb1a8faa21b369bcd33207cc90d64ae9df |
| base | 142 MB | ~500 MB | 465707469ff3a37a2b9b8d8f89f2f99de7299dac |
| base.en | 142 MB | ~500 MB | 137c40403d78fd54d454da0f9bd998f78703390c |
| small | 466 MB | ~1.0 GB | 55356645c2b361a969dfd0ef2c5a50d530afd8d5 |
| small.en | 466 MB | ~1.0 GB | db8a495a91d927739e50b3fc1cc4c6b8f6c2d022 |
| medium | 1.5 GB | ~2.6 GB | fd9727b6e1217c2f614f9b698455c4ffd82463b4 |
| medium.en | 1.5 GB | ~2.6 GB | 8c30f0e44ce9560643ebd10bbe50cd20eafd3723 |
| large-v1 | 2.9 GB | ~4.7 GB | b1caaf735c4cc1429223d5a74f0f4d0b9b59a299 |
| large      | 2.9 GB  | ~4.7 GB   | 0f4c8e34f21cf1a914c59d8b3ce882345ad349d6 |

For more information see [ggerganov/whisper.cpp][3].
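The SHA column appears to contain SHA-1 digests of the model files, so once a model has been downloaded (see the Prepare section below) it can optionally be checked against the table, for example:

```sh
# Verify the downloaded model against the SHA listed in the table above
# (assumes the file was saved to models/ggml-small.bin as in the Prepare section,
#  and that the table lists SHA-1 digests)
sha1sum models/ggml-small.bin
# expected: 55356645c2b361a969dfd0ef2c5a50d530afd8d5  models/ggml-small.bin
```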
[2]: https://huggingface.co/ggerganov/whisper.cpp/tree/main
[3]: https://github.com/ggerganov/whisper.cpp/tree/master/models

## Prepare
Download the model you want to use and put it in the `models` directory.
```sh
curl -LJ https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \
--output models/ggml-small.bin
```

## Usage
Please follow these simplified instructions to transcribe the audio file using a Docker container:
1. Ensure that you have a `testdata` directory containing the `jfk.wav` file.
2. Mount both the `models` and `testdata` directories to the Docker container.
3. Specify the model using the `--model` flag and the audio file path using the `--audio-path` flag.
4. The transcript result file will be saved in the same directory as the audio file.

To transcribe the audio file, execute the command provided below.
```sh
docker run \
-v $PWD/models:/app/models \
-v $PWD/testdata:/app/testdata \
ghcr.io/appleboy/go-whisper:latest \
--model /app/models/ggml-small.bin \
--audio-path /app/testdata/jfk.wav
```

See the following output:
```sh
whisper_init_from_file_no_state: loading model from '/app/models/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 3
whisper_model_load: mem required = 743.00 MB (+ 16.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 464.68 MB
whisper_model_load: model size = 464.44 MB
whisper_init_state: kv self size = 15.75 MB
whisper_init_state: kv cross size = 52.73 MB
1:46AM INF system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | COREML = 0 |
module=transcript
whisper_full_with_state: auto-detected language: en (p = 0.967331)
1:46AM INF [ 0s -> 11s] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. module=transcript
```
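By default the transcript is written as a txt file in the same directory as the audio file. Assuming the output filename follows the input name (it can be changed with `--output-filename`), the result can be inspected with:

```sh
# Print the generated transcript; the filename is assumed to follow the input
# audio name, adjust it if you set --output-filename or --output-format
cat testdata/jfk.txt
```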
Command line arguments (an example follows the table):

| Options                | Description                                                  | Default / Environment variables |
|------------------------|--------------------------------------------------------------|---------------------------------|
| --model                | Path to the whisper model file (ggml format)                 | [$PLUGIN_MODEL, $INPUT_MODEL] |
| --audio-path | audio path | [$PLUGIN_AUDIO_PATH, $INPUT_AUDIO_PATH] |
| --output-folder | output folder | [$PLUGIN_OUTPUT_FOLDER, $INPUT_OUTPUT_FOLDER] |
| --output-format        | output format, supports txt, srt, csv                       | (default: "txt") [$PLUGIN_OUTPUT_FORMAT, $INPUT_OUTPUT_FORMAT] |
| --output-filename | output filename | [$PLUGIN_OUTPUT_FILENAME, $INPUT_OUTPUT_FILENAME] |
| --language | Set the language to use for speech recognition | (default: "auto") [$PLUGIN_LANGUAGE, $INPUT_LANGUAGE] |
| --threads | Set number of threads to use | (default: 8) [$PLUGIN_THREADS, $INPUT_THREADS] |
| --debug | enable debug mode | (default: false) [$PLUGIN_DEBUG, $INPUT_DEBUG] |
| --speedup | speed up audio by x2 (reduced accuracy) | (default: false) [$PLUGIN_SPEEDUP, $INPUT_SPEEDUP] |
| --translate            | translate from source language to English                   | (default: false) [$PLUGIN_TRANSLATE, $INPUT_TRANSLATE] |
| --print-progress | print progress | (default: true) [$PLUGIN_PRINT_PROGRESS, $INPUT_PRINT_PROGRESS] |
| --print-segment | print segment | (default: false) [$PLUGIN_PRINT_SEGMENT, $INPUT_PRINT_SEGMENT] |
| --webhook-url | webhook url | [$PLUGIN_WEBHOOK_URL, $INPUT_WEBHOOK_URL] |
| --webhook-insecure | webhook insecure | (default: false) [$PLUGIN_WEBHOOK_INSECURE, $INPUT_WEBHOOK_INSECURE] |
| --webhook-headers | webhook headers | [$PLUGIN_WEBHOOK_HEADERS, $INPUT_WEBHOOK_HEADERS] |
| --youtube-url | youtube url | [$PLUGIN_YOUTUBE_URL, $INPUT_YOUTUBE_URL] |
| --youtube-insecure | youtube insecure | (default: false) [$PLUGIN_YOUTUBE_INSECURE, $INPUT_YOUTUBE_INSECURE] |
| --youtube-retry-count | youtube retry count | (default: 20) [$PLUGIN_YOUTUBE_RETRY_COUNT, $INPUT_YOUTUBE_RETRY_COUNT] |
| --prompt | initial prompt | [$PLUGIN_PROMPT, $INPUT_PROMPT] |
| --help, -h | show help | |
| --version, -v | print the version | |
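As a sketch of how the flags above can be combined (values are illustrative), the following run writes an SRT file with a custom name and translates the speech to English; each flag can alternatively be supplied through the corresponding `$PLUGIN_*` or `$INPUT_*` environment variable listed in the table:

```sh
# Illustrative example combining several of the flags listed above
docker run \
  -v $PWD/models:/app/models \
  -v $PWD/testdata:/app/testdata \
  ghcr.io/appleboy/go-whisper:latest \
  --model /app/models/ggml-small.bin \
  --audio-path /app/testdata/jfk.wav \
  --output-format srt \
  --output-filename jfk-en \
  --translate
```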