https://github.com/winstxnhdw/capgen
A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.
- Host: GitHub
- URL: https://github.com/winstxnhdw/capgen
- Owner: winstxnhdw
- Created: 2023-09-16T18:44:19.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-28T15:50:00.000Z (6 months ago)
- Last Synced: 2024-10-30T09:03:59.893Z (6 months ago)
- Topics: asr, automatic-speech-recognition, ctranslate2, docker, granian, huggingface, huggingface-spaces, litestar, whisper
- Language: Python
- Homepage: https://huggingface.co/spaces/winstxnhdw/CapGen
- Size: 1.14 MB
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# CapGen
[pylint](https://github.com/PyCQA/pylint) · [main.yml](https://github.com/winstxnhdw/CapGen/actions/workflows/main.yml) · [deploy.yml](https://github.com/winstxnhdw/CapGen/actions/workflows/deploy.yml) · [formatter.yml](https://github.com/winstxnhdw/CapGen/actions/workflows/formatter.yml) · [warmer.yml](https://github.com/winstxnhdw/CapGen/actions/workflows/warmer.yml) · [dependabot.yml](https://github.com/winstxnhdw/CapGen/actions/workflows/dependabot.yml) · [Hugging Face Space](https://huggingface.co/spaces/winstxnhdw/CapGen) · [compare](https://github.com/winstxnhdw/CapGen/compare)

A fast, cross-platform, CPU-first, English-only video/audio transcriber for generating caption files with [Whisper](https://openai.com/research/whisper) and [CTranslate2](https://github.com/OpenNMT/CTranslate2), hosted on Hugging Face Spaces. A `pip`-installable offline CLI tool with CUDA support is also provided. Voice Activity Detection (VAD) preprocessing is enabled by default.
## Requirements
- Python 3.11
- 4 GB RAM

## Usage (API)
Simply cURL the endpoint as shown below. Currently, the only available caption formats are `srt`, `vtt` and `txt`.
```bash
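# CAPTION_FORMAT must be one of the supported formats (srt, vtt or txt);
# the values below are only illustrative
CAPTION_FORMAT=srt
AUDIO_FILE_PATH=./audio.mp3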
curl "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe?caption_format=$CAPTION_FORMAT" \
-F "file=@$AUDIO_FILE_PATH"
```

You can also redirect the output to a file.
```bash
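# the endpoint returns JSON, so jq is used here to pull the caption text out of
# the "result" field before writing it to disk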
curl "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe?caption_format=$CAPTION_FORMAT" \
-F "file=@$AUDIO_FILE_PATH" | jq -r ".result" > result.srt
```

You can stream the captions in real-time with the following.
```bash
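# -N (--no-buffer) stops curl from buffering the response so that caption
# segments are printed as soon as the server sends them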
curl -N "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe/stream?caption_format=$CAPTION_FORMAT" \
-F "file=@$AUDIO_FILE_PATH"
```

## Usage (CLI)
`CapGen` is available as a CLI tool with CUDA support. You can install it with `pip`.
```bash
pip install git+https://github.com/winstxnhdw/CapGen
```

You may also install `CapGen` with the necessary CUDA binaries.
```bash
pip install "capgen[cuda] @ git+https://github.com/winstxnhdw/CapGen"
```

Now, you can run the CLI tool with the following command.
```bash
capgen -c srt -o ./result.srt --cuda < ~/Downloads/audio.mp3
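
# alternatively, read from a positional file argument and run on the CPU only;
# the thread and worker counts below are illustrative
capgen ~/Downloads/audio.mp3 -c vtt -o ./result.vtt -t 4 -w 2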
```

The full list of CLI options is reproduced below.

```yaml
usage: capgen [-h] [-g] [-t] [-w] -c -o [file]

transcribe a compatible audio/video file into a chosen caption file format

positional arguments:
  file           the file path to a compatible audio/video

options:
  -h, --help     show this help message and exit
  -g, --cuda     whether to use CUDA for inference

cpu:
  -t, --threads  the number of CPU threads
  -w, --workers  the number of CPU workers

required:
  -c, --caption  the chosen caption file format
  -o, --output   the output file path
```

## Development
You can install the required dependencies for your editor with the following.
```bash
poetry install
```

You can spin the server up locally with the following. Once the server is running, you can access the Swagger UI at [localhost:7860/api/docs](http://localhost:7860/api/docs).
```bash
docker build -f Dockerfile.build -t capgen .
docker run --rm -e SERVER_PORT=7860 -p 7860:7860 capgen
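
# once the container is up, the API is available on the mapped port; for example,
# from another terminal and with an illustrative local file:
curl "http://localhost:7860/api/v1/transcribe?caption_format=srt" \
  -F "file=@./audio.mp3"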
```