https://github.com/xaionaro-go/speech
A Speech-To-Text (with translation) library and tools; currently based on Whisper (runs locally if needed; no API keys required); initially written to add auto-translated subtitles to OBS
- Host: GitHub
- URL: https://github.com/xaionaro-go/speech
- Owner: xaionaro-go
- License: cc0-1.0
- Created: 2025-01-12T03:09:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-21T22:54:44.000Z (about 1 year ago)
- Last Synced: 2025-04-21T23:30:03.383Z (about 1 year ago)
- Topics: ai, converter, go, golang, library, module, obs, package, self-hosted, speech, speech-recognition, speech-to-text, stream, text, translate, whisper
- Language: Go
- Homepage:
- Size: 414 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# `speech`
This is a library for Speech-To-Text operations in Go, plus two tools that demonstrate how to use it:
* [`./cmd/stt`](./cmd/stt/main.go) -- converts a PCM_F32_LE stream into a transcription (translated to English by default).
* [`./cmd/subtitleswindow`](./cmd/subtitleswindow/main.go) -- a window that dynamically displays subtitles for use in OBS, given a link to the audio stream (e.g. your RTMP link).
Currently, the library provides an API for using Whisper directly and/or via an HTTP API.
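For reference, PCM_F32_LE means raw audio samples stored as 32-bit IEEE 754 floats in little-endian byte order, with no container or header. The sketch below shows one way to convert between such bytes and `[]float32` in Go. The helper names `decodeF32LE` and `encodeF32LE` are hypothetical and are not part of this library's API; the sample rate and channel layout the tools expect are not stated here, so the sketch does not assume any.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeF32LE converts raw PCM_F32_LE bytes into float32 samples.
// Trailing bytes that do not form a full 4-byte sample are ignored.
// (Hypothetical helper, not part of the speech library.)
func decodeF32LE(raw []byte) []float32 {
	samples := make([]float32, 0, len(raw)/4)
	for i := 0; i+4 <= len(raw); i += 4 {
		bits := binary.LittleEndian.Uint32(raw[i : i+4])
		samples = append(samples, math.Float32frombits(bits))
	}
	return samples
}

// encodeF32LE is the inverse: it serializes samples as PCM_F32_LE bytes.
// (Hypothetical helper, not part of the speech library.)
func encodeF32LE(samples []float32) []byte {
	raw := make([]byte, 4*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint32(raw[4*i:], math.Float32bits(s))
	}
	return raw
}

func main() {
	raw := encodeF32LE([]float32{0, 0.5, -1})
	fmt.Println(len(raw), decodeF32LE(raw)) // prints: 12 [0 0.5 -1]
}
```

Each sample occupies exactly four bytes, so a buffer of `n` bytes holds `n/4` complete samples.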
# Quick start
Install the dependencies:
```sh
apt install -y libavcodec-dev libavformat-dev libavfilter-dev libavdevice-dev libswscale-dev libsrt-openssl-dev libssl-dev libasound2-dev libxxf86vm-dev make cmake nvidia-cuda-toolkit cuda-toolkit-12-2 libxcursor-dev libxrandr-dev libxinerama-dev libxi-dev
```
### `stt`
If you use Linux:
```sh
WHISPER_MODEL=medium ENABLE_CUDA=true make example-stt
```
(Keep in mind: the larger the model, the longer it takes to warm up.)
It will start listening to the microphone, and you can start speaking. It should print the English translation of your speech.
For example, in my case:
```
WHISPER_MODEL=medium ENABLE_CUDA=true make example-stt
[...a lot of log...]
23.3s - 26.3s: Hello.
28s - 30.6s: This is just a demonstration that the thing works properly.
30.7s - 37.9s: And somehow it does work properly, which is weird.
```
### `subtitleswindow`
Run:
```sh
ENABLE_CUDA=true make example-subtitleswindow
```
It should open a window that automatically translates your speech (it listens to the microphone).

If you need to translate speech from an RTMP/RTSP/SRT (or similar) link, then:
Build:
```sh
ENABLE_CUDA=true make subtitleswindow-linux-amd64
```
Run:
```sh
(cd ./thirdparty/whisper.cpp && ./models/download-ggml-model.sh medium)
./build/subtitleswindow-linux-amd64 thirdparty/whisper.cpp/models/ggml-medium.bin rtmp://my.server:1935/myapp/mystream/
```
A window will pop up and display the most recent transcriptions. You can add this window to OBS to get a live translation of your speech on your stream screen.
### `subtitleswindow` with computing on a remote server
On the remote server, run:
```sh
ENABLE_CUDA=true make sttd-linux-amd64
(cd ./thirdparty/whisper.cpp && ./models/download-ggml-model.sh large-v3)
./build/sttd-linux-amd64 --log-level trace 0.0.0.0:1234 --default-model-file thirdparty/whisper.cpp/models/ggml-large-v3.bin
```
Now, on your local computer, run:
```sh
make subtitleswindow-linux-amd64
./build/subtitleswindow-linux-amd64 --remote-addr address-of-my-remote-server:1234 --translate=true ''
```