Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cansik/speech-to-text-osc
Speech to text with OSC output.
- Host: GitHub
- URL: https://github.com/cansik/speech-to-text-osc
- Owner: cansik
- License: mit
- Created: 2023-10-22T14:34:05.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-09T11:10:12.000Z (about 1 year ago)
- Last Synced: 2024-10-04T13:45:12.852Z (about 1 month ago)
- Topics: osc, speech-to-text, whisper
- Language: Python
- Homepage:
- Size: 141 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Speech to Text Whisper OSC
This application transcribes speech to text using the Whisper ASR (Automatic Speech Recognition) model and
sends the transcribed text to an OSC (Open Sound Control) server. Command-line options configure both the
transcription process and the OSC communication.

![Example](images/tts-demo.gif)
### Installation
Create a virtual environment for Python (version `>=3.8`) and install the requirements. Currently, audio input is
only supported on macOS and Windows.

```
pip install -r requirements.txt
```

If you experience problems with PyAudio, please check the PyAudio [installation readme](https://people.csail.mit.edu/hubert/pyaudio/#downloads).

### Usage
Start the service by running the script.
```
python stt-service.py
```

### Options
- `--model`: Specify the Whisper model to use. You can choose from `tiny`, `base`, `small`, `medium`, or `large`.
- `--language`: Set the language code for speech decoding in ISO 639-1 format.
- `--backend`: The backend to use for inference. You can choose from [`openai`](https://github.com/openai/whisper),
  [`faster`](https://github.com/guillaumekln/faster-whisper), or [`cpp`](https://github.com/aarnphm/whispercpp).
- `--audio-device`: Audio device id (`int`). If none is provided, the default audio device is used.
- `--energy-threshold`: Set the minimum audio energy level the microphone must pick up for input to be detected. Default is `1000`.
- `--record-timeout`: Set the duration, in seconds, of each real-time recording segment. Default is `0.5`.
- `--phrase-timeout`: Specify how many seconds of silence between recordings must pass before the phrase is considered complete and the message is sent. Default is `1.2`.
- `--osc-server`: Provide the IP address of the OSC server. Default is `"127.0.0.1"`.
- `--osc-port`: Set the OSC output port. Default is `8000`.
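The options above map naturally onto a standard `argparse` parser. The following is a hypothetical sketch, not the project's actual `stt-service.py`: the flag names, choices, and documented defaults are taken from the list, while the function name `build_parser`, the `int` type for `--energy-threshold`, and the `None` defaults for `--model`, `--language`, and `--backend` (whose defaults the README does not state) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the documented CLI; not the project's own code.
    parser = argparse.ArgumentParser(description="Speech to text with OSC output.")
    # Defaults for --model, --language and --backend are not documented; None is assumed.
    parser.add_argument("--model", choices=["tiny", "base", "small", "medium", "large"],
                        default=None, help="Whisper model to use.")
    parser.add_argument("--language", default=None,
                        help="Language code for decoding (ISO 639-1, e.g. 'en').")
    parser.add_argument("--backend", choices=["openai", "faster", "cpp"], default=None,
                        help="Inference backend.")
    # Documented as an int; None means "use the default audio device".
    parser.add_argument("--audio-device", type=int, default=None)
    # int type is an assumption; the README only gives the default 1000.
    parser.add_argument("--energy-threshold", type=int, default=1000)
    # Durations in seconds.
    parser.add_argument("--record-timeout", type=float, default=0.5)
    parser.add_argument("--phrase-timeout", type=float, default=1.2)
    # OSC destination.
    parser.add_argument("--osc-server", default="127.0.0.1")
    parser.add_argument("--osc-port", type=int, default=8000)
    return parser
```

For example, `build_parser().parse_args(["--model", "small", "--osc-port", "9000"])` yields a namespace where `args.model == "small"` and `args.osc_port == 9000`; `argparse` turns dashes in flag names into underscores in attribute names.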
### OSC Addresses and Values
This application sends OSC messages to communicate the recognized speech. The OSC addresses used are:
- `/stt/partial-text`: OSC address for partial transcriptions. These messages are sent while text is being recognized in real time. Arguments:
    1. `index`: An integer representing the index of the current sentence.
    2. `timestamp`: A string representing the start timestamp of the current sentence (ISO 8601, local timezone).
    3. `text`: A string containing the recognized partial text.
- `/stt/text`: OSC address for full transcriptions. These messages are sent when a complete sentence or phrase has been recognized. Arguments:
    1. `index`: An integer representing the index of the current sentence.
    2. `timestamp`: A string representing the start timestamp of the current sentence (ISO 8601, local timezone).
    3. `text`: A string containing the recognized full text.

### About
The code is inspired
by [davabase/whisper_real_time](https://github.com/davabase/whisper_real_time/blob/master/transcribe_demo.py)
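To consume the `/stt/partial-text` and `/stt/text` messages described under OSC Addresses and Values without extra dependencies, a client only needs to parse the OSC 1.0 wire format: a null-padded address string, a type tag string such as `,iss`, then the big-endian arguments, each aligned to 4 bytes. The sketch below is not part of this project. It assumes the service sends plain OSC messages (not bundles) over UDP to the default `--osc-server`/`--osc-port` (`127.0.0.1:8000`), and that strings are UTF-8 encoded; the helper names `encode_message`, `decode_message`, and `listen` are hypothetical.

```python
from __future__ import annotations

import socket
import struct


def _pad4(n: int) -> int:
    # OSC 1.0 aligns every field to a multiple of 4 bytes.
    return (n + 3) & ~3


def encode_string(s: str) -> bytes:
    # OSC-string: the bytes, at least one null terminator, then null padding.
    raw = s.encode("utf-8") + b"\x00"
    return raw + b"\x00" * (_pad4(len(raw)) - len(raw))


def read_string(data: bytes, offset: int) -> tuple[str, int]:
    # Return the string starting at offset and the aligned offset after it.
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8"), _pad4(end + 1)


def encode_message(address: str, *args: int | float | str) -> bytes:
    # Build the type tag string (e.g. ",iss") and the packed arguments together.
    tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # big-endian int32
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # big-endian float32
        elif isinstance(arg, str):
            tags += "s"
            payload += encode_string(arg)
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return encode_string(address) + encode_string(tags) + payload


def decode_message(data: bytes) -> tuple[str, list]:
    # Parse address, type tags, then each argument in tag order.
    address, offset = read_string(data, 0)
    tags, offset = read_string(data, offset)
    if not tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args: list = []
    for tag in tags[1:]:
        if tag == "i":
            args.append(struct.unpack_from(">i", data, offset)[0])
            offset += 4
        elif tag == "f":
            args.append(struct.unpack_from(">f", data, offset)[0])
            offset += 4
        elif tag == "s":
            value, offset = read_string(data, offset)
            args.append(value)
        else:
            raise ValueError(f"unsupported type tag: {tag!r}")
    return address, args


def listen(host: str = "127.0.0.1", port: int = 8000) -> None:
    # Assumes plain OSC messages over UDP on the default --osc-server/--osc-port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            packet, _ = sock.recvfrom(65535)
            if packet.startswith(b"#bundle"):
                continue  # OSC bundles are not handled in this sketch
            address, args = decode_message(packet)
            if address in ("/stt/partial-text", "/stt/text") and len(args) == 3:
                index, timestamp, text = args
                print(f"{address} [{index}] {timestamp}: {text}")
```

Calling `listen()` in a separate process while `stt-service.py` runs with its default options should print each partial and full transcription. The standard-library approach keeps the example dependency-free; a full OSC library such as python-osc additionally handles bundles and further type tags.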