https://github.com/cansik/speech-to-text-osc

Speech to text with OSC output.

Host: GitHub
URL: https://github.com/cansik/speech-to-text-osc
Owner: cansik
License: mit
Created: 2023-10-22T14:34:05.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-11-09T11:10:12.000Z (over 1 year ago)
Last Synced: 2025-03-31T02:13:44.158Z (2 months ago)
Topics: osc, speech-to-text, whisper
Language: Python
Homepage:
Size: 141 KB
Stars: 6
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Speech to Text Whisper OSC

This application is designed to transcribe speech to text using the Whisper ASR (Automatic Speech Recognition) model and
send the transcribed text to an OSC (Open Sound Control) server. It provides various options to configure the
transcription process and OSC communication.

![Example](images/tts-demo.gif)

### Installation

Create a Python virtual environment (version `>=3.8`) and install the requirements. Currently, the audio part only
supports macOS and Windows.

```
pip install -r requirements.txt
```

If you experience problems with PyAudio, check
its [installation readme](https://people.csail.mit.edu/hubert/pyaudio/#downloads).

### Usage

Start the service by running the script.

```
python stt-service.py
```

### Options

- `--model`: Specify the Whisper model to use. You can choose from "tiny", "base", "small", "medium", or "large".

- `--language`: Set the language code for speech decoding in ISO 639-1 format.

- `--backend`: The backend to be used for inference. You can choose
from "[openai](https://github.com/openai/whisper)", "[faster](https://github.com/guillaumekln/faster-whisper)"
and "[cpp](https://github.com/aarnphm/whispercpp)".

- `--audio-device`: Audio device id (`int`). If none is provided, default is used.

- `--energy-threshold`: Define the energy level for the microphone to detect. Default is `1000`.

- `--record-timeout`: Set the real-time recording duration in seconds. Default is `0.5`.

- `--phrase-timeout`: Specify how long the silence between recordings must last before the message is sent. Default is `1.2`.

- `--osc-server`: Provide the IP address of the OSC server. Default is `127.0.0.1`.

- `--osc-port`: Set the OSC output port. Default is `8000`.
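
As an illustration of how the options above combine, the following sketch assembles a command line for `stt-service.py` and launches it from another Python program. The helper names `build_stt_command` and `run_stt_service` are hypothetical and not part of this repository, and the chosen values for `--model`, `--language` and `--backend` are only examples, not documented defaults.

```python
import subprocess
import sys
from typing import List, Optional


def build_stt_command(
    model: str = "small",          # example value, not a documented default
    language: str = "en",          # example ISO 639-1 code
    backend: str = "faster",       # example value, not a documented default
    osc_server: str = "127.0.0.1",
    osc_port: int = 8000,
    audio_device: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for stt-service.py from the documented flags."""
    command = [
        sys.executable, "stt-service.py",
        "--model", model,
        "--language", language,
        "--backend", backend,
        "--osc-server", osc_server,
        "--osc-port", str(osc_port),
    ]
    # Pass --audio-device only when an id is given, so that the service
    # otherwise uses the default device.
    if audio_device is not None:
        command += ["--audio-device", str(audio_device)]
    return command


def run_stt_service(**options) -> None:
    """Start the service and block until it exits (hypothetical helper)."""
    subprocess.run(build_stt_command(**options), check=True)
```

Calling `run_stt_service(model="tiny", osc_port=9000)` from the repository directory would be equivalent to running `python stt-service.py --model tiny --language en --backend faster --osc-server 127.0.0.1 --osc-port 9000`.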

### OSC Addresses and Values

This application sends OSC messages to communicate the recognized speech. The OSC addresses used are:

- `/stt/partial-text`: OSC address for partial transcriptions. Messages are sent while text is being recognized in real time.
    1. `index`: An integer representing the index of the current sentence.
    2. `timestamp`: A string representing the start timestamp of the current sentence (ISO 8601, local time zone).
    3. `text`: A string containing the recognized partial text.

- `/stt/text`: OSC address for full transcriptions. Messages are sent when a complete sentence or phrase is recognized.
    1. `index`: An integer representing the index of the current sentence.
    2. `timestamp`: A string representing the start timestamp of the current sentence (ISO 8601, local time zone).
    3. `text`: A string containing the recognized text of the complete sentence or phrase.
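
To show what a receiver for these messages might look like, here is a minimal sketch that uses only the Python standard library. It assumes the application sends plain OSC 1.0 messages (not bundles) over UDP with the type tag string `,iss`, matching the integer and two strings listed above; this has not been verified against the implementation. The function names (`parse_osc_message`, `build_osc_message`, `listen`) are hypothetical.

```python
import socket
import struct
from typing import List, Tuple, Union

OscArg = Union[int, str]


def _pad4(n: int) -> int:
    """Round n up to the next multiple of 4 (OSC aligns data to 4 bytes)."""
    return (n + 3) & ~3


def _read_string(data: bytes, pos: int) -> Tuple[str, int]:
    """Read a null-terminated, 4-byte-padded OSC string starting at pos."""
    end = data.index(b"\x00", pos)
    text = data[pos:end].decode("utf-8")
    # Skip the terminator and any padding bytes.
    return text, pos + _pad4(end - pos + 1)


def _write_string(text: str) -> bytes:
    """Encode a string with null terminator and padding to 4 bytes."""
    raw = text.encode("utf-8") + b"\x00"
    return raw.ljust(_pad4(len(raw)), b"\x00")


def parse_osc_message(data: bytes) -> Tuple[str, List[OscArg]]:
    """Decode one OSC message that carries only int32 and string arguments."""
    address, pos = _read_string(data, 0)
    type_tags, pos = _read_string(data, pos)
    if not type_tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args: List[OscArg] = []
    for tag in type_tags[1:]:
        if tag == "i":
            (value,) = struct.unpack_from(">i", data, pos)  # big-endian int32
            pos += 4
        elif tag == "s":
            value, pos = _read_string(data, pos)
        else:
            raise ValueError(f"unsupported OSC type tag: {tag!r}")
        args.append(value)
    return address, args


def build_osc_message(address: str, index: int, timestamp: str, text: str) -> bytes:
    """Encode an (index, timestamp, text) message, useful for local testing."""
    return (
        _write_string(address)
        + _write_string(",iss")
        + struct.pack(">i", index)
        + _write_string(timestamp)
        + _write_string(text)
    )


def listen(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Print every /stt/partial-text and /stt/text message received on UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _sender = sock.recvfrom(65535)
            address, args = parse_osc_message(data)
            if address in ("/stt/partial-text", "/stt/text") and len(args) == 3:
                index, timestamp, text = args
                print(f"{address} #{index} [{timestamp}] {text}")
```

Calling `listen()` blocks and prints each incoming message. The host and port match the defaults of `--osc-server` and `--osc-port`. For anything beyond a quick check, an existing OSC library such as python-osc is likely the better choice, since it also handles bundles and other argument types.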

### About

The code is inspired
by [davabase/whisper_real_time](https://github.com/davabase/whisper_real_time/blob/master/transcribe_demo.py).
