https://github.com/j3k0/my-speech-recognition

An easy-to-use macOS application **and command-line tool** that integrates with Groq's API for speech recognition. Simply press a shortcut key, start speaking, release it, and the transcribed text will be pasted directly into your active application.

Host: GitHub
URL: https://github.com/j3k0/my-speech-recognition
Owner: j3k0
Created: 2024-09-16T14:16:07.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-01-16T11:36:13.000Z (about 1 month ago)
Last Synced: 2025-01-16T12:58:05.886Z (about 1 month ago)
Language: Python
Size: 51.8 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# My Speech Recognition for macOS

## Features

- **Seamless Integration**: Works with any macOS application that accepts text input.
- **Shortcut Activation**: Use a keyboard shortcut to start recording.
- **Real-Time Transcription**: Utilizes Groq's Whisper models for fast and accurate speech-to-text conversion.
- **Contextual Awareness**: Optionally retrieve context from the active text box to improve transcription accuracy.
- **Customizable**: Adjust settings like model selection, verbosity, and initial prompts.
- **Command-Line Interface**: Transcribe audio files or record from the microphone directly via CLI.

## Requirements

- **Operating System**: macOS
- **Python**: 3.7 or higher
- **Dependencies**:
  - `pyaudio`
  - `webrtcvad`
  - `pyobjc`
  - `pyperclip`
  - `pynput`
  - `requests`
  - `numpy`
- **Additional Tools**:
  - **FFmpeg**: Install separately.
  - **Groq API Key**: Obtain from Groq.

## Installation

1. **Clone the Repository**:

```bash
git clone https://github.com/j3k0/my-speech-recognition.git
cd my-speech-recognition
```

2. **Set Up Virtual Environment** *(Optional but recommended)*:

```bash
python3 -m venv venv
source venv/bin/activate
```

3. **Install Dependencies**:

```bash
pip install -r requirements.txt
```

4. **Install FFmpeg**:

Install via Homebrew:

```bash
brew install ffmpeg
```

5. **Set the Groq API Key**:

Export your API key as an environment variable:

```bash
export GROQ_API_KEY='your_api_key_here'
```

*Alternatively, you can add your API key to a `.env` file.*
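
For illustration, here is a minimal sketch of how a script could resolve the key from either source. The helper name `load_groq_api_key` and the `.env` parsing rules are assumptions made for this example; the project's own loading code is not shown in this README and may differ.

```python
import os
from pathlib import Path


def load_groq_api_key(env_file=".env"):
    """Return GROQ_API_KEY from the environment, falling back to a .env file.

    Hypothetical helper: the name and parsing rules are assumptions.
    """
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Accept an optional shell-style "export " prefix.
            if line.startswith("export "):
                line = line[len("export "):]
            name, sep, value = line.partition("=")
            if sep and name.strip() == "GROQ_API_KEY":
                # Drop surrounding single or double quotes.
                return value.strip().strip("'\"")

    raise RuntimeError(
        "GROQ_API_KEY is not set in the environment or in " + str(env_file)
    )
```

The environment variable takes precedence in this sketch, so an `export` in the current shell overrides whatever the `.env` file contains.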

## Usage

### macOS Application

Run the main service script:
```bash
python myspeech_service.py
```

> **Important**: Your terminal application (Terminal.app or iTerm) needs Accessibility permissions to capture keyboard events. Go to System Settings > Privacy & Security > Accessibility and add your terminal application.

This starts the application in the background. Press **Control+V** to start recording. A recording indicator appears while recording is in progress and stays until silence is detected. The transcribed text is then pasted into your active application.
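
The "record until silence is detected" behavior can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: the function name `frames_until_silence` and the `max_silent_frames` threshold are hypothetical. The per-frame speech flags could come from a voice activity detector such as `webrtcvad.Vad.is_speech(frame, sample_rate)`, which is listed among the dependencies, but how the project actually uses it is not shown here.

```python
def frames_until_silence(speech_flags, max_silent_frames=30):
    """Count the frames consumed before trailing silence ends a recording.

    speech_flags: iterable of booleans, one per audio frame (True = speech).
    max_silent_frames: hypothetical threshold of consecutive silent frames
    that ends the recording once some speech has been heard.
    """
    silent_run = 0
    heard_speech = False
    count = 0
    for is_speech in speech_flags:
        count += 1
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence after speech has started, so leading
            # silence does not end the recording prematurely.
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    return count
```

With 30 ms frames, a threshold of 30 frames corresponds to roughly 0.9 seconds of silence; both numbers are illustrative.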

**Available Options** for `myspeech_service.py`:

```text
usage: myspeech_service.py [-h] [--model MODEL] [--verbose] [--initial-prompt INITIAL_PROMPT] [--retrieve-context]

Optional arguments:
  -h, --help                        show this help message and exit
  --model MODEL                     Name of the model to use
  --verbose                         Enable verbose output
  --initial-prompt INITIAL_PROMPT   Initial prompt to include in transcription
  --retrieve-context                Retrieve context from active text box
```
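
For readers who want to wrap or extend the service, the options above map naturally onto an `argparse` parser. The sketch below reconstructs one from the help text alone; the function name `build_parser` is hypothetical, and the defaults (left unset here) are assumptions, since the README does not state them.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented myspeech_service.py options.

    Reconstructed from the help text; defaults are not documented, so
    none are set here.
    """
    parser = argparse.ArgumentParser(prog="myspeech_service.py")
    parser.add_argument("--model", help="Name of the model to use")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output")
    parser.add_argument("--initial-prompt",
                        help="Initial prompt to include in transcription")
    parser.add_argument("--retrieve-context", action="store_true",
                        help="Retrieve context from active text box")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` exposes `--initial-prompt` as `args.initial_prompt` and `--retrieve-context` as `args.retrieve_context`.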

#### Examples

- **Run with Default Settings:**

```bash
python myspeech_service.py
```

- **Specify a Different Model and Enable Verbose Output:**

```bash
python myspeech_service.py --model distil-whisper-large-v3-en --verbose
```

- **Use an Initial Prompt:**

```bash
python myspeech_service.py --initial-prompt "The meeting notes are as follows:"
```

- **Retrieve Context from Active Text Box:**

```bash
python myspeech_service.py --retrieve-context
```

### Command-Line Interface (CLI) Tool

You can also use the CLI tool to transcribe audio files or record from the microphone.

#### Transcribe Audio Files

```bash
python myspeech.py audio1.wav audio2.mp3
```
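
Under the hood, a transcription amounts to an HTTP upload to Groq. The sketch below assembles such a request against Groq's OpenAI-compatible transcription endpoint. The helper name `build_transcription_request` is hypothetical, and whether `myspeech.py` builds its requests this way is an assumption; only the pieces of the request are shown, with the actual upload left as a commented usage example.

```python
GROQ_TRANSCRIPTIONS_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(api_key, model, language=None, prompt=None):
    """Return (url, headers, form_fields) for a transcription request.

    Hypothetical helper: it only assembles the request and sends nothing.
    """
    headers = {"Authorization": "Bearer " + api_key}
    fields = {"model": model, "response_format": "json"}
    # Optional fields are included only when provided.
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    return GROQ_TRANSCRIPTIONS_URL, headers, fields


# Usage sketch (requires the `requests` package and a valid API key):
#
#   import requests
#   url, headers, fields = build_transcription_request(
#       api_key, "distil-whisper-large-v3-en", language="en")
#   with open("audio1.wav", "rb") as audio:
#       response = requests.post(url, headers=headers, data=fields,
#                                files={"file": audio})
#   print(response.json()["text"])
```

Keeping request assembly separate from the network call makes the logic easy to check without an API key.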

#### Record from Microphone

Record and transcribe audio from your microphone:

```bash
python myspeech.py --record
```

#### Available Options

```text
usage: myspeech.py [-h] [--model MODEL] [--language LANGUAGE]
                   [--output_dir OUTPUT_DIR] [--temperature TEMPERATURE]
                   [--record]
                   [--output_format {txt,vtt,srt,tsv,json,all}]
                   [--task {transcribe,translate}] [--word_timestamps]
                   [--initial_prompt INITIAL_PROMPT] [--verbose]
                   [audio [audio ...]]

Whisper-like CLI using Groq API

positional arguments:
  audio                 audio file(s) to transcribe

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         name of the Whisper model to use
  --language LANGUAGE   language spoken in the audio
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        directory to save the outputs
  --temperature TEMPERATURE
                        temperature to use for sampling
  --record              record audio from microphone until silence is detected
  --output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all}
                        format of the output file; default is 'all'
  --task {transcribe,translate}
                        perform transcription or translation
  --word_timestamps     extract word-level timestamps
  --initial_prompt INITIAL_PROMPT
                        initial prompt for the first window
  --verbose             print progress and debug messages
```
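
To make the `srt` output format concrete, here is a minimal sketch of turning timed segments into SubRip text. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` field; that shape and the function names are assumptions for illustration, not the project's actual data model.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (assumed keys: start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue: sequence number, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

The `vtt` format is similar but uses a `WEBVTT` header line and a period instead of a comma before the milliseconds.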

#### Examples

- **Transcribe Multiple Audio Files:**

```bash
python myspeech.py audio1.wav audio2.mp3
```

- **Record and Transcribe from Microphone:**

```bash
python myspeech.py --record
```

- **Specify a Different Model:**

```bash
python myspeech.py --model distil-whisper-large-v3-en audio.wav
```

- **Set Language:**

```bash
python myspeech.py --language en audio.wav
```

- **Choose Output Directory and Format:**

```bash
python myspeech.py -o transcripts/ -f txt audio.wav
```

- **Translate Audio to English:**

```bash
python myspeech.py --task translate audio.wav
```

- **Enable Verbose Output:**

```bash
python myspeech.py --verbose audio.wav
```

## License

[MIT License](LICENSE)