https://github.com/chrischoy/WhisperChain

Speech to text, but with all the bells and whistles and, most importantly, AI! AI will clean up your filler words and edit and refine what you said!

# Whisper Chain


*Whisper Chain Logo*

## Overview

Typing is boring, so let's use voice to speed up your workflow. This project combines the pieces below (sketched in code right after the list):
- Real-time speech recognition using Whisper.cpp
- Transcription cleanup using LangChain
- Global hotkey support for voice control
- Automatic clipboard integration for the cleaned transcription
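
The flow is simple enough to sketch end to end. The snippet below is a rough illustration of that pipeline, not WhisperChain's actual code: the model names, the cleanup prompt, and the helper are placeholders, and it assumes `pywhispercpp`, `langchain-openai`, and `pyperclip` are installed with `OPENAI_API_KEY` set.

```python
# Rough sketch of the pipeline described above -- not WhisperChain's actual code.
import pyperclip
from langchain_openai import ChatOpenAI
from pywhispercpp.model import Model


def transcribe_and_clean(wav_path: str) -> str:
    # 1. Speech -> raw text with Whisper.cpp (via the pywhispercpp bindings)
    whisper = Model("base.en")
    raw_text = " ".join(segment.text for segment in whisper.transcribe(wav_path))

    # 2. Clean up filler words and phrasing with an LLM (model and prompt are placeholders)
    llm = ChatOpenAI(model="gpt-4o-mini")
    cleaned = llm.invoke(
        "Remove filler words and fix the grammar, keeping the meaning:\n\n" + raw_text
    ).content

    # 3. Put the cleaned text on the clipboard, ready to paste
    pyperclip.copy(cleaned)
    return cleaned
```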

## Requirements

- Python 3.8+
- OpenAI API Key
- For macOS:
  - ffmpeg (for audio processing)
  - portaudio (for audio capture)

## Installation

1. Install system dependencies (macOS):
```bash
# Install ffmpeg and portaudio using Homebrew
brew install ffmpeg portaudio
```

2. Install the project:
```bash
pip install whisperchain
```

## Configuration

WhisperChain will look for configuration in the following locations:
1. Environment variables
2. .env file in the current directory
3. ~/.whisperchain/.env file

On first run, if no configuration is found, you will be prompted to enter your OpenAI API key. The key will be saved in `~/.whisperchain/.env` for future use.
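
For illustration, that lookup order could be implemented with `python-dotenv` roughly as below; the helper name and details are assumptions, not necessarily what WhisperChain does internally.

```python
# Illustrative only: one way the lookup order above could be implemented
# with python-dotenv. Helper name and details are assumptions.
import os
from pathlib import Path
from typing import Optional

from dotenv import load_dotenv


def find_openai_key() -> Optional[str]:
    # 1. An already-set environment variable wins (override=False keeps it untouched).
    # 2. .env in the current directory.
    load_dotenv(Path.cwd() / ".env", override=False)
    # 3. The global ~/.whisperchain/.env file.
    load_dotenv(Path.home() / ".whisperchain" / ".env", override=False)
    return os.getenv("OPENAI_API_KEY")
```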

You can also manually set your OpenAI API key in any of these ways:
```bash
# Option 1: Environment variable
export OPENAI_API_KEY=your-api-key-here

# Option 2: Create .env file in current directory
echo "OPENAI_API_KEY=your-api-key-here" > .env

# Option 3: Create global config
mkdir -p ~/.whisperchain
echo "OPENAI_API_KEY=your-api-key-here" > ~/.whisperchain/.env
```

## Usage

1. Start the application:
```bash
# Run with default settings
whisperchain

# Run with custom configuration
whisperchain --config config.json

# Override specific settings
whisperchain --port 8080 --hotkey "++t" --model "large" --debug
```

2. Use the global hotkey (`++r` by default, `++r` on macOS); a simplified listener sketch follows this list:
- Press and hold to start recording
- Speak your text
- Release to stop recording
- The cleaned transcription will be copied to your clipboard automatically
- Press Ctrl+V to paste the transcription
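
For a feel of how press-and-hold recording can work, here is a simplified `pynput` sketch; a single F8 key stands in for the real key combination, and the recording helpers are hypothetical.

```python
# Simplified illustration of press-and-hold recording using pynput.
# WhisperChain parses full key combinations; F8 is a stand-in here, and
# start_recording / stop_recording are hypothetical helpers.
from pynput import keyboard


def on_press(key):
    if key == keyboard.Key.f8:
        start_recording()   # hypothetical: begin capturing microphone audio


def on_release(key):
    if key == keyboard.Key.f8:
        stop_recording()    # hypothetical: stop, transcribe, clean, copy to clipboard


with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
    listener.join()
```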

## Development

### Streamlit UI

```bash
streamlit run src/whisperchain/ui/streamlit_app.py
```

If the Streamlit UI gets into a bad state, you can kill whatever processes are still holding its default port (8501):

```bash
lsof -ti :8501 | xargs kill -9
```

### Running Tests

Install test dependencies:
```bash
pip install -e ".[test]"
```

Run tests:
```bash
pytest tests/
```

Run tests with microphone input:
```bash
# Run specific microphone test
TEST_WITH_MIC=1 pytest tests/test_stream_client.py -v -k test_stream_client_with_real_mic

# Run all tests including microphone test
TEST_WITH_MIC=1 pytest tests/
```

### Building the project

```bash
python -m build
pip install .
```

### Publishing to PyPI

```bash
python -m build
twine upload --repository pypi dist/*
```

## License

See the [LICENSE](LICENSE) file.

## Acknowledgments

- [Whisper.cpp](https://github.com/ggerganov/whisper.cpp)
- [pywhispercpp](https://github.com/absadiki/pywhispercpp.git)
- [LangChain](https://github.com/langchain-ai/langchain)

## Architecture

```mermaid
graph TB
    subgraph "Client Options"
        K[Key Listener]
        A[Audio Stream]
        C[Clipboard]
    end

    subgraph "Streamlit Web UI :8501"
        WebP[Prompt]
        WebH[History]
    end

    subgraph "FastAPI Server :8000"
        WS[WebSocket /stream]
        W[Whisper Model]
        LC[LangChain Processor]
        H[History]
    end

    K -->|"Hot Key"| A
    A -->|"Audio Stream"| WS
    WS --> W
    W --> LC
    WebP --> LC
    LC --> C
    LC --> H
    H --> WebH
```
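
From a client's point of view, the audio streaming leg of this diagram could look roughly like the sketch below; the endpoint matches the diagram (`ws://localhost:8000/stream`), but the message framing and payload format are assumptions rather than the project's documented protocol.

```python
# Hypothetical client-side view of the streaming path above: push raw audio
# chunks to the FastAPI WebSocket endpoint and read the cleaned transcription
# back. Framing (empty message as end-of-stream) is an assumption.
import asyncio

import websockets


async def stream_audio(chunks):
    async with websockets.connect("ws://localhost:8000/stream") as ws:
        for chunk in chunks:        # chunk: bytes of raw audio
            await ws.send(chunk)
        await ws.send(b"")          # assumed end-of-stream marker
        return await ws.recv()      # cleaned transcription from the server


# asyncio.run(stream_audio(recorded_chunks))
```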