https://github.com/chrischoy/WhisperChain
Speech to Text but with all the bells and whistles and, most importantly, AI! AI will clean up your filler words, edit, and refine what you said!
- Host: GitHub
- URL: https://github.com/chrischoy/WhisperChain
- Owner: chrischoy
- License: bsd-3-clause
- Created: 2025-02-02T19:39:00.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-09T20:42:51.000Z (3 months ago)
- Last Synced: 2025-03-12T07:32:36.503Z (about 2 months ago)
- Language: Python
- Size: 426 KB
- Stars: 255
- Watchers: 3
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-hacking-lists - chrischoy/WhisperChain - Speech to Text but with all the bells and whistles and, most importantly, AI! AI will clean up your filler words, edit, and refine what you said! (Python)
README
# Whisper Chain
## Overview
Typing is boring, so let's use voice to speed up your workflow. This project combines:
- Real-time speech recognition using Whisper.cpp
- Transcription cleanup using LangChain (sketched after this list)
- Global hotkey support for voice control
- Automatic clipboard integration for the cleaned transcription
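
The cleanup step is where the AI comes in: the raw Whisper transcript is run through an LLM prompt that strips filler words and tightens phrasing. Below is a minimal sketch of what such a LangChain chain can look like; the prompt wording and model choice are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch of a LangChain transcript-cleanup chain.
# The prompt text and model are illustrative assumptions,
# not WhisperChain's actual configuration.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Clean up this speech transcript: remove filler words "
     "(um, uh, like), fix punctuation, and keep the meaning intact."),
    ("human", "{transcript}"),
])

# ChatOpenAI reads OPENAI_API_KEY from the environment.
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

cleaned = chain.invoke({"transcript": "um so like, let's uh ship it today"})
print(cleaned)  # e.g. "Let's ship it today."
```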
## Requirements
- Python 3.8+
- OpenAI API Key
- For MacOS:
  - ffmpeg (for audio processing)
  - portaudio (for audio capture)

## Installation
1. Install system dependencies (MacOS):
```bash
# Install ffmpeg and portaudio using Homebrew
brew install ffmpeg portaudio
```

2. Install the project:
```bash
pip install whisperchain
```

## Configuration
WhisperChain will look for configuration in the following locations:
1. Environment variables
2. .env file in the current directory
3. ~/.whisperchain/.env file

On first run, if no configuration is found, you will be prompted to enter your OpenAI API key. The key will be saved in `~/.whisperchain/.env` for future use.
You can also manually set your OpenAI API key in any of these ways:
```bash
# Option 1: Environment variable
export OPENAI_API_KEY=your-api-key-here

# Option 2: Create .env file in current directory
echo "OPENAI_API_KEY=your-api-key-here" > .env

# Option 3: Create global config
mkdir -p ~/.whisperchain
echo "OPENAI_API_KEY=your-api-key-here" > ~/.whisperchain/.env
```
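
The lookup order above is easy to reproduce with python-dotenv, since `load_dotenv` never overrides variables that are already set. A rough sketch, assuming a hypothetical function name rather than WhisperChain's actual code:

```python
# Sketch of the three-step config lookup described above, via python-dotenv.
# The function name is illustrative, not WhisperChain's actual code.
import os
from pathlib import Path
from typing import Optional

from dotenv import load_dotenv

def load_api_key() -> Optional[str]:
    # 1. Environment variables win; override=False never clobbers them.
    # 2. .env in the current directory.
    load_dotenv(Path.cwd() / ".env", override=False)
    # 3. Global config in ~/.whisperchain/.env.
    load_dotenv(Path.home() / ".whisperchain" / ".env", override=False)
    return os.environ.get("OPENAI_API_KEY")

if load_api_key() is None:
    # On first run the real tool prompts for the key and saves it globally.
    print("No OPENAI_API_KEY found in any of the three locations.")
```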
## Usage

1. Start the application:
```bash
# Run with default settings
whisperchain

# Run with custom configuration
whisperchain --config config.json

# Override specific settings
whisperchain --port 8080 --hotkey "++t" --model "large" --debug
```

2. Use the global hotkey (`++r` by default, `++r` on MacOS):
- Press and hold to start recording
- Speak your text
- Release to stop recording
- The cleaned transcription will be copied to your clipboard automatically
- Paste (Ctrl+V) to insert the transcription
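
The press-and-hold workflow maps naturally onto a global keyboard listener such as pynput, with pyperclip handling the clipboard step. A minimal sketch of the pattern, assuming a stand-in key and placeholder recording callbacks rather than the project's actual handlers:

```python
# Sketch of the press-to-record, release-to-stop hotkey pattern
# using pynput and pyperclip. The key and callbacks are illustrative.
import pyperclip
from pynput import keyboard

RECORD_KEY = keyboard.Key.f8  # stand-in; the real tool uses a modifier combo
recording = False

def start_recording():
    print("recording...")  # start streaming audio to the server here

def stop_recording() -> str:
    print("stopped")       # stop the stream; transcribe and clean up
    return "cleaned transcription"

def on_press(key):
    global recording
    # Guard against key auto-repeat firing on_press while held down.
    if key == RECORD_KEY and not recording:
        recording = True
        start_recording()

def on_release(key):
    global recording
    if key == RECORD_KEY and recording:
        recording = False
        pyperclip.copy(stop_recording())  # result lands on the clipboard

with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
    listener.join()
```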
## Development

### Streamlit UI
```bash
streamlit run src/whisperchain/ui/streamlit_app.py
```

If there is an error in the Streamlit UI, you can run the following command to kill all running Streamlit processes:
```bash
lsof -ti :8501 | xargs kill -9
```

### Running Tests
Install test dependencies:
```bash
pip install -e ".[test]"
```

Run tests:
```bash
pytest tests/
```

Run tests with microphone input:
```bash
# Run specific microphone test
TEST_WITH_MIC=1 pytest tests/test_stream_client.py -v -k test_stream_client_with_real_mic

# Run all tests including microphone test
TEST_WITH_MIC=1 pytest tests/
```

### Building the project
```bash
python -m build
pip install .
```

### Publishing to PyPI
```bash
python -m build
twine upload --repository pypi dist/*
```

## License
[LICENSE](LICENSE)
## Acknowledgments
- [Whisper.cpp](https://github.com/ggerganov/whisper.cpp)
- [pywhispercpp](https://github.com/absadiki/pywhispercpp.git)
- [LangChain](https://github.com/langchain-ai/langchain)

## Architecture
```mermaid
graph TB
subgraph "Client Options"
K[Key Listener]
A[Audio Stream]
C[Clipboard]
endsubgraph "Streamlit Web UI :8501"
WebP[Prompt]
WebH[History]
endsubgraph "FastAPI Server :8000"
WS[WebSocket /stream]
W[Whisper Model]
LC[LangChain Processor]
H[History]
endK -->|"Hot Key"| A
A -->|"Audio Stream"| WS
WS --> W
W --> LC
WebP --> LC
LC --> C
LC --> H
H --> WebH
```
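
The `/stream` WebSocket is the seam between the hotkey client and the server-side Whisper + LangChain pipeline. A bare-bones sketch of what such an endpoint can look like with FastAPI; the framing (raw audio bytes in, one cleaned text frame back) and the helper name are assumptions, not the project's actual protocol:

```python
# Bare-bones sketch of the /stream endpoint from the diagram.
# The framing and the transcribe_and_clean helper are assumptions,
# not WhisperChain's actual protocol.
from fastapi import FastAPI, WebSocket

app = FastAPI()

def transcribe_and_clean(audio: bytes) -> str:
    """Placeholder for the Whisper transcription + LangChain cleanup steps."""
    return "cleaned transcription"

@app.websocket("/stream")
async def stream(ws: WebSocket):
    await ws.accept()
    chunks = bytearray()
    while True:
        data = await ws.receive_bytes()
        if not data:            # hypothetical empty-frame end-of-stream marker
            break
        chunks.extend(data)     # audio frames arrive while the hotkey is held
    # Server-side processing, then the cleaned text goes back to the
    # client, which copies it to the clipboard.
    await ws.send_text(transcribe_and_clean(bytes(chunks)))
```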