https://github.com/vinnyvangogh/cli-whisperer
🎤 Professional Voice-to-Text TUI Application - OpenAI Whisper + GPT with advanced recording controls, Spotify integration, and comprehensive export system
ai openai python speech-recognition spotify textual transcription tui voice-to-text whisper
- Host: GitHub
- URL: https://github.com/vinnyvangogh/cli-whisperer
- Owner: VinnyVanGogh
- Created: 2025-07-16T18:47:57.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-16T21:39:07.000Z (3 months ago)
- Last Synced: 2025-08-28T20:56:56.059Z (about 1 month ago)
- Topics: ai, openai, python, speech-recognition, spotify, textual, transcription, tui, voice-to-text, whisper
- Language: Python
- Size: 107 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
# CLI Whisperer





A professional **voice-to-text** terminal user interface (TUI) application that combines the power of OpenAI's Whisper for speech recognition with GPT for intelligent text formatting. Features a modern, responsive interface with comprehensive export capabilities, Spotify integration, and advanced recording controls.
## Features
### Audio & Recording
- **High-quality audio recording** with configurable duration (15s - 5min+)
- **Real-time audio level meter** with waveform visualization
- **Adjustable recording controls** with preset duration buttons
- **Graceful recording management** with manual stop capability
- **Minimum recording length validation** for quality assurance

### AI-Powered Transcription
- **OpenAI Whisper integration** for accurate speech-to-text
- **Multiple Whisper model support** (tiny, base, small, medium, large)
- **Intelligent text formatting** with OpenAI GPT models
- **Dual transcription modes** - raw and AI-enhanced text
- **Comprehensive error handling** with fallback mechanisms

### Modern TUI Interface
- **8 professional themes** (EDM Synthwave, Cyberpunk, Marc Anthony, Professional, etc.)
- **Responsive design** optimized for all terminal sizes
- **Tabbed interface** with smooth navigation
- **Real-time status updates** and progress indicators
- **Pulse animations** and visual feedback systems

### Spotify Integration
- **Playback control** (play/pause, next/previous, shuffle, repeat)
- **Real-time status display** with track information
- **Interactive controls** directly in the TUI
- **Smart auto-pause** during recording sessions

### Advanced Export System
- **6 export formats**: TXT, Markdown, JSON, CSV, DOCX, PDF
- **Batch export capabilities** for all transcriptions
- **Filtering options** by date, directory, and text content
- **Metadata inclusion** with timestamps and file paths
- **Custom output locations** and file naming

### Comprehensive Keyboard Shortcuts
- **38 keyboard shortcuts** for all major functions
- **Power-user optimized** workflow
- **Intuitive key bindings** following standard conventions
- **Context-sensitive help** system

### File Management
- **Intelligent file organization** with automatic rotation
- **History tracking** with searchable database
- **Directory-aware storage** with working directory tracking
- **Automatic cleanup** of old files
- **Backup and recovery** systems

## Table of Contents
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Keyboard Shortcuts](#keyboard-shortcuts)
- [Configuration](#configuration)
- [Export Functionality](#export-functionality)
- [Themes](#themes)
- [Development](#development)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)

## Installation
### Prerequisites
- **Python 3.10+** (required for OpenAI Whisper compatibility)
- **pip** or **uv** package manager
- **OpenAI API key** (optional, for text formatting)
- **Microphone** access for recording
- **Spotify CLI** (optional, for music integration)

### Quick Install with UV (Recommended)
```bash
# Install with UV from an existing checkout (fastest method)
uv pip install -e .

# Or clone the repository first, then install from source
git clone https://github.com/VinnyVanGogh/cli-whisperer.git
cd cli-whisperer
uv pip install -e .
```

### Install with Pip
```bash
# Clone the repository
git clone https://github.com/VinnyVanGogh/cli-whisperer.git
cd cli-whisperer

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .
```

### System Dependencies
```bash
# macOS
brew install portaudio

# Ubuntu/Debian
sudo apt-get install portaudio19-dev python3-pyaudio

# Windows
# Install Visual Studio Build Tools
# PortAudio will be installed automatically
```

## Quick Start
### 1. Basic Recording
```bash
# Start CLI Whisperer
cli-whisperer

# Record for 2 minutes with OpenAI formatting
cli-whisperer --duration 120 --format

# Record once and exit
cli-whisperer --once
```

### 2. TUI Mode
```bash
# Launch the interactive TUI
cli-whisperer --tui

# TUI with specific theme
cli-whisperer --tui --theme professional
```

### 3. Configuration
```bash
# Set up OpenAI API key
export OPENAI_API_KEY="your-api-key-here"

# Configure output directory
cli-whisperer --output-dir ~/Documents/transcripts
```

## Usage
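The option list in this section gives two ways to set the recording length: `--duration` in seconds and `--minutes` in minutes. As a rough illustration of how such flags could be reconciled, the sketch below rebuilds a subset of the documented options with Python's `argparse`. It is not the project's actual parser: `build_parser`, `resolve_duration`, `parse`, the `base` default for `--model`, and the rule that `--minutes` takes precedence over `--duration` are all assumptions made for this example.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="cli-whisperer")
    parser.add_argument("--tui", action="store_true", help="Launch interactive TUI mode")
    parser.add_argument("--once", action="store_true", help="Record once and exit")
    parser.add_argument("-d", "--duration", type=int, default=120, metavar="SECONDS",
                        help="Recording duration (default: 120)")
    parser.add_argument("-min", "--minutes", type=float, default=None, metavar="MIN",
                        help="Recording duration in minutes")
    # --format and --no-format share one destination; None means "not specified".
    parser.add_argument("--format", dest="format", action="store_true", default=None)
    parser.add_argument("--no-format", dest="format", action="store_false", default=None)
    # The "base" default is an assumption; the README does not state one.
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--cleanup-days", type=int, default=7, metavar="DAYS")
    return parser


def resolve_duration(args: argparse.Namespace) -> int:
    """Return the recording length in seconds (assumed rule: --minutes wins)."""
    if args.minutes is not None:
        return int(round(args.minutes * 60))
    return args.duration


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse an explicit argument list, or sys.argv when argv is None."""
    return build_parser().parse_args(argv)
```

With this sketch, `resolve_duration(parse(["-min", "2"]))` and `resolve_duration(parse([]))` both yield 120 seconds, the first from the minutes flag and the second from the documented default.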
### Command Line Interface
```bash
cli-whisperer [OPTIONS]

Options:
  --tui                   Launch interactive TUI mode
  --once                  Record once and exit
  -d, --duration SECONDS  Recording duration (default: 120)
  -min, --minutes MIN     Recording duration in minutes
  --format                Enable OpenAI text formatting
  --no-format             Disable OpenAI text formatting
  --model MODEL           Whisper model (tiny/base/small/medium/large)
  --openai-model MODEL    OpenAI model for formatting
  --theme THEME           TUI theme selection
  --output-dir PATH       Custom output directory
  --cleanup-days DAYS     Days to keep old files (default: 7)
  --debug                 Enable debug logging
  --help                  Show help message
```

### TUI Mode Features
#### Recording Controls
- **Record Button**: Start recording session
- **Stop Button**: End recording early
- **Duration Controls**: Adjust recording time (±15s increments)
- **Preset Buttons**: Quick duration selection (30s, 1m, 2m, 5m)

#### Real-time Feedback
- **Audio Level Meter**: Visual waveform with color coding
- **Progress Bar**: Recording countdown with time remaining
- **Status Panel**: Current mode and session information

#### Text Management
- **Tabbed Previews**: Switch between raw and AI-formatted text
- **Copy Functions**: One-click copying to clipboard
- **Edit Integration**: Direct Neovim editing support

## Keyboard Shortcuts
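To make the duration keys in this section concrete (`+` / `-` in 15-second steps, `1` to `4` for the 30s, 1m, 2m, and 5m presets), here is a minimal sketch of how a key press might map to a new recording length. It is an illustration rather than the application's code: `apply_duration_key` is a hypothetical helper, the 15-second floor is taken from the Features list, and the 600-second ceiling is an assumption because the README only says "5min+".

```python
STEP_SECONDS = 15  # "+" / "-" step from the Duration Controls table
PRESETS = {"1": 30, "2": 60, "3": 120, "4": 300}  # keys 1-4 -> 30s, 1m, 2m, 5m
MIN_SECONDS = 15   # lower bound quoted in the Features list
MAX_SECONDS = 600  # assumed cap; the README only says "5min+"


def apply_duration_key(current: int, key: str) -> int:
    """Return the recording duration (seconds) after a key press."""
    if key in PRESETS:
        return PRESETS[key]
    if key == "+":
        return min(current + STEP_SECONDS, MAX_SECONDS)
    if key == "-":
        return max(current - STEP_SECONDS, MIN_SECONDS)
    return current  # any other key leaves the duration unchanged
```

Clamping in one place keeps the presets and the incremental keys consistent, so the duration can never drift outside the supported range.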
### Core Actions
| Key | Action | Description |
|-----|--------|-------------|
| `R` | Record | Start recording |
| `S` | Stop | Stop recording |
| `Space` | Toggle Recording | Start/stop recording |
| `Q` / `Escape` | Quit | Exit application |

### Navigation
| Key | Action | Description |
|-----|--------|-------------|
| `Tab` / `Shift+Tab` | Navigate Tabs | Switch between tabs |
| `H` | History | Show history tab |
| `T` | Themes | Show themes tab |
| `F1` / `?` | Help | Show help dialog |

### Duration Controls
| Key | Action | Description |
|-----|--------|-------------|
| `+` / `-` | Adjust Duration | Increase/decrease by 15s |
| `1` - `4` | Duration Presets | Set 30s, 1m, 2m, 5m |

### Copy Operations
| Key | Action | Description |
|-----|--------|-------------|
| `C` | Copy AI Text | Copy formatted transcription |
| `Ctrl+C` | Copy Raw Text | Copy original transcription |
| `Ctrl+A` | Enhanced Copy | Copy with preview |
| `Ctrl+Shift+A` | Copy All | Copy all transcriptions |

### Spotify Controls
| Key | Action | Description |
|-----|--------|-------------|
| `Ctrl+P` | Play/Pause | Toggle playback |
| `Ctrl+N` / `Ctrl+B` | Next/Previous | Track navigation |
| `Ctrl+S` | Toggle Panel | Show/hide Spotify panel |
| `Ctrl+Shift+S` | Shuffle | Toggle shuffle mode |
| `Ctrl+Shift+R` | Repeat | Toggle repeat mode |

### File Operations
| Key | Action | Description |
|-----|--------|-------------|
| `Ctrl+E` | Export | Export current transcription |
| `Ctrl+Shift+E` | Export All | Export all transcriptions |
| `Ctrl+O` | Open Directory | Open transcript folder |
| `Ctrl+D` | Clean Files | Delete old files |

### Advanced Features
| Key | Action | Description |
|-----|--------|-------------|
| `F2` | Toggle Debug | Enable/disable debug mode |
| `F3` | Toggle Audio Meter | Show/hide audio meter |
| `F4` | Compact Mode | Toggle compact layout |
| `F5` | Refresh | Refresh interface |
| `Ctrl+R` | Reload Config | Reload configuration |
| `Ctrl+Shift+T` | Switch Theme | Cycle through themes |

## Configuration
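The `CLI_WHISPERER_*` environment variables documented in this section arrive as plain strings, so an application has to convert and default them somewhere. The sketch below shows one way that could look. It is not taken from the project: `Settings` and `load_settings` are hypothetical names, and using the README's example values as fallbacks is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass
class Settings:
    """Hypothetical container for the documented settings."""
    output_dir: Path
    theme: str
    debug: bool
    duration: int
    model: str
    min_length: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, with assumed fallbacks."""
    env = os.environ if env is None else env
    raw_dir = env.get("CLI_WHISPERER_OUTPUT_DIR", "~/Documents/transcripts")
    return Settings(
        # expanduser() resolves a leading "~" that the shell left unexpanded.
        output_dir=Path(raw_dir).expanduser(),
        theme=env.get("CLI_WHISPERER_THEME", "professional"),
        # Accept a few common spellings of "true"; anything else is False.
        debug=env.get("CLI_WHISPERER_DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        duration=int(env.get("CLI_WHISPERER_DURATION", "120")),
        model=env.get("CLI_WHISPERER_MODEL", "base"),
        min_length=float(env.get("CLI_WHISPERER_MIN_LENGTH", "1.0")),
    )
```

Passing a mapping instead of reading `os.environ` directly makes the function easy to exercise in tests, for example `load_settings({"CLI_WHISPERER_DURATION": "30"})`.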
### Environment Variables
```bash
# OpenAI Configuration
export OPENAI_API_KEY="sk-your-api-key-here"
export OPENAI_MODEL="gpt-4"

# Application Settings
# $HOME is used because the shell does not expand ~ inside double quotes
export CLI_WHISPERER_OUTPUT_DIR="$HOME/Documents/transcripts"
export CLI_WHISPERER_THEME="professional"
export CLI_WHISPERER_DEBUG="false"

# Recording Settings
export CLI_WHISPERER_DURATION="120"
export CLI_WHISPERER_MODEL="base"
export CLI_WHISPERER_MIN_LENGTH="1.0"
```

### Configuration Files
The application uses the following configuration structure:
```
~/.config/cli-whisperer/
├── config.yaml          # Main configuration
├── themes/              # Custom themes
│   ├── custom.css
│   └── user-theme.css
└── history/             # History database
    ├── history.json
    └── backups/
```

### Custom Themes
Create custom themes by extending the base theme system:
```css
/* ~/.config/cli-whisperer/themes/custom.css */
:root {
  --primary-color: #your-color;
  --secondary-color: #your-color;
  --accent-color: #your-color;
  --background-color: #your-color;
}

RecordingControls {
  background: var(--background-color);
  border: solid var(--primary-color);
}
```

## Export Functionality
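As a rough picture of what multi-format export involves, the sketch below writes one transcription record as TXT, Markdown, JSON, or CSV using only the Python standard library. It is not the project's `ExportManager`: `export_record` and the record keys (`timestamp`, `model`, `text`) are hypothetical, and DOCX and PDF are left out because they need third-party libraries.

```python
import csv
import json
from pathlib import Path


def export_record(record: dict, path: Path) -> None:
    """Write one record in the format implied by the file suffix."""
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Plain text: transcription only, no metadata.
        path.write_text(record["text"] + "\n", encoding="utf-8")
    elif suffix == ".md":
        # Markdown: a header, a short metadata list, then the text.
        body = (
            "# Transcription\n\n"
            f"- Timestamp: {record['timestamp']}\n"
            f"- Model: {record['model']}\n\n"
            f"{record['text']}\n"
        )
        path.write_text(body, encoding="utf-8")
    elif suffix == ".json":
        # JSON: the full record as structured data.
        path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    elif suffix == ".csv":
        # CSV: one header row and one data row.
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(record))
            writer.writeheader()
            writer.writerow(record)
    else:
        raise ValueError(f"unsupported export format: {suffix}")
```

Dispatching on the file suffix keeps the call site simple: `export_record(record, Path("notes.md"))` selects the Markdown writer without a separate format argument.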
### Supported Formats
| Format | Extension | Description | Metadata |
|--------|-----------|-------------|----------|
| **Plain Text** | `.txt` | Simple text format | Optional |
| **Markdown** | `.md` | Formatted with headers | Full |
| **JSON** | `.json` | Structured data | Complete |
| **CSV** | `.csv` | Spreadsheet compatible | Basic |
| **Word Document** | `.docx` | Microsoft Word | Full |
| **PDF** | `.pdf` | Portable document | Complete |

### Export Options
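The history filters listed in this section (date range, directory, text search, and Whisper model) combine naturally as a chain of optional checks. The following is a minimal sketch under that assumption. `HistoryEntry` and `filter_history` are hypothetical names, and the real history schema may differ.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional


@dataclass
class HistoryEntry:
    """Hypothetical history record; the real schema may differ."""
    timestamp: datetime
    directory: str
    model: str
    text: str


def filter_history(
    entries: Iterable[HistoryEntry],
    start: Optional[date] = None,
    end: Optional[date] = None,
    directory: Optional[str] = None,
    keyword: Optional[str] = None,
    model: Optional[str] = None,
) -> List[HistoryEntry]:
    """Keep entries that satisfy every filter that was supplied."""
    selected = []
    for entry in entries:
        day = entry.timestamp.date()
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        if directory is not None and entry.directory != directory:
            continue
        # Keyword search is case-insensitive.
        if keyword is not None and keyword.lower() not in entry.text.lower():
            continue
        if model is not None and entry.model != model:
            continue
        selected.append(entry)
    return selected
```

Filters left as `None` are skipped, so calling `filter_history(entries)` with no arguments returns every entry, and each extra argument narrows the result.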
#### Content Selection
- **Raw transcription text**
- **AI-formatted text**
- **Timestamps and metadata**
- **File paths and working directory**
- **Recording duration and model info**

#### Filtering (History Export)
- **Date Range**: Export transcriptions from specific time periods
- **Directory Filter**: Export only from specific working directories
- **Text Search**: Export transcriptions containing specific keywords
- **Model Filter**: Export by Whisper model used

#### Export Types
```bash
# Export latest transcription
Ctrl+E # Interactive format selection

# Export current session
# Use Export Session button in Actions Panel

# Export filtered history
Ctrl+Shift+E # Full export dialog with filtering
```

## Themes
### Built-in Themes
| Theme | Description | Colors |
|-------|-------------|---------|
| **EDM Synthwave** | Retro neon aesthetic | Hot pink, electric cyan, yellow |
| **EDM Cyberpunk** | Futuristic dark theme | Cyan, green, deep pink |
| **EDM Trance** | Clean electronic look | Blue, purple, white |
| **Marc Anthony** | Elegant gold theme | Platinum, champagne, rose gold |
| **Professional** | Business-friendly | Blue, gray, green |
| **Dark Minimal** | Clean dark interface | White, gray, blue |
| **Neon Noir** | High contrast neon | Pink, cyan, yellow |
| **Retro Wave** | 80s inspired | Pink, purple, orange |

### Theme Switching
```bash
# Command line
cli-whisperer --tui --theme professional

# In TUI
T # Open themes tab
Ctrl+Shift+T # Quick theme cycle
```

## Development
### Project Structure
```
cli-whisperer/
├── src/cli_whisperer/
│   ├── core/                     # Core functionality
│   │   ├── audio_recorder.py     # Audio recording and processing
│   │   ├── transcriber.py        # Whisper integration
│   │   ├── formatter.py          # OpenAI text formatting
│   │   └── file_manager.py       # File operations
│   ├── integrations/             # External integrations
│   │   ├── spotify_control.py    # Spotify API integration
│   │   └── clipboard.py          # System clipboard
│   ├── ui/                       # User interface
│   │   ├── textual_app.py        # Main TUI application
│   │   ├── themes.py             # Theme system
│   │   ├── export_dialog.py      # Export dialogs
│   │   └── edit_manager.py       # Neovim integration
│   ├── utils/                    # Utilities
│   │   ├── config.py             # Configuration management
│   │   ├── logger.py             # Logging system
│   │   ├── history.py            # History management
│   │   └── export_manager.py     # Export functionality
│   ├── cli.py                    # CLI interface
│   └── main.py                   # Entry point
├── tests/                        # Test suite
│   ├── test_export_manager.py
│   └── ...
├── pyproject.toml                # Project configuration
└── README.md                     # This file
```

### Development Setup
```bash
# Clone the repository
git clone https://github.com/VinnyVanGogh/cli-whisperer.git
cd cli-whisperer

# Create development environment
python -m venv venv
source venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```

### Running Tests
```bash
# Run all tests
pytest

# Run tests with coverage
pytest --cov=src/cli_whisperer

# Run specific test file
pytest tests/test_export_manager.py

# Run tests with verbose output
pytest -v
```

### Code Quality
```bash
# Format code
black src/ tests/

# Type checking
mypy src/cli_whisperer

# Linting
flake8 src/ tests/

# Run all quality checks
pre-commit run --all-files
```

## API Reference
### Core Classes
#### `CLIApplication`
Main application orchestrator that coordinates all components.
```python
from cli_whisperer.cli import CLIApplication

app = CLIApplication(
    duration=120,
    format_enabled=True,
    model="base",
    output_dir="./transcripts"
)
app.run()
```

#### `AudioRecorder`
Handles audio recording with real-time level monitoring.
```python
from cli_whisperer.core.audio_recorder import AudioRecorder

recorder = AudioRecorder(
    duration=60,
    sample_rate=16000,
    channels=1
)
audio_data = recorder.record()
```

#### `WhisperTranscriber`
Manages Whisper model loading and transcription.
```python
from cli_whisperer.core.transcriber import WhisperTranscriber

transcriber = WhisperTranscriber(model="base")
text = transcriber.transcribe(audio_data)
```

#### `ExportManager`
Handles multi-format export functionality.
```python
from cli_whisperer.utils.export_manager import ExportManager, ExportFormat

manager = ExportManager()
manager.export_transcription(
    text="Hello world",
    format=ExportFormat.MARKDOWN,
    output_path="output.md"
)
```

### Integration Points
#### Spotify Integration
```python
from cli_whisperer.integrations.spotify_control import SpotifyController

spotify = SpotifyController()
if spotify.is_available():
    spotify.play()
    status = spotify.get_status()
```

#### Theme System
```python
from cli_whisperer.ui.themes import ThemeManager

theme_manager = ThemeManager()
theme_manager.set_theme("professional")
css = theme_manager.get_current_theme().css
```

## Troubleshooting
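Before working through the individual issues in this section, it can help to run a quick environment check. The sketch below is a hypothetical helper (`run_checks` is not part of the project) that uses only the standard library to test the Python version, whether `OPENAI_API_KEY` is set, and whether the modules used in the commands here (`sounddevice`, `whisper`, `openai`) can be found. It does not import them or contact any service.

```python
import importlib.util
import os
import sys
from typing import Dict, Mapping, Optional, Sequence


def run_checks(
    env: Optional[Mapping[str, str]] = None,
    modules: Sequence[str] = ("sounddevice", "whisper", "openai"),
) -> Dict[str, bool]:
    """Return a name -> passed mapping for a few basic prerequisites."""
    env = os.environ if env is None else env
    report = {
        "python>=3.10": sys.version_info >= (3, 10),
        "OPENAI_API_KEY set": bool(env.get("OPENAI_API_KEY")),
    }
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[f"module {name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, passed in run_checks().items():
        print(f"{'OK  ' if passed else 'FAIL'} {check}")
```

A failed `OPENAI_API_KEY set` check only matters when text formatting is enabled, since the key is listed as optional in the prerequisites.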
### Common Issues
#### Audio Recording Problems
```bash
# Check microphone permissions
# macOS: System Preferences > Security & Privacy > Microphone
# Linux: Check PulseAudio/ALSA configuration

# Test audio recording
python -c "import sounddevice as sd; print(sd.query_devices())"
```

#### OpenAI API Issues
```bash
# Verify API key
echo $OPENAI_API_KEY

# Test API connection
python -c "import openai; print(openai.models.list())"
```

#### Whisper Model Loading
```bash
# Clear model cache
rm -rf ~/.cache/whisper

# Download specific model
python -c "import whisper; whisper.load_model('base')"
```

### Debug Mode
Enable debug logging for detailed troubleshooting:
```bash
# Command line
cli-whisperer --debug

# Environment variable
export CLI_WHISPERER_DEBUG=true

# In TUI
F2 # Toggle debug mode
```

### Performance Optimization
#### For Low-End Systems
```bash
# Use smaller Whisper model
cli-whisperer --model tiny

# Reduce recording duration
cli-whisperer --duration 30

# Disable OpenAI formatting
cli-whisperer --no-format
```

#### For High-End Systems
```bash
# Use larger Whisper model
cli-whisperer --model large

# Enable all features
cli-whisperer --format --tui --theme professional
```

### Log Files
Check log files for detailed error information:
```bash
# Application logs
tail -f ~/.local/share/cli-whisperer/logs/cli-whisperer.log

# Debug logs (when debug mode enabled)
tail -f ~/.local/share/cli-whisperer/logs/debug.log
```

## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Process
1. **Fork the repository**
2. **Create a feature branch** (`git checkout -b feature/amazing-feature`)
3. **Make your changes** following the code style guidelines
4. **Add tests** for your changes
5. **Ensure all tests pass** (`pytest`)
6. **Update documentation** if needed
7. **Commit your changes** (`git commit -m 'Add amazing feature'`)
8. **Push to the branch** (`git push origin feature/amazing-feature`)
9. **Open a Pull Request**

### Code Style Guidelines
- **Follow PEP 8** Python style guide
- **Use type hints** for all functions and methods
- **Write docstrings** in Google style
- **Keep functions under 50 lines** when possible
- **Maintain test coverage** above 90%

### Issue Reports
When reporting issues, please include:
- **Python version** and operating system
- **Complete error messages** and stack traces
- **Steps to reproduce** the issue
- **Expected vs actual behavior**
- **Log files** if applicable

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- **OpenAI** for the Whisper and GPT models
- **Textual** for the excellent TUI framework
- **Python Community** for the amazing ecosystem
- **All contributors** who have helped improve this project

## Support
- Email: [133192356+VinnyVanGogh@users.noreply.github.com]
- Issues: [GitHub Issues](https://github.com/VinnyVanGogh/cli-whisperer/issues)
- Documentation: [Project Wiki](https://github.com/VinnyVanGogh/cli-whisperer/wiki)

---
**Made with ❤️ by VinnyVanGogh**
*Transforming voice to text with style and intelligence*

[⬆️ Back to Top](#cli-whisperer)