https://github.com/link-/jarvis
Continuous audio transcription script that records audio from your device's microphone and transcribes it to text in real-time using OpenAI's Whisper model.
- Host: GitHub
- URL: https://github.com/link-/jarvis
- Owner: Link-
- License: MIT
- Created: 2025-05-25T10:55:05.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-05-25T10:55:48.000Z (8 months ago)
- Last Synced: 2025-06-22T23:52:08.061Z (7 months ago)
- Language: Python
- Size: 11.7 KB
- Stars: 5
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Jarvis Audio Transcription
A continuous audio transcription script that records audio from your device's microphone and transcribes it to text in real-time using OpenAI's Whisper model.
## Features
- **Continuous Recording**: Records audio from your device's microphone in chunks
- **Real-time Transcription**: Transcribes speech to text using OpenAI's Whisper model
- **Memory-efficient**: Processes audio in chunks with minimal memory overhead
- **Silence Detection**: Skips processing silent audio segments
- **Configurable**: Multiple command-line options for customization
- **Debug Mode**: Detailed logging for troubleshooting
## How It Works
1. Audio is continuously recorded in small chunks
2. Each chunk is analyzed for speech content
3. Speech segments are processed through the Whisper model
4. Transcribed text is printed to the console with timestamps
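As a rough illustration of this per-chunk flow (a minimal sketch, not the repository's actual `record.py`, whose internals may differ), the following records one chunk with PyAudio, applies an RMS-based silence check, and only then hands the audio to Whisper. It assumes `openai-whisper`, `pyaudio`, and `numpy` are installed:

```python
# Illustrative sketch of the per-chunk flow above -- not the project's real code.
from datetime import datetime

import numpy as np
import pyaudio
import whisper

SAMPLE_RATE = 16000       # Hz, the --sample-rate default
CHUNK_DURATION = 5        # seconds, the --chunk-duration default
SILENCE_THRESHOLD = 0.01  # RMS threshold, the --silence-threshold default
FRAMES_PER_BUFFER = 1024

model = whisper.load_model("base")

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                 input=True, frames_per_buffer=FRAMES_PER_BUFFER)

# 1. Record one chunk of audio.
n_reads = int(SAMPLE_RATE / FRAMES_PER_BUFFER * CHUNK_DURATION)
frames = [stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
          for _ in range(n_reads)]
audio = np.frombuffer(b"".join(frames), dtype=np.int16).astype(np.float32) / 32768.0

# 2. Analyze the chunk for speech content (RMS-based silence detection).
rms = float(np.sqrt(np.mean(audio ** 2)))
if rms > SILENCE_THRESHOLD:
    # 3. Run the speech segment through the Whisper model.
    result = model.transcribe(audio, fp16=False)
    # 4. Print the transcription with a timestamp.
    print(f"[{datetime.now():%H:%M:%S}] {result['text'].strip()}")

stream.stop_stream()
stream.close()
pa.terminate()
```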
### Sequence Diagram
The following diagram illustrates the multi-threaded architecture and data flow of the transcription process:
```mermaid
sequenceDiagram
    participant Main
    participant RecordingThread
    participant AudioQueue
    participant TranscriptionThread
    participant WhisperModel
    participant Console
    Note over Main: Initialize ContinuousTranscriber
    Main->>Main: Load Whisper Model
    Main->>RecordingThread: Start recording thread
    Main->>TranscriptionThread: Start transcription thread
    loop Recording Loop
        RecordingThread->>RecordingThread: Record audio chunk (5s)
        RecordingThread->>RecordingThread: Save overlap buffer (0.5s)
        RecordingThread->>RecordingThread: Calculate RMS for silence detection
        alt RMS > silence_threshold (Speech detected)
            RecordingThread-->>AudioQueue: Push audio chunk + previous overlap
        else RMS <= silence_threshold
            RecordingThread->>RecordingThread: Skip silent audio (no processing)
        end
    end
    loop Transcription Loop
        AudioQueue-->>TranscriptionThread: Pop audio chunk when available
        TranscriptionThread->>WhisperModel: Send audio for transcription
        WhisperModel-->>TranscriptionThread: Return transcribed text
        alt Transcription not empty
            TranscriptionThread->>Console: Output text with timestamp
        end
    end
    Note over Main: On Ctrl+C
    Main->>RecordingThread: Stop recording
    Main->>TranscriptionThread: Stop transcription
    Main->>Main: Terminate PyAudio
```
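The same structure can be sketched in Python as a producer/consumer pair: a recording thread pushes non-silent chunks into a `queue.Queue`, and a transcription thread drains it. This is a simplified skeleton, not the project's actual implementation; `capture_chunk`, `is_silent`, and `transcribe` are hypothetical callables standing in for the audio and model code shown above:

```python
# Skeleton of the two-thread pipeline from the diagram (illustrative only).
import queue
import threading
import time

audio_queue = queue.Queue()
stop_event = threading.Event()

def recording_loop(capture_chunk, is_silent):
    """Producer: record chunks and enqueue the non-silent ones."""
    while not stop_event.is_set():
        chunk = capture_chunk()          # e.g. 5 s of audio + 0.5 s overlap
        if not is_silent(chunk):         # RMS above the silence threshold
            audio_queue.put(chunk)

def transcription_loop(transcribe):
    """Consumer: pop chunks and print any non-empty transcription."""
    while not stop_event.is_set() or not audio_queue.empty():
        try:
            chunk = audio_queue.get(timeout=1)
        except queue.Empty:
            continue
        text = transcribe(chunk)
        if text.strip():
            print(f"[{time.strftime('%H:%M:%S')}] {text.strip()}")

def run(capture_chunk, is_silent, transcribe):
    rec = threading.Thread(target=recording_loop, args=(capture_chunk, is_silent))
    tr = threading.Thread(target=transcription_loop, args=(transcribe,))
    rec.start()
    tr.start()
    try:
        while rec.is_alive():
            rec.join(timeout=0.5)
    except KeyboardInterrupt:            # Ctrl+C: stop both threads cleanly
        stop_event.set()
    rec.join()
    tr.join()
```

Keeping the queue between the two threads means a slow transcription never blocks the microphone, which is what makes the pipeline feel real-time.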
## Requirements
- Python 3.7+
- PortAudio (for audio recording)
- FFmpeg (required by Whisper)
## Quick Start
### 1. Setup the Environment
```bash
# Clone the repository (if not already done)
# git clone https://github.com/link-/jarvis.git
# cd jarvis
# Setup development environment (creates venv, installs dependencies)
make setup
```
### 2. List Available Audio Devices
```bash
make list-devices
```
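`make list-devices` presumably wraps the script's `--list-devices` flag; with PyAudio, enumerating input-capable devices looks roughly like this (an illustrative sketch, not the project's exact output format):

```python
# Illustrative PyAudio device listing (what --list-devices likely resembles).
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:   # only input-capable devices
        print(f"{i}: {info['name']} ({int(info['maxInputChannels'])} ch)")
pa.terminate()
```

The index printed in the first column is the value to pass as `--device-index`.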
### 3. Run the Transcription
```bash
# Run with default settings
make run
# Run with debug logging enabled
make run-debug
```
## Command-Line Options
The script supports various command-line arguments for customization:
```txt
--chunk-duration SEC Duration of each audio chunk in seconds (default: 5)
--sample-rate RATE Audio sample rate in Hz (default: 16000)
--channels N Number of audio channels (default: 1)
--overlap-duration SEC Overlap between chunks in seconds (default: 0.5)
--silence-threshold TH RMS threshold for silence detection (default: 0.01)
--device-index N Specific audio device index (default: system default)
--model SIZE Whisper model size (default: base)
Options: tiny, turbo, base, small, medium, large
--list-devices List available audio input devices and exit
--debug Enable debug mode for verbose logging
```
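These flags map naturally onto `argparse`. The sketch below shows how such a parser might be defined; it mirrors the defaults listed above but is not the actual wiring in `record.py`:

```python
# Hypothetical argparse wiring for the options above (not the project's code).
import argparse

parser = argparse.ArgumentParser(description="Continuous Whisper transcription")
parser.add_argument("--chunk-duration", type=float, default=5,
                    help="Duration of each audio chunk in seconds")
parser.add_argument("--sample-rate", type=int, default=16000,
                    help="Audio sample rate in Hz")
parser.add_argument("--channels", type=int, default=1,
                    help="Number of audio channels")
parser.add_argument("--overlap-duration", type=float, default=0.5,
                    help="Overlap between chunks in seconds")
parser.add_argument("--silence-threshold", type=float, default=0.01,
                    help="RMS threshold for silence detection")
parser.add_argument("--device-index", type=int, default=None,
                    help="Specific audio device index (system default if omitted)")
parser.add_argument("--model", default="base",
                    choices=["tiny", "turbo", "base", "small", "medium", "large"],
                    help="Whisper model size")
parser.add_argument("--list-devices", action="store_true",
                    help="List available audio input devices and exit")
parser.add_argument("--debug", action="store_true",
                    help="Enable debug mode for verbose logging")
args = parser.parse_args()
```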
Example:
```bash
# Using a specific device and the small model
venv/bin/python record.py --device-index 2 --model small
# With custom audio settings
venv/bin/python record.py --chunk-duration 3 --sample-rate 44100 --overlap-duration 0.2
```
## Troubleshooting
If you encounter issues:
1. Use `--list-devices` to ensure the correct audio source is selected
2. Enable debug mode with `--debug` for detailed logging
3. Try different `--silence-threshold` values if speech is not being detected (see the calibration sketch after this list)
4. Check that system dependencies (PortAudio, FFmpeg) are properly installed
5. Try a smaller model like "tiny" if transcription is slow
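For point 3, a quick way to choose a sensible `--silence-threshold` is to measure the RMS of ambient noise versus speech. The probe below is a hedged sketch (not part of the project) using the same PyAudio/NumPy setup assumed earlier:

```python
# Quick RMS probe to help choose --silence-threshold (illustrative only).
import numpy as np
import pyaudio

SAMPLE_RATE = 16000
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                 input=True, frames_per_buffer=1024)

print("Measuring RMS in 0.5 s windows -- stay silent first, then speak...")
for _ in range(10):                      # ten 0.5 s windows
    data = stream.read(int(SAMPLE_RATE * 0.5), exception_on_overflow=False)
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
    print(f"RMS: {np.sqrt(np.mean(samples ** 2)):.4f}")

stream.stop_stream()
stream.close()
pa.terminate()
```

Pick a `--silence-threshold` somewhere between the readings you see while silent and while speaking.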
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.