https://github.com/assemblyai-solutions/async-chunk-py
Last synced: over 1 year ago
- Host: GitHub
- URL: https://github.com/assemblyai-solutions/async-chunk-py
- Owner: AssemblyAI-Solutions
- Created: 2024-09-19T17:25:10.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-14T20:23:52.000Z (over 1 year ago)
- Last Synced: 2025-01-24T04:53:31.880Z (over 1 year ago)
- Language: Python
- Size: 33.2 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# AsyncChunkPy: Near-Realtime Python Speech-to-Text App
AsyncChunkPy is a Python application that provides near-real-time speech-to-text transcription. It records microphone audio in chunks and sends each chunk to AssemblyAI's async transcription API, combining the accuracy of async transcription with near-real-time speed.
## Features
- Real-time audio recording and chunking
- Voice Activity Detection (VAD) to split chunks at pauses in speech
- Asynchronous transcription using AssemblyAI API
- Transcript logging in recording order
- Configurable chunk size and silence threshold
- Support for multiple languages
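Because chunks are transcribed asynchronously, a later chunk's transcript can come back before an earlier one's. One way to keep the transcript log in recording order is to number each chunk and hold back results until every earlier chunk has arrived. The `OrderedTranscriptLog` class below is a hypothetical sketch of that idea, not code from the repository.

```python
import threading


class OrderedTranscriptLog:
    """Emit chunk transcripts in recording order even if they arrive out of order.

    Hypothetical helper for illustration; not taken from the repository.
    """

    def __init__(self):
        self._pending = {}   # seq -> transcript waiting for earlier chunks
        self._next_seq = 0   # sequence number of the next chunk to emit
        self._lock = threading.Lock()  # results may arrive from several threads
        self.lines = []      # transcripts emitted so far, in order

    def add(self, seq: int, text: str) -> list:
        """Record the transcript for chunk `seq`; return any lines now ready to emit."""
        with self._lock:
            self._pending[seq] = text
            ready = []
            # Flush every consecutive chunk starting at the next expected number.
            while self._next_seq in self._pending:
                ready.append(self._pending.pop(self._next_seq))
                self._next_seq += 1
            self.lines.extend(ready)
            return ready


log = OrderedTranscriptLog()
print(log.add(1, "world"))  # prints [] (chunk 0 has not arrived yet)
print(log.add(0, "hello"))  # prints ['hello', 'world']
```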
## Key Benefits
- Access to AssemblyAI's powerful Universal-2 model for English and Universal-1 model for Spanish and German
- Support for all non-English languages available in AssemblyAI's async transcription service
- Higher accuracy compared to real-time transcription models
- More cost-effective than real-time transcription services
- Near real-time performance with the quality of async transcription
## Prerequisites
- [Python](https://www.python.org/) 3.7 or later
- [pip](https://pip.pypa.io/en/stable/installation/) (Python package installer)
- An AssemblyAI API key ([sign up for an AssemblyAI account](https://www.assemblyai.com/app) and copy the key from your dashboard)
## Installation
1. Clone the repository:
```
git clone https://github.com/AssemblyAI-Solutions/async-chunk-py.git
cd async-chunk-py
```
2. Install dependencies:
```
pip install -r requirements.txt
```
3. Create a `.env` file in the root directory and add your AssemblyAI API key:
```
ASSEMBLYAI_API_KEY=your_api_key_here
```
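The README does not say how the key in `.env` is read; many projects use the `python-dotenv` package for this. As a dependency-free illustration, the hypothetical `load_api_key` helper below parses simple `KEY=value` lines itself and prefers a variable already exported in the shell. It is a sketch under those assumptions, not the repository's loader.

```python
import os
from pathlib import Path


def load_api_key(env_path: str = ".env", name: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key from the environment or from a simple KEY=value .env file.

    Hypothetical helper; handles only plain KEY=value lines, comments, and quotes.
    """
    # A variable already set in the environment takes precedence over the file.
    if os.environ.get(name):
        return os.environ[name]
    path = Path(env_path)
    if path.is_file():
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            if key.strip() == name:
                # Strip optional surrounding quotes, e.g. KEY="abc".
                return value.strip().strip('"').strip("'")
    raise RuntimeError(f"{name} not found in the environment or in {env_path}")
```

Calling `load_api_key()` from the project root would then return `your_api_key_here` for the file shown above.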
## Usage
1. Start the application:
```
python main.py
```
2. Speak into your microphone. The application records and transcribes your speech in near real time.
3. Press Ctrl+C to stop the recording and see the final transcript.
## Configuration
You can modify the following parameters in `config.py`:
- `CHUNK_SIZE`: Size of each audio chunk in bytes
- `CHUNK_DURATION_MS`: Duration of each audio chunk in milliseconds (default: 5000ms)
- `SILENCE_THRESHOLD_MS`: Duration of silence required to trigger chunk processing (default: 600ms)
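To see how these millisecond settings translate into audio data, the worked sketch below assumes 16 kHz, 16-bit mono PCM (a format py-webrtcvad accepts) and 30 ms VAD frames. Neither value is taken from `config.py`, so check the repository's actual settings before relying on the numbers.

```python
SAMPLE_RATE_HZ = 16_000     # assumption: 16 kHz capture
BYTES_PER_SAMPLE = 2        # assumption: 16-bit mono PCM
FRAME_MS = 30               # assumption: 30 ms VAD frames

CHUNK_DURATION_MS = 5000    # README default
SILENCE_THRESHOLD_MS = 600  # README default


def ms_to_bytes(ms: int) -> int:
    """Bytes of 16-bit mono PCM audio covering `ms` milliseconds."""
    return SAMPLE_RATE_HZ * ms // 1000 * BYTES_PER_SAMPLE


frame_bytes = ms_to_bytes(FRAME_MS)                      # 480 samples -> 960 bytes
chunk_bytes = ms_to_bytes(CHUNK_DURATION_MS)             # 80,000 samples -> 160,000 bytes
silent_frames_needed = SILENCE_THRESHOLD_MS // FRAME_MS  # consecutive silent frames

print(frame_bytes, chunk_bytes, silent_frames_needed)    # prints 960 160000 20
```

Under these assumptions, a pause is detected after 20 consecutive non-speech frames, and a full 5-second chunk holds about 156 KiB of raw audio.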
To change the language or enable language detection, modify the `transcription_worker.py` file:
- Change `language_code='en'` in the `transcribe` method to the desired language code (for example, `'es'` for Spanish), or
- Add `language_detection=True` to enable automatic language detection
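Both options correspond to request parameters of AssemblyAI's async transcription API: `language_code` selects a language explicitly, while `language_detection` asks the service to identify it. The hypothetical helper below only builds the JSON body for a `POST /v2/transcript` request and treats the two options as alternatives, as this README does. It does not reproduce `transcription_worker.py`, which may use the AssemblyAI Python SDK rather than raw requests.

```python
import json
from typing import Optional


def build_transcript_request(audio_url: str,
                             language_code: Optional[str] = None,
                             language_detection: bool = False) -> dict:
    """Return a request body for AssemblyAI's POST /v2/transcript endpoint.

    Hypothetical helper for illustration: set a language code or enable
    detection, not both.
    """
    if language_code and language_detection:
        raise ValueError("choose language_code or language_detection, not both")
    body = {"audio_url": audio_url}
    if language_detection:
        body["language_detection"] = True
    else:
        # Default to English, mirroring language_code='en' in the README.
        body["language_code"] = language_code or "en"
    return body


print(json.dumps(build_transcript_request("https://example.com/chunk.wav", language_code="es")))
```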
## Voice Activity Detection (VAD)
This project uses the py-webrtcvad library for VAD. You can adjust VAD parameters by modifying the `Vad` configuration in `audio_recorder.py`. For more information on VAD parameters, visit the [py-webrtcvad GitHub repository](https://github.com/wiseman/py-webrtcvad).
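py-webrtcvad classifies fixed-length frames of 16-bit mono PCM: `webrtcvad.Vad(mode)` takes an aggressiveness mode from 0 to 3, and `is_speech(frame, sample_rate)` accepts 10, 20, or 30 ms frames at 8, 16, 32, or 48 kHz. The sketch below shows one way a recorder could turn those per-frame decisions into chunks, closing a chunk once `SILENCE_THRESHOLD_MS` of silence follows speech or once `CHUNK_DURATION_MS` is reached. The `SilenceChunker` class and its flush rule are assumptions, not the logic in `audio_recorder.py`; the VAD is passed in as a callable so the sketch runs without a microphone.

```python
from typing import Callable, List, Optional


class SilenceChunker:
    """Group fixed-length PCM frames into chunks split at pauses in speech.

    Hypothetical sketch. With py-webrtcvad, `is_speech` could be:
        vad = webrtcvad.Vad(2)
        is_speech = lambda frame: vad.is_speech(frame, 16000)
    """

    def __init__(self, is_speech: Callable[[bytes], bool], frame_ms: int = 30,
                 silence_threshold_ms: int = 600, chunk_duration_ms: int = 5000):
        self.is_speech = is_speech
        self.silence_frames = silence_threshold_ms // frame_ms  # pause length in frames
        self.max_frames = chunk_duration_ms // frame_ms         # hard cap on chunk length
        self.frames: List[bytes] = []
        self.trailing_silence = 0
        self.heard_speech = False

    def push(self, frame: bytes) -> Optional[bytes]:
        """Add one frame; return a finished chunk when a split point is reached."""
        self.frames.append(frame)
        if self.is_speech(frame):
            self.heard_speech = True
            self.trailing_silence = 0
        else:
            self.trailing_silence += 1
        paused = self.heard_speech and self.trailing_silence >= self.silence_frames
        if paused or len(self.frames) >= self.max_frames:
            return self.flush()
        return None

    def flush(self) -> Optional[bytes]:
        """Return buffered audio and reset; chunks with no speech are dropped (assumption)."""
        chunk = b"".join(self.frames) if self.heard_speech else None
        self.frames, self.trailing_silence, self.heard_speech = [], 0, False
        return chunk


if __name__ == "__main__":
    # Toy VAD for demonstration: any non-zero byte counts as speech.
    chunker = SilenceChunker(lambda f: any(f), frame_ms=30, silence_threshold_ms=90)
    speech, silence = b"\x01" * 960, b"\x00" * 960
    for frame in [speech, speech, silence, silence, silence]:
        chunk = chunker.push(frame)
        if chunk:
            print(f"chunk ready: {len(chunk)} bytes")  # prints chunk ready: 4800 bytes
```

When recording stops, calling `flush()` once more would hand any remaining speech to the transcription step.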
## Project Structure
- `main.py`: Main application file handling coordination between audio recording and transcription.
- `audio_recorder.py`: Handles audio recording and Voice Activity Detection.
- `transcription_worker.py`: Worker for handling transcription tasks using AssemblyAI API.
- `config.py`: Configuration file for various parameters.
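The split between `main.py`, `audio_recorder.py`, and `transcription_worker.py` suggests a producer/consumer design. A minimal sketch of that hand-off, assuming a `queue.Queue` between the recorder and a worker thread and a sentinel value for shutdown (for example after Ctrl+C), might look like this. `fake_transcribe`, `worker`, and `run` are hypothetical names, not the repository's code.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to exit


def fake_transcribe(chunk: bytes) -> str:
    """Hypothetical stand-in for the AssemblyAI call in transcription_worker.py."""
    return f"<{len(chunk)} bytes transcribed>"


def worker(chunks: "queue.Queue", transcript: list) -> None:
    """Consume chunks until the sentinel arrives, appending text as each finishes."""
    while True:
        item = chunks.get()
        if item is STOP:
            break
        transcript.append(fake_transcribe(item))


def run(recorded_chunks) -> list:
    """Feed chunks to a worker thread and return the final transcript."""
    chunks: "queue.Queue" = queue.Queue()
    transcript: list = []
    t = threading.Thread(target=worker, args=(chunks, transcript))
    t.start()
    try:
        for chunk in recorded_chunks:  # in the app, chunks would come from the recorder
            chunks.put(chunk)
    except KeyboardInterrupt:
        pass                           # Ctrl+C stops recording; continue to shutdown
    finally:
        chunks.put(STOP)               # worker drains queued chunks, then exits
        t.join()
    return transcript


print(run([b"\x00" * 960, b"\x00" * 1920]))
```

With a single worker, transcripts come out in recording order automatically; running several workers for throughput is where an explicit ordering step, such as numbering chunks, becomes necessary.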
## Acknowledgments
- [py-webrtcvad](https://github.com/wiseman/py-webrtcvad) for the Voice Activity Detection functionality
## Troubleshooting
If you encounter any issues with audio recording or transcription, ensure that:
1. Your microphone is properly connected and selected as the input device.
2. Your AssemblyAI API key is correctly set in the `.env` file.
3. You have a stable internet connection for API communication.
For any other issues, please check the console output for error messages and refer to the documentation of the individual dependencies if needed.