https://github.com/neuralwork/audio2chat
Convert multi-speaker audio files to structured chat data for LLMs
- Host: GitHub
- URL: https://github.com/neuralwork/audio2chat
- Owner: neuralwork
- License: mit
- Created: 2025-01-29T13:35:51.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-29T14:26:06.000Z (4 months ago)
- Last Synced: 2025-03-03T16:19:02.292Z (3 months ago)
- Topics: chat, llm, llm-datasets, speaker-diarization, transcription, whisper
- Language: Python
- Homepage: https://pypi.org/project/audio2chat/
- Size: 2.02 MB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Audio2Chat
Audio2Chat converts multi-speaker audio files into chat format using [AssemblyAI](https://www.assemblyai.com/app) for speaker diarization and optionally Whisper for enhanced transcription.
## Features
- Speaker diarization and transcription using AssemblyAI
- Optional enhanced transcription using Whisper large-v3-turbo
- YouTube video download support
- Word-level timestamp support (can be used for speech-to-text and text-to-speech tasks)
- Structured chat format output

## Installation
```bash
# Install from PyPI
pip install audio2chat

# Or install from source
git clone https://github.com/neuralwork/audio2chat.git
cd audio2chat
pip install -e .
```

### Requirements
- Python >=3.8
- FFmpeg (for YouTube downloads)
- CUDA-capable GPU (recommended for Whisper)

Install FFmpeg:
```bash
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows (using Chocolatey)
choco install ffmpeg
```

You need an AssemblyAI account and an API key to use audio2chat. Once you set up an account, you can find your API key on your [dashboard](https://www.assemblyai.com/app).
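If you prefer not to hardcode the key, here is a minimal sketch of reading it from the environment, reusing the same `ASSEMBLYAI_API_KEY` variable the test suite uses below. This is our convention for the example, not a documented audio2chat feature:

```python
import os

# Illustrative only: read the AssemblyAI key from the environment
# instead of embedding it in scripts or shell history.
api_key = os.environ.get("ASSEMBLYAI_API_KEY")
if api_key is None:
    raise RuntimeError("Set ASSEMBLYAI_API_KEY before running audio2chat")
```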
## Usage
### Command Line
Basic usage:
```bash
# Process local audio file
audio2chat input.wav --api-key YOUR_ASSEMBLYAI_KEY --output output_dir

# Process YouTube video
audio2chat "https://youtube.com/watch?v=xxxxx" --api-key YOUR_ASSEMBLYAI_KEY --output output_dir
```

All options:
```bash
audio2chat --help

required arguments:
  input                 Input audio file path or YouTube URL
  --api-key API_KEY     AssemblyAI API key

output settings:
  --output OUTPUT       Output directory for audio and chat data (default: output)
  --download-format {mp3,wav}
                        Audio format for YouTube downloads (default: wav)

transcription settings:
  --language LANGUAGE   Language code for transcription (default: en)
  --num-speakers NUM    Expected number of speakers (default: auto-detect)
  --use-whisper         Use Whisper for enhanced transcription (default: False)

chat generation settings:
  --min-segment-confidence CONF
                        Minimum confidence score to include segment (default: 0.5)
  --merge-threshold THRESH
                        Time threshold to merge adjacent utterances (default: 1.0)
  --min-duration DUR    Minimum duration for a chat segment (default: 0.5)
  --include-metadata    Include additional metadata in output (default: True)
  --include-word-timestamps
                        Include word-level timing information (default: False)

vocabulary settings:
  --word-boost [WORDS ...]
                        List of words to boost recognition for

other:
  --verbose, -v         Enable verbose logging
```
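For intuition on the chat generation settings above, here is a minimal sketch of what an utterance-merging pass controlled by `--merge-threshold` could look like. The function, field names, and units below are illustrative assumptions, not audio2chat's actual internals:

```python
def merge_utterances(utterances, merge_threshold=1.0):
    """Merge adjacent same-speaker utterances separated by a short pause.

    Illustrative sketch only: assumes each utterance is a dict with
    "speaker", "text", "start", and "end" keys, with start/end in the
    same time units as the threshold.
    """
    merged = []
    for utt in utterances:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["speaker"] == utt["speaker"]
            and utt["start"] - prev["end"] <= merge_threshold
        ):
            # Same speaker, short gap: extend the previous chat message.
            prev["text"] += " " + utt["text"]
            prev["end"] = utt["end"]
        else:
            merged.append(dict(utt))
    return merged
```

Merging fragments this way is what turns raw diarization segments into readable chat turns; settings like `--min-duration` and `--min-segment-confidence` would then filter what remains.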
### Python API

```python
from audio2chat.pipeline import AudioChatPipeline
from audio2chat.youtube_downloader import download_audio

# For YouTube videos
audio_path = download_audio(
    "https://youtube.com/watch?v=xxxxx",
    output_dir="downloads",
    audio_format="wav"
)

# Initialize pipeline
pipeline = AudioChatPipeline(
    api_key="YOUR_ASSEMBLYAI_KEY",
    language="en",
    num_speakers=2,               # or None for auto-detect
    use_whisper=True,             # enable Whisper for better transcription
    include_word_timestamps=True
)

# Process file
chat_data = pipeline.process_file(audio_path, "output/chat.json")
```

### Output Format
```json
{
  "messages": [
    {
      "speaker": "A",
      "text": "Hello there!",
      "start": 0,
      "end": 1500,
      "words": [
        {
          "text": "Hello",
          "start": 0,
          "end": 750,
          "confidence": 0.98
        },
        {
          "text": "there",
          "start": 750,
          "end": 1500,
          "confidence": 0.95
        }
      ]
    }
  ],
  "metadata": {
    "num_speakers": 2,
    "speakers": ["A", "B"],
    "transcription": "whisper+assemblyai"
  }
}
```
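This structure maps naturally onto the message lists that LLM fine-tuning pipelines expect. A sketch of one possible conversion, where the speaker-to-role mapping is our assumption rather than something audio2chat emits:

```python
import json

# Load a chat file produced by the pipeline.
with open("output/chat.json") as f:
    chat = json.load(f)

# Map diarized speaker labels to chat roles. Treating speaker "A" as
# the user and all other speakers as the assistant is an illustrative
# choice, not encoded in the output itself.
role_map = {"A": "user"}
messages = [
    {"role": role_map.get(msg["speaker"], "assistant"), "content": msg["text"]}
    for msg in chat["messages"]
]
print(json.dumps(messages, indent=2))
```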
## Development

Run tests:
```bash
# Set up environment
export ASSEMBLYAI_API_KEY=your_key_here

# Add test audio file
cp your_test_audio.wav tests/test_data/input.wav

# Run tests
pytest tests/test_pipeline.py tests/test_chat_builder.py # without Whisper
pytest tests/ # all tests including Whisper
```

## License
This project is licensed under the [MIT license](https://github.com/neuralwork/audio2chat/blob/main/LICENSE).

From [neuralwork](https://neuralwork.ai/) with :heart: