https://github.com/voidful/codec-superb
Audio Codec Speech processing Universal PERformance Benchmark
- Host: GitHub
- URL: https://github.com/voidful/codec-superb
- Owner: voidful
- Created: 2023-10-19T12:06:14.000Z
- Default Branch: main
- Last Pushed: 2025-06-30T15:02:02.000Z
- Last Synced: 2025-06-30T15:47:49.314Z
- Topics: audio, audio-codec, codec, speech, superb
- Language: Python
- Homepage: http://codecsuperb.eric-lam.com/
- Size: 3.29 MB
- Stars: 258
- Watchers: 11
- Forks: 25
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
# Codec-SUPERB: Sound Codec Speech Processing Universal Performance Benchmark

Codec-SUPERB is a comprehensive benchmark designed to evaluate audio codec models across a variety of speech tasks. Our
goal is to facilitate community collaboration and accelerate advancements in the field of speech processing by
preserving and enhancing speech information quality.
## Table of Contents
- [Introduction](#introduction)
- [Key Features](#key-features)
- [Batch Processing](#batch-processing)
- [Installation](#installation)
- [Usage](#usage)
  - [Single Audio Processing](#single-audio-processing)
  - [Batch Audio Processing](#batch-audio-processing)
  - [Performance Comparison](#performance-comparison)
  - [Advanced Batch Processing Tips](#advanced-batch-processing-tips)
- [Testing](#testing)
- [Citation](#citation)
- [Contribution](#contribution)
- [License](#license)
- [Reference Sound Codec Repositories](#reference-sound-codec-repositories)
## Introduction
Codec-SUPERB provides a rigorous and transparent framework for evaluating sound codec models across a range of speech
processing tasks, with the aim of fostering innovation and setting new standards in audio quality and processing
efficiency.
## Key Features
### Out-of-the-Box Codec Interface
Codec-SUPERB offers an intuitive, out-of-the-box codec interface that allows for easy integration and testing of various
codec models, facilitating quick iterations and experiments.
### Multi-Perspective Leaderboard
Codec-SUPERB's unique blend of multi-perspective evaluation and an online leaderboard drives innovation in sound codec
research by providing a comprehensive assessment and fostering competitive transparency among developers.
### Standardized Environment
We provide a standardized testing environment to ensure fair and consistent comparison across all models, so that
benchmark results are reliable and directly comparable.
### Unified Datasets
We provide a collection of unified datasets, curated to test a wide range of speech processing scenarios. This ensures
that models are evaluated under diverse conditions, reflecting real-world applications.
## Batch Processing
**🚀 NEW: Efficient Batch Processing Support**
Codec-SUPERB now supports batch processing for encoding and decoding multiple audio samples in a single call, removing the need for explicit Python `for` loops over samples and providing significant speedups.
### ✅ Key Benefits
- **3-5x faster processing** for multiple audio samples
- **GPU optimization** through vectorized operations
- **Automatic padding** for variable-length audio samples
- **Memory efficient** batch operations
- **Backward compatible** - existing code continues to work
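"Automatic padding" means variable-length clips must share one array shape before they can be batched. The sketch below shows the general idea under stated assumptions: `pad_batch` and `unpad_batch` are hypothetical helpers, not Codec-SUPERB's internal implementation. They right-pad 1-D waveforms with zeros and keep the original lengths so each output can be trimmed afterwards.

```python
import numpy as np


def pad_batch(arrays):
    """Zero-pad 1-D waveforms to a common length (hypothetical helper).

    Returns a (batch, max_len) float32 array and the original lengths,
    which are needed to trim the decoded audio back afterwards.
    """
    lengths = np.array([len(a) for a in arrays], dtype=np.int64)
    max_len = int(lengths.max()) if len(arrays) else 0
    batch = np.zeros((len(arrays), max_len), dtype=np.float32)
    for i, a in enumerate(arrays):
        batch[i, : len(a)] = a  # right-pad with zeros
    return batch, lengths


def unpad_batch(batch, lengths):
    """Inverse of pad_batch: drop the zero padding from each row."""
    return [row[:n] for row, n in zip(batch, lengths)]


waves = [np.ones(3), np.ones(5), np.ones(2)]
batch, lengths = pad_batch(waves)
print(batch.shape, lengths.tolist())  # (3, 5) [3, 5, 2]
```

If a codec's output sampling rate differs from its input rate, the stored lengths would need to be rescaled before trimming the decoded audio.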
### ✅ Supported Operations
- `batch_extract_unit()`: Extract units from multiple audio samples at once
- `batch_decode_unit()`: Decode multiple units back to audio at once
- `batch_synth()`: Complete synthesis pipeline for multiple samples
### ✅ All Codecs Supported
Every codec in Codec-SUPERB includes optimized batch processing:
- **EnCodec** (all variants): True tensor batching with automatic padding
- **SpeechTokenizer**: RVQ-aware batch processing
- **AudioDec**: Quantizer-optimized batch operations
- **HuggingFace EnCodec**: Native transformer batch processing
- **Descript Audio Codec**: Batch compression/decompression
- **SQCodec**: Feature-aware batch encoding
- **FunCodec**: AudioSignal batch handling
- **WavTokenizer**: Bandwidth-aware batch processing
- **AcademiCodec**: Acoustic token batch generation
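Several of these codecs, EnCodec and SpeechTokenizer among them, quantize with residual vector quantization (RVQ), which is why extracted units usually carry a codebook axis and why batching has to be "RVQ-aware". The toy NumPy sketch below illustrates RVQ encoding and decoding. It is an assumption-laden illustration only: the codebooks are random, the sizes are arbitrary, and real codecs learn their codebooks and quantize encoder latents rather than raw audio.

```python
import numpy as np


def rvq_encode(frames, codebooks):
    """Residual vector quantization of (T, D) frames.

    Each stage quantizes the residual left by the previous stages,
    so the result has one code index per stage per frame: shape (n_q, T).
    """
    residual = frames.copy()
    codes = []
    for cb in codebooks:  # cb has shape (K, D)
        # squared distance from every frame to every codeword: (T, K)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = dist.argmin(axis=1)
        codes.append(idx)
        residual = residual - cb[idx]  # pass what is left to the next stage
    return np.stack(codes)  # (n_q, T)


def rvq_decode(codes, codebooks):
    """Sum the selected codewords of every stage to reconstruct (T, D)."""
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))


rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))  # 10 frames of 4-dim latents (toy sizes)
codebooks = [rng.normal(size=(16, 4)) for _ in range(3)]  # 3 stages, 16 codewords each
codes = rvq_encode(frames, codebooks)
print(codes.shape)  # (3, 10)
```

Keeping only the first few stages lowers the bitrate at the cost of fidelity; EnCodec selects its target bandwidth by varying the number of codebooks it uses.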
## Installation
```bash
git clone https://github.com/voidful/Codec-SUPERB.git
cd Codec-SUPERB
pip install -r requirements.txt
```
## Usage
### [Leaderboard](https://codecsuperb.com)
### Single Audio Processing
Traditional single audio processing (still fully supported):
```python
from SoundCodec import codec
import torchaudio
# list all available codecs
print(codec.list_codec())
# load a codec by name; encodec_24k_6bps is used as the example
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')
# load audio: waveform has shape (channels, samples)
waveform, sample_rate = torchaudio.load('sample_audio.wav')
# take a single channel as a 1-D NumPy array (no resampling happens here)
mono_waveform = waveform.numpy()[0]
data_item = {'audio': {'array': mono_waveform,
                       'sampling_rate': sample_rate}}
# extract discrete units
sound_unit = encodec_24k_6bps.extract_unit(data_item).unit
# sound synthesis: encode to units and decode back to a waveform
decoded_waveform = encodec_24k_6bps.synth(data_item, local_save=False)['audio']['array']
```
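The codec interface takes a `data_item` dict with an `'audio'` entry holding a 1-D `'array'` and its `'sampling_rate'`. A small helper can build that dict consistently. `make_data_item` below is hypothetical, and unlike the example above, which keeps one channel, it averages all channels to mono; that downmix is an assumption, so swap in channel selection if you prefer. It does not resample.

```python
import numpy as np


def make_data_item(waveform, sampling_rate, item_id=None):
    """Build the data_item dict used by extract_unit()/synth() (hypothetical helper).

    waveform: array of shape (samples,) or (channels, samples), e.g.
    torchaudio.load(...)[0].numpy(). Multi-channel input is averaged to
    mono; this downmix is an assumption, not a rule of the library.
    """
    array = np.asarray(waveform, dtype=np.float32)
    if array.ndim == 2:
        array = array.mean(axis=0)  # (channels, samples) -> (samples,)
    elif array.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {array.shape}")
    item = {'audio': {'array': array, 'sampling_rate': int(sampling_rate)}}
    if item_id is not None:
        item['id'] = item_id  # the batch example keys samples by 'id'
    return item


stereo = np.stack([np.zeros(4), np.ones(4)])  # 2 channels, 4 samples
item = make_data_item(stereo, 24000, item_id='demo')
print(item['audio']['array'])  # [0.5 0.5 0.5 0.5]
```

With torchaudio, the call would look like `make_data_item(waveform.numpy(), sample_rate)` after `waveform, sample_rate = torchaudio.load(...)`.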
### Batch Audio Processing
**🚀 NEW: Process multiple audio samples efficiently:**
```python
from SoundCodec import codec
import torchaudio
# load codec
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')
# prepare multiple audio samples
audio_files = ['audio1.wav', 'audio2.wav', 'audio3.wav']
data_list = []
for audio_file in audio_files:
    waveform, sample_rate = torchaudio.load(audio_file)
    data_item = {
        'id': audio_file,
        'audio': {
            'array': waveform.numpy()[0],  # take first channel
            'sampling_rate': sample_rate
        }
    }
    data_list.append(data_item)
# OPTION 1: Batch extraction and decoding (recommended)
batch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)
print(f"Extracted {batch_extracted.batch_size} samples")
print(f"Unit shapes: {[unit.shape for unit in batch_extracted.units]}")
batch_decoded = encodec_24k_6bps.batch_decode_unit(batch_extracted)
print(f"Decoded audio shapes: {[audio.shape for audio in batch_decoded]}")
# OPTION 2: Complete batch synthesis pipeline
results = encodec_24k_6bps.batch_synth(data_list, local_save=False)
for i, result in enumerate(results):
    print(f"Sample {i}: unit shape {result['unit'].shape}, "
          f"audio shape {result['audio']['array'].shape}")
```
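A quick sanity check after batch decoding is to compare each reconstruction with its input. The hypothetical `snr_db` helper below computes a plain signal-to-noise ratio; it is a crude illustrative measure, not necessarily one of the metrics the benchmark reports. It assumes both signals are 1-D arrays at the same sampling rate (the decoder output is presumably 24 kHz for `encodec_24k_6bps`, so inputs at other rates would need resampling first) and trims to the shorter length because codecs may pad their output.

```python
import numpy as np


def snr_db(reference, estimate, eps=1e-12):
    """Signal-to-noise ratio (dB) of `estimate` against `reference`.

    Both are 1-D arrays at the same sampling rate; they are trimmed to the
    shorter length because codecs may pad their output to whole frames.
    """
    n = min(len(reference), len(estimate))
    ref = np.asarray(reference[:n], dtype=np.float64)
    est = np.asarray(estimate[:n], dtype=np.float64)
    noise = ref - est
    return 10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(noise ** 2) + eps))


ref = np.sin(np.linspace(0, 100, 24000))
print(round(snr_db(ref, ref * 0.9), 1))  # 20.0
```

Applied to the batch output, this would mean iterating over `zip(data_list, batch_decoded)`, possibly after `np.squeeze` if the decoded arrays carry a channel axis.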
### Performance Comparison
Compare single vs batch processing performance:
```python
import time
# Reuses `encodec_24k_6bps` and `data_list` from the batch example above.
# GPU work runs asynchronously, so these wall-clock timings are approximate.
# Single processing (old approach): one sample at a time
start_time = time.perf_counter()
single_results = []
for data in data_list:
    extracted = encodec_24k_6bps.extract_unit(data)
    decoded = encodec_24k_6bps.decode_unit(extracted.stuff_for_synth)
    single_results.append(decoded)
single_time = time.perf_counter() - start_time
# Batch processing (new approach): all samples in one call
start_time = time.perf_counter()
batch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)
batch_results = encodec_24k_6bps.batch_decode_unit(batch_extracted)
batch_time = time.perf_counter() - start_time
print(f"Single processing: {single_time:.3f}s")
print(f"Batch processing: {batch_time:.3f}s")
print(f"Speedup: {single_time/batch_time:.2f}x")
```
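A single wall-clock measurement is noisy, and the first call often includes one-off costs such as model warm-up or CUDA initialization. The hypothetical `benchmark` helper below is one way to make the comparison fairer: it runs untimed warm-up calls and reports the median of several timed repeats. Because PyTorch launches GPU kernels asynchronously, passing `sync=torch.cuda.synchronize` makes each timing wait for queued work to finish.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, repeats=5, sync=None):
    """Median wall-clock time of fn(*args) over `repeats` runs (hypothetical helper).

    `warmup` untimed calls absorb one-off costs; pass sync=torch.cuda.synchronize
    when timing GPU work so queued kernels finish before the clock stops.
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


print(f"{benchmark(sum, range(10_000)):.6f}s")
```

To compare the two approaches, wrap each in a zero-argument `lambda`, for example `lambda: encodec_24k_6bps.batch_decode_unit(encodec_24k_6bps.batch_extract_unit(data_list))`, and pass it to `benchmark`.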
### Advanced Batch Processing Tips
**Group samples by length for optimal performance:**
```python
# Group samples by similar lengths (48000 samples is 2 s at 24 kHz)
short_samples = [data for data in data_list if len(data['audio']['array']) < 48000]
long_samples = [data for data in data_list if len(data['audio']['array']) >= 48000]
# Process each group separately so short clips are not padded to the longest clip
if short_samples:
    short_results = encodec_24k_6bps.batch_extract_unit(short_samples)
if long_samples:
    long_results = encodec_24k_6bps.batch_extract_unit(long_samples)
```
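Two fixed buckets split at one threshold work, but a more general way to keep padding small is to sort items by length and cut consecutive batches, remembering each item's original position so results can be put back in input order. `length_sorted_batches` and `restore_order` below are hypothetical helpers sketching that approach; they only assume the `data_item` layout used above.

```python
def length_sorted_batches(data_list, batch_size):
    """Group items of similar length to reduce padding (hypothetical helper).

    Returns a list of (indices, batch) pairs; `indices` are positions in the
    original data_list so per-sample results can be restored to input order.
    """
    order = sorted(range(len(data_list)),
                   key=lambda i: len(data_list[i]['audio']['array']))
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batches.append((idx, [data_list[i] for i in idx]))
    return batches


def restore_order(batched_results, n):
    """Scatter per-batch result lists back into a single list of length n."""
    out = [None] * n
    for idx, results in batched_results:
        for i, r in zip(idx, results):
            out[i] = r
    return out


items = [{'audio': {'array': [0] * n}} for n in (5, 1, 4, 2)]
batches = length_sorted_batches(items, batch_size=2)
print([idx for idx, _ in batches])  # [[1, 3], [2, 0]]
```

With a codec, each batch would be passed to `batch_synth` (or `batch_extract_unit`), the outputs paired with their `idx` lists, and `restore_order` called with `len(data_list)`.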
**Process large datasets in chunks:**
```python
def process_large_dataset(codec_model, data_list, batch_size=8):
    """Run batch_synth over data_list in chunks of batch_size items."""
    all_results = []
    for i in range(0, len(data_list), batch_size):
        batch = data_list[i:i + batch_size]
        batch_results = codec_model.batch_synth(batch, local_save=False)
        all_results.extend(batch_results)
    return all_results
# Process a large dataset efficiently; large_data_list is a list of
# data_item dicts built the same way as data_list above
large_results = process_large_dataset(encodec_24k_6bps, large_data_list)
```
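`process_large_dataset` slices a list, so every item has to be loaded into memory before processing starts. When items come from a generator instead (for example, reading audio files one at a time), a lazy chunking generator avoids that. `iter_batches` below is a hypothetical helper built on `itertools.islice`; on Python 3.12+, `itertools.batched` does the same job but yields tuples.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of up to batch_size items from any iterable, lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(list(iter_batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Each yielded list can be passed directly to `batch_synth(batch, local_save=False)`, so only one batch of decoded audio is held at a time unless the results are accumulated.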
## Testing
Run the test suite to verify codec functionality:
```bash
# Run all tests
python -m pytest SoundCodec/test/
# Run batch processing tests specifically
python -m pytest SoundCodec/test/test_batch_processing.py -v
# Run performance benchmarks
python SoundCodec/test/benchmark_batch_performance.py
```
## Citation
If you use this code or its results in your paper, please cite our work as:
```Tex
@article{wu2024codec,
title={Codec-superb: An in-depth analysis of sound codec models},
author={Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander H and Lee, Hung-yi},
journal={arXiv preprint arXiv:2402.13071},
year={2024}
}
```
```Tex
@article{wu2024towards,
title={Towards audio language modeling-an overview},
author={Wu, Haibin and Chen, Xuanjun and Lin, Yi-Cheng and Chang, Kai-wei and Chung, Ho-Lam and Liu, Alexander H and Lee, Hung-yi},
journal={arXiv preprint arXiv:2402.13236},
year={2024}
}
```
```Tex
@inproceedings{wu-etal-2024-codec,
title = "Codec-{SUPERB}: An In-Depth Analysis of Sound Codec Models",
author = "Wu, Haibin and
Chung, Ho-Lam and
Lin, Yi-Cheng and
Wu, Yuan-Kuei and
Chen, Xuanjun and
Pai, Yu-Chi and
Wang, Hsiu-Hsuan and
Chang, Kai-Wei and
Liu, Alexander and
Lee, Hung-yi",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.616",
doi = "10.18653/v1/2024.findings-acl.616",
pages = "10330--10348",
}
```
## Contribution
Contributions are highly encouraged, whether it's through adding new codec models, expanding the dataset collection, or
enhancing the benchmarking framework. Please see `CONTRIBUTING.md` for more details.
## License
This project is licensed under the MIT License - see the `LICENSE` file for details.
## Reference Sound Codec Repositories
- https://github.com/ZhangXInFD/SpeechTokenizer
- https://github.com/descriptinc/descript-audio-codec
- https://github.com/facebookresearch/encodec
- https://github.com/yangdongchao/AcademiCodec
- https://github.com/facebookresearch/AudioDec
- https://github.com/alibaba-damo-academy/FunCodec