https://github.com/voidful/codec-superb

Audio Codec Speech processing Universal PERformance Benchmark
Host: GitHub
URL: https://github.com/voidful/codec-superb
Owner: voidful
Created: 2023-10-19T12:06:14.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-06-30T15:02:02.000Z (9 months ago)
Last Synced: 2025-06-30T15:47:49.314Z (9 months ago)
Topics: audio, audio-codec, codec, speech, superb
Language: Python
Homepage: http://codecsuperb.eric-lam.com/
Size: 3.29 MB
Stars: 258
Watchers: 11
Forks: 25
Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md

# Codec-SUPERB: Sound Codec Speech Processing Universal Performance Benchmark

![Overview](img/Overview.png)

Codec-SUPERB is a comprehensive benchmark designed to evaluate audio codec models across a variety of speech tasks. Our
goal is to facilitate community collaboration and accelerate advancements in the field of speech processing by
preserving and enhancing speech information quality.

  

## Table of Contents

- [Introduction](#introduction)
- [Key Features](#key-features)
- [Batch Processing](#batch-processing)
- [Installation](#installation)
- [Usage](#usage)
  - [Single Audio Processing](#single-audio-processing)
  - [Batch Audio Processing](#batch-audio-processing)
  - [Performance Comparison](#performance-comparison)
  - [Advanced Batch Processing Tips](#advanced-batch-processing-tips)
- [Testing](#testing)
- [Citation](#citation)
- [Contribution](#contribution)
- [License](#license)

## Introduction

Codec-SUPERB sets a new benchmark in evaluating sound codec models, providing a rigorous and transparent framework for
assessing performance across a range of speech processing tasks. Our goal is to foster innovation and set new standards
in audio quality and processing efficiency.

## Key Features

### Out-of-the-Box Codec Interface

Codec-SUPERB offers an intuitive, out-of-the-box codec interface that allows for easy integration and testing of various
codec models, facilitating quick iterations and experiments.

### Multi-Perspective Leaderboard

Codec-SUPERB's unique blend of multi-perspective evaluation and an online leaderboard drives innovation in sound codec
research by providing a comprehensive assessment and fostering competitive transparency among developers.

### Standardized Environment

We ensure a standardized testing environment to guarantee fair and consistent comparison across all models. This
uniformity brings reliability to benchmark results, making them universally interpretable.

### Unified Datasets

We provide a collection of unified datasets, curated to test a wide range of speech processing scenarios. This ensures
that models are evaluated under diverse conditions, reflecting real-world applications.

## Batch Processing

**🚀 NEW: Efficient Batch Processing Support**

Codec-SUPERB now supports efficient batch processing for encoding and decoding multiple audio samples simultaneously, eliminating the need for explicit `for` loops over samples and providing significant performance improvements.

### ✅ Key Benefits

- **3-5x faster processing** for multiple audio samples

- **GPU optimization** through vectorized operations

- **Automatic padding** for variable-length audio samples

- **Memory efficient** batch operations

- **Backward compatible** - existing code continues to work
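
To make the automatic-padding idea concrete, here is a minimal sketch of how variable-length samples can be right-padded into a rectangular batch and trimmed back afterwards. It is an illustration only, not Codec-SUPERB's implementation: `pad_batch` and `unpad_batch` are hypothetical helper names, and plain Python lists stand in for the NumPy arrays or tensors a real codec would use.

```python
from typing import List, Sequence, Tuple


def pad_batch(samples: Sequence[Sequence[float]],
              pad_value: float = 0.0) -> Tuple[List[List[float]], List[int]]:
    """Right-pad every sample to the longest length and return the original lengths."""
    lengths = [len(s) for s in samples]
    max_len = max(lengths, default=0)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in samples]
    return padded, lengths


def unpad_batch(padded: Sequence[Sequence[float]],
                lengths: Sequence[int]) -> List[List[float]]:
    """Trim each row back to its original length."""
    return [list(row[:n]) for row, n in zip(padded, lengths)]


# Hypothetical toy samples of lengths 3, 1 and 2
samples = [[0.1, 0.2, 0.3], [0.5], [0.7, 0.8]]
batch, lengths = pad_batch(samples)
print(batch)    # [[0.1, 0.2, 0.3], [0.5, 0.0, 0.0], [0.7, 0.8, 0.0]]
print(lengths)  # [3, 1, 2]
restored = unpad_batch(batch, lengths)
```

Keeping the original lengths alongside the padded batch is what allows the padding to be removed from each result after the batched call.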

### ✅ Supported Operations

- `batch_extract_unit()`: Extract units from multiple audio samples at once

- `batch_decode_unit()`: Decode multiple units back to audio at once  

- `batch_synth()`: Complete synthesis pipeline for multiple samples

### ✅ All Codecs Supported

Every codec in Codec-SUPERB includes optimized batch processing:

- **EnCodec** (all variants): True tensor batching with automatic padding

- **SpeechTokenizer**: RVQ-aware batch processing  

- **AudioDec**: Quantizer-optimized batch operations

- **HuggingFace EnCodec**: Native transformer batch processing

- **Descript Audio Codec**: Batch compression/decompression

- **SQCodec**: Feature-aware batch encoding

- **FunCodec**: AudioSignal batch handling

- **WavTokenizer**: Bandwidth-aware batch processing

- **AcademiCodec**: Acoustic token batch generation

## Installation

```bash
git clone https://github.com/voidful/Codec-SUPERB.git
cd Codec-SUPERB
pip install -r requirements.txt
```

## Usage

### [Leaderboard](https://codecsuperb.com)

### Single Audio Processing

Traditional single audio processing (still fully supported):

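```python
from SoundCodec import codec
import torchaudio

# List the names of all available codecs
print(codec.list_codec())

# Load a codec by name; EnCodec is used as the example
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')

# Load audio; torchaudio returns a (channels, samples) tensor and the file's sample rate
waveform, sample_rate = torchaudio.load('sample_audio.wav')
# Select one channel as a 1-D NumPy array (index -1 is the last channel);
# note that no resampling happens on this line despite the variable name
resampled_waveform = waveform.numpy()[-1]
data_item = {'audio': {'array': resampled_waveform,
                       'sampling_rate': sample_rate}}

# Extract discrete units from the audio
sound_unit = encodec_24k_6bps.extract_unit(data_item).unit

# Run the full synthesis pipeline and take the resulting waveform
decoded_waveform = encodec_24k_6bps.synth(data_item, local_save=False)['audio']['array']
```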
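
The examples in this README pick a single channel of a multi-channel file. As an alternative, the sketch below averages all channels into mono before building the `data_item` dictionary. `downmix_to_mono` and `make_data_item` are hypothetical helpers, not part of `SoundCodec`, and plain lists are used so the snippet runs without any audio file; the examples above pass a NumPy array as `'array'`, so a conversion such as `numpy.asarray` would likely be needed before handing the item to a codec.

```python
import math
from typing import Dict, List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average all channels sample by sample to get one mono channel."""
    if not channels:
        raise ValueError("expected at least one channel")
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def make_data_item(channels: Sequence[Sequence[float]],
                   sampling_rate: int,
                   item_id: str = "sample") -> Dict:
    """Build a dictionary with the same keys as the data items in this README."""
    return {
        "id": item_id,
        "audio": {
            "array": downmix_to_mono(channels),
            "sampling_rate": sampling_rate,
        },
    }


# Synthetic stereo signal: 10 ms of a 440 Hz tone at 16 kHz, right channel silent
sr = 16000
left = [math.sin(2 * math.pi * 440 * t / sr) for t in range(160)]
right = [0.0] * 160
item = make_data_item([left, right], sr)
print(len(item["audio"]["array"]), item["audio"]["sampling_rate"])  # 160 16000
```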

### Batch Audio Processing

**🚀 NEW: Process multiple audio samples efficiently:**

```python
from SoundCodec import codec
import torchaudio

# Load the codec
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')

# Prepare multiple audio samples
audio_files = ['audio1.wav', 'audio2.wav', 'audio3.wav']
data_list = []
for audio_file in audio_files:
    waveform, sample_rate = torchaudio.load(audio_file)
    data_item = {
        'id': audio_file,
        'audio': {
            'array': waveform.numpy()[0],  # take the first channel
            'sampling_rate': sample_rate
        }
    }
    data_list.append(data_item)

# OPTION 1: Batch extraction and decoding (recommended)
batch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)
print(f"Extracted {batch_extracted.batch_size} samples")
print(f"Unit shapes: {[unit.shape for unit in batch_extracted.units]}")

batch_decoded = encodec_24k_6bps.batch_decode_unit(batch_extracted)
print(f"Decoded audio shapes: {[audio.shape for audio in batch_decoded]}")

# OPTION 2: Complete batch synthesis pipeline (extract and decode in one call)
results = encodec_24k_6bps.batch_synth(data_list, local_save=False)
for i, result in enumerate(results):
    print(f"Sample {i}: unit shape {result['unit'].shape}, "
          f"audio shape {result['audio']['array'].shape}")
```

### Performance Comparison

Compare single vs batch processing performance:

```python
import time

# Reuses `encodec_24k_6bps` and `data_list` from the Batch Audio Processing example.
# time.perf_counter() is a monotonic, high-resolution clock suited to timing intervals.

# Single processing (old approach): one extract/decode call per sample
start_time = time.perf_counter()
single_results = []
for data in data_list:
    extracted = encodec_24k_6bps.extract_unit(data)
    decoded = encodec_24k_6bps.decode_unit(extracted.stuff_for_synth)
    single_results.append(decoded)
single_time = time.perf_counter() - start_time

# Batch processing (new approach): one call for the whole list
start_time = time.perf_counter()
batch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)
batch_results = encodec_24k_6bps.batch_decode_unit(batch_extracted)
batch_time = time.perf_counter() - start_time

print(f"Single processing: {single_time:.3f}s")
print(f"Batch processing: {batch_time:.3f}s")
print(f"Speedup: {single_time / batch_time:.2f}x")
```
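
A single timed run can be skewed by one-off costs such as lazy initialization or cache warm-up. The sketch below shows one way to get steadier numbers: run a few untimed warm-up calls, then report the median of several timed calls. `time_call` is a hypothetical helper, and the two workloads are stand-ins for the per-sample and batch codec calls. When timing on a GPU, it is usually also necessary to call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.

```python
import statistics
import time
from typing import Callable, List


def time_call(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workloads; replace these with the per-sample loop and the batch call
def looped() -> List[int]:
    return [sum(range(1000)) for _ in range(200)]


def batched() -> int:
    return sum(range(1000)) * 200


t_loop = time_call(looped)
t_batch = time_call(batched)
print(f"loop: {t_loop:.6f}s, batch: {t_batch:.6f}s")
```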

### Advanced Batch Processing Tips

**Group samples by length for optimal performance:**

```python
# Group samples by similar lengths so each batch needs less padding.
# The threshold of 48000 samples corresponds to 2 seconds of audio at 24 kHz.
short_samples = [data for data in data_list if len(data['audio']['array']) < 48000]
long_samples = [data for data in data_list if len(data['audio']['array']) >= 48000]

# Process each group separately for better efficiency
if short_samples:
    short_results = encodec_24k_6bps.batch_extract_unit(short_samples)
if long_samples:
    long_results = encodec_24k_6bps.batch_extract_unit(long_samples)
```
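
The two-group split generalizes to sorting by length and cutting the sorted list into batches, so that each batch holds samples of similar length. The sketch below is one possible way to do that while remembering each sample's original position, so results can be put back in input order. `bucket_by_length` and `restore_order` are hypothetical helpers, not part of Codec-SUPERB, and the toy data only models sample lengths.

```python
from typing import Dict, List, Sequence, Tuple


def bucket_by_length(data_list: List[Dict],
                     batch_size: int) -> List[List[Tuple[int, Dict]]]:
    """Sort items by audio length and cut them into batches of (original_index, item)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    indexed = sorted(enumerate(data_list),
                     key=lambda pair: len(pair[1]["audio"]["array"]))
    return [indexed[i:i + batch_size] for i in range(0, len(indexed), batch_size)]


def restore_order(batches: Sequence[Sequence[Tuple[int, Dict]]],
                  results_per_batch: Sequence[Sequence[object]]) -> List[object]:
    """Place per-batch results back in the order of the original data list."""
    ordered: List[object] = [None] * sum(len(batch) for batch in batches)
    for batch, results in zip(batches, results_per_batch):
        for (original_index, _), result in zip(batch, results):
            ordered[original_index] = result
    return ordered


# Toy data: four clips with 50, 5, 40 and 7 samples
toy_list = [{"id": f"clip{i}", "audio": {"array": [0.0] * n, "sampling_rate": 24000}}
            for i, n in enumerate([50, 5, 40, 7])]
batches = bucket_by_length(toy_list, batch_size=2)
print([[len(item["audio"]["array"]) for _, item in batch] for batch in batches])
# [[5, 7], [40, 50]]

# Stand-in for a batch codec call: each "result" is just the item's id
fake_results = [[item["id"] for _, item in batch] for batch in batches]
print(restore_order(batches, fake_results))
# ['clip0', 'clip1', 'clip2', 'clip3']
```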

**Process large datasets in chunks:**

```python
def process_large_dataset(codec, data_list, batch_size=8):
    """Run batch_synth over data_list in chunks of at most batch_size items."""
    all_results = []
    for i in range(0, len(data_list), batch_size):
        batch = data_list[i:i+batch_size]
        batch_results = codec.batch_synth(batch, local_save=False)
        all_results.extend(batch_results)
    return all_results

# Process a large dataset efficiently; `large_data_list` is a placeholder for
# your own list of data items in the same format as `data_list`
large_results = process_large_dataset(encodec_24k_6bps, large_data_list)
```
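
When the dataset is a stream or an iterator rather than a list, slicing by index is not available. A minimal sketch of the same chunking idea as a generator follows; `iter_batches` is a hypothetical helper built on `itertools.islice`, and `range(10)` stands in for a sequence of data items.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


sizes = [len(batch) for batch in iter_batches(range(10), batch_size=4)]
print(sizes)  # [4, 4, 2]
```

Each yielded batch could then be passed to `batch_synth(batch, local_save=False)` as in the function above, with results written out per batch instead of accumulated in memory.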

## Testing

Run the test suite to verify codec functionality:

```bash
# Run all tests
python -m pytest SoundCodec/test/

# Run batch processing tests specifically
python -m pytest SoundCodec/test/test_batch_processing.py -v

# Run performance benchmarks
python SoundCodec/test/benchmark_batch_performance.py
```
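
One property worth checking for any batched code path is that it returns the same result as processing each sample on its own. The sketch below illustrates that kind of check with a toy stand-in codec; it does not reproduce the project's actual tests. `StubCodec` is hypothetical and returns plain lists, whereas the real codecs return objects with fields such as `.unit` and `.units`.

```python
from typing import Dict, List


class StubCodec:
    """Hypothetical stand-in codec: a 'unit' is the sample scaled and rounded to an int."""

    def extract_unit(self, data_item: Dict) -> List[int]:
        return [round(x * 127) for x in data_item["audio"]["array"]]

    def batch_extract_unit(self, data_list: List[Dict]) -> List[List[int]]:
        # Pad to the longest sample, quantize the whole batch, then trim the padding
        lengths = [len(d["audio"]["array"]) for d in data_list]
        max_len = max(lengths, default=0)
        padded = [d["audio"]["array"] + [0.0] * (max_len - n)
                  for d, n in zip(data_list, lengths)]
        units = [[round(x * 127) for x in row] for row in padded]
        return [row[:n] for row, n in zip(units, lengths)]


def test_batch_matches_single() -> None:
    stub = StubCodec()
    data_list = [
        {"audio": {"array": [0.0, 0.5, -0.5], "sampling_rate": 16000}},
        {"audio": {"array": [1.0], "sampling_rate": 16000}},
    ]
    single = [stub.extract_unit(d) for d in data_list]
    batch = stub.batch_extract_unit(data_list)
    assert batch == single  # padding must not leak into the results


test_batch_matches_single()
```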

## Citation

If you use this code or its results in your paper, please cite our work as:

```bibtex
@article{wu2024codec,
  title={Codec-superb: An in-depth analysis of sound codec models},
  author={Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2402.13071},
  year={2024}
}
```

```bibtex
@article{wu2024towards,
  title={Towards audio language modeling-an overview},
  author={Wu, Haibin and Chen, Xuanjun and Lin, Yi-Cheng and Chang, Kai-wei and Chung, Ho-Lam and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2402.13236},
  year={2024}
}
```

```bibtex
@inproceedings{wu-etal-2024-codec,
    title = "Codec-{SUPERB}: An In-Depth Analysis of Sound Codec Models",
    author = "Wu, Haibin  and
      Chung, Ho-Lam  and
      Lin, Yi-Cheng  and
      Wu, Yuan-Kuei  and
      Chen, Xuanjun  and
      Pai, Yu-Chi  and
      Wang, Hsiu-Hsuan  and
      Chang, Kai-Wei  and
      Liu, Alexander  and
      Lee, Hung-yi",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.616",
    doi = "10.18653/v1/2024.findings-acl.616",
    pages = "10330--10348",
}
```

## Contribution

Contributions are highly encouraged, whether it's through adding new codec models, expanding the dataset collection, or
enhancing the benchmarking framework. Please see `CONTRIBUTING.md` for more details.

## License

This project is licensed under the MIT License - see the `LICENSE` file for details.

## Reference Sound Codec Repositories

- https://github.com/ZhangXInFD/SpeechTokenizer
- https://github.com/descriptinc/descript-audio-codec
- https://github.com/facebookresearch/encodec
- https://github.com/yangdongchao/AcademiCodec
- https://github.com/facebookresearch/AudioDec
- https://github.com/alibaba-damo-academy/FunCodec