{"id":23725989,"url":"https://github.com/voidful/codec-superb","last_synced_at":"2025-09-04T02:31:21.896Z","repository":{"id":203343805,"uuid":"707193326","full_name":"voidful/Codec-SUPERB","owner":"voidful","description":"Audio Codec Speech processing Universal PERformance Benchmark","archived":false,"fork":false,"pushed_at":"2025-06-30T15:02:02.000Z","size":3454,"stargazers_count":258,"open_issues_count":4,"forks_count":25,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-06-30T15:47:49.314Z","etag":null,"topics":["audio","audio-codec","codec","speech","superb"],"latest_commit_sha":null,"homepage":"http://codecsuperb.eric-lam.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/voidful.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-10-19T12:06:14.000Z","updated_at":"2025-06-30T15:02:05.000Z","dependencies_parsed_at":"2023-11-16T17:27:38.294Z","dependency_job_id":"86d5b81c-03ef-4774-840f-219c55950e04","html_url":"https://github.com/voidful/Codec-SUPERB","commit_stats":null,"previous_names":["voidful/audiodecbenchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/voidful/Codec-SUPERB","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2FCodec-SUPERB","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2FCodec-SUPERB/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2FCodec-SUPERB/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2FCodec-SUPERB/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/voidful","download_url":"https://codeload.github.com/voidful/Codec-SUPERB/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2FCodec-SUPERB/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273541900,"owners_count":25124056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","audio-codec","codec","speech","superb"],"created_at":"2024-12-31T00:18:06.814Z","updated_at":"2025-09-04T02:31:21.882Z","avatar_url":"https://github.com/voidful.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Codec-SUPERB: Sound Codec Speech Processing Universal Performance Benchmark\n\n![Overview](img/Overview.png)\n\nCodec-SUPERB is a comprehensive benchmark designed to evaluate audio codec models across a variety of speech tasks. 
Our\ngoal is to facilitate community collaboration and accelerate advancements in the field of speech processing by\npreserving and enhancing speech information quality.\n\n\u003ca href='https://codecsuperb.com/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e  \u003ca href='https://arxiv.org/abs/2402.13071'\u003e\u003cimg src='https://img.shields.io/badge/Paper-Arxiv-red'\u003e\u003c/a\u003e\n\n## Table of Contents\n\n- [Introduction](#introduction)\n- [Key Features](#key-features)\n- [Batch Processing](#batch-processing)\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Single Audio Processing](#single-audio-processing)\n  - [Batch Audio Processing](#batch-audio-processing)\n  - [Performance Comparison](#performance-comparison)\n  - [Advanced Batch Processing Tips](#advanced-batch-processing-tips)\n- [Testing](#testing)\n- [Citation](#citation)\n- [Contribution](#contribution)\n- [License](#license)\n\n## Introduction\n\nCodec-SUPERB sets a new benchmark in evaluating sound codec models, providing a rigorous and transparent framework for\nassessing performance across a range of speech processing tasks. Our goal is to foster innovation and set new standards\nin audio quality and processing efficiency.\n\n## Key Features\n\n### Out-of-the-Box Codec Interface\n\nCodec-SUPERB offers an intuitive, out-of-the-box codec interface that allows for easy integration and testing of various\ncodec models, facilitating quick iterations and experiments.\n\n### Multi-Perspective Leaderboard\n\nCodec-SUPERB's unique blend of multi-perspective evaluation and an online leaderboard drives innovation in sound codec\nresearch by providing a comprehensive assessment and fostering competitive transparency among developers.\n\n### Standardized Environment\n\nWe ensure a standardized testing environment to guarantee fair and consistent comparison across all models. This
This\nuniformity brings reliability to benchmark results, making them universally interpretable.\n\n### Unified Datasets\n\nWe provide a collection of unified datasets, curated to test a wide range of speech processing scenarios. This ensures\nthat models are evaluated under diverse conditions, reflecting real-world applications.\n\n## Batch Processing\n\n**🚀 NEW: Efficient Batch Processing Support**\n\nCodec-SUPERB now supports efficient batch processing for encoding and decoding multiple audio samples simultaneously, eliminating the need for for loops and providing significant performance improvements.\n\n### ✅ Key Benefits\n\n- **3-5x faster processing** for multiple audio samples\n- **GPU optimization** through vectorized operations\n- **Automatic padding** for variable-length audio samples\n- **Memory efficient** batch operations\n- **Backward compatible** - existing code continues to work\n\n### ✅ Supported Operations\n\n- `batch_extract_unit()`: Extract units from multiple audio samples at once\n- `batch_decode_unit()`: Decode multiple units back to audio at once  \n- `batch_synth()`: Complete synthesis pipeline for multiple samples\n\n### ✅ All Codecs Supported\n\nEvery codec in Codec-SUPERB includes optimized batch processing:\n\n- **EnCodec** (all variants): True tensor batching with automatic padding\n- **SpeechTokenizer**: RVQ-aware batch processing  \n- **AudioDec**: Quantizer-optimized batch operations\n- **HuggingFace EnCodec**: Native transformer batch processing\n- **Descript Audio Codec**: Batch compression/decompression\n- **SQCodec**: Feature-aware batch encoding\n- **FunCodec**: AudioSignal batch handling\n- **WavTokenizer**: Bandwidth-aware batch processing\n- **AcademicCodec**: Acoustic token batch generation\n\n## Installation\n\n```bash\ngit clone https://github.com/voidful/Codec-SUPERB.git\ncd Codec-SUPERB\npip install -r requirements.txt\n```\n\n## Usage\n\n### [Leaderboard](https://codecsuperb.com)\n\n### Single Audio 
Processing\n\nTraditional single audio processing (still fully supported):\n\n```python\nfrom SoundCodec import codec\nimport torchaudio\n\n# get all available codec\nprint(codec.list_codec())\n# load codec by name, use encodec as example\nencodec_24k_6bps = codec.load_codec('encodec_24k_6bps')\n\n# load audio\nwaveform, sample_rate = torchaudio.load('sample_audio.wav')\nresampled_waveform = waveform.numpy()[-1]\ndata_item = {'audio': {'array': resampled_waveform,\n                       'sampling_rate': sample_rate}}\n\n# extract unit\nsound_unit = encodec_24k_6bps.extract_unit(data_item).unit\n\n# sound synthesis\ndecoded_waveform = encodec_24k_6bps.synth(data_item, local_save=False)['audio']['array']\n```\n\n### Batch Audio Processing\n\n**🚀 NEW: Process multiple audio samples efficiently:**\n\n```python\nfrom SoundCodec import codec\nimport torchaudio\n\n# load codec\nencodec_24k_6bps = codec.load_codec('encodec_24k_6bps')\n\n# prepare multiple audio samples\naudio_files = ['audio1.wav', 'audio2.wav', 'audio3.wav']\ndata_list = []\n\nfor audio_file in audio_files:\n    waveform, sample_rate = torchaudio.load(audio_file)\n    data_item = {\n        'id': audio_file,\n        'audio': {\n            'array': waveform.numpy()[0],  # take first channel\n            'sampling_rate': sample_rate\n        }\n    }\n    data_list.append(data_item)\n\n# OPTION 1: Batch extraction and decoding (recommended)\nbatch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)\nprint(f\"Extracted {batch_extracted.batch_size} samples\")\nprint(f\"Unit shapes: {[unit.shape for unit in batch_extracted.units]}\")\n\nbatch_decoded = encodec_24k_6bps.batch_decode_unit(batch_extracted)\nprint(f\"Decoded audio shapes: {[audio.shape for audio in batch_decoded]}\")\n\n# OPTION 2: Complete batch synthesis pipeline\nresults = encodec_24k_6bps.batch_synth(data_list, local_save=False)\nfor i, result in enumerate(results):\n    print(f\"Sample {i}: unit shape {result['unit'].shape}, \"\n    
      f\"audio shape {result['audio']['array'].shape}\")\n```\n\n### Performance Comparison\n\nCompare single vs batch processing performance:\n\n```python\nimport time\n\n# Single processing (old approach)\nstart_time = time.perf_counter()\nsingle_results = []\nfor data in data_list:\n    extracted = encodec_24k_6bps.extract_unit(data)\n    decoded = encodec_24k_6bps.decode_unit(extracted.stuff_for_synth)\n    single_results.append(decoded)\nsingle_time = time.perf_counter() - start_time\n\n# Batch processing (new approach)\nstart_time = time.perf_counter()\nbatch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)\nbatch_results = encodec_24k_6bps.batch_decode_unit(batch_extracted)\nbatch_time = time.perf_counter() - start_time\n\nprint(f\"Single processing: {single_time:.3f}s\")\nprint(f\"Batch processing: {batch_time:.3f}s\")\nprint(f\"Speedup: {single_time/batch_time:.2f}x\")\n```\n\n### Advanced Batch Processing Tips\n\n**Group samples by length for better performance:**\n\n```python\n# Group samples by similar length (e.g. 48000 samples is 2 s of 24 kHz audio)\nshort_samples = [data for data in data_list if len(data['audio']['array']) \u003c 48000]\nlong_samples = [data for data in data_list if len(data['audio']['array']) \u003e= 48000]\n\n# Process each group separately to reduce padding overhead\nif short_samples:\n    short_results = encodec_24k_6bps.batch_extract_unit(short_samples)\nif long_samples:\n    long_results = encodec_24k_6bps.batch_extract_unit(long_samples)\n```\n\n**Process large datasets in chunks:**\n\n```python\ndef process_large_dataset(codec_model, data_list, batch_size=8):\n    all_results = []\n    for i in range(0, len(data_list), batch_size):\n        batch = data_list[i:i+batch_size]\n        batch_results = codec_model.batch_synth(batch, local_save=False)\n        all_results.extend(batch_results)\n    return all_results\n\n# Process a large dataset efficiently (large_data_list is your own list of data items)\nlarge_results = process_large_dataset(encodec_24k_6bps, large_data_list)\n```\n\n## Testing\n\nRun the test suite to verify codec 
functionality:\n\n```bash\n# Run all tests\npython -m pytest SoundCodec/test/\n\n# Run batch processing tests specifically\npython -m pytest SoundCodec/test/test_batch_processing.py -v\n\n# Run performance benchmarks\npython SoundCodec/test/benchmark_batch_performance.py\n```\n\n## Citation\n\nIf you use this code or these results in your paper, please cite our work:\n\n```Tex\n@article{wu2024codec,\n  title={Codec-superb: An in-depth analysis of sound codec models},\n  author={Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander H and Lee, Hung-yi},\n  journal={arXiv preprint arXiv:2402.13071},\n  year={2024}\n}\n```\n\n```Tex\n@article{wu2024towards,\n  title={Towards audio language modeling-an overview},\n  author={Wu, Haibin and Chen, Xuanjun and Lin, Yi-Cheng and Chang, Kai-wei and Chung, Ho-Lam and Liu, Alexander H and Lee, Hung-yi},\n  journal={arXiv preprint arXiv:2402.13236},\n  year={2024}\n}\n```\n\n```Tex\n@inproceedings{wu-etal-2024-codec,\n    title = \"Codec-{SUPERB}: An In-Depth Analysis of Sound Codec Models\",\n    author = \"Wu, Haibin  and\n      Chung, Ho-Lam  and\n      Lin, Yi-Cheng  and\n      Wu, Yuan-Kuei  and\n      Chen, Xuanjun  and\n      Pai, Yu-Chi  and\n      Wang, Hsiu-Hsuan  and\n      Chang, Kai-Wei  and\n      Liu, Alexander  and\n      Lee, Hung-yi\",\n    editor = \"Ku, Lun-Wei  and\n      Martins, Andre  and\n      Srikumar, Vivek\",\n    booktitle = \"Findings of the Association for Computational Linguistics: ACL 2024\",\n    month = aug,\n    year = \"2024\",\n    address = \"Bangkok, Thailand\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2024.findings-acl.616\",\n    doi = \"10.18653/v1/2024.findings-acl.616\",\n    pages = \"10330--10348\",\n}\n```\n\n## Contribution\n\nContributions are highly encouraged, whether it's through adding new codec models, expanding the dataset 
collection, or\nenhancing the benchmarking framework. Please see `CONTRIBUTING.md` for more details.\n\n## License\n\nThis project is licensed under the MIT License - see the `LICENSE` file for details.\n\n## Reference Sound Codec Repositories\n\n- https://github.com/ZhangXInFD/SpeechTokenizer\n- https://github.com/descriptinc/descript-audio-codec\n- https://github.com/facebookresearch/encodec\n- https://github.com/yangdongchao/AcademiCodec\n- https://github.com/facebookresearch/AudioDec\n- https://github.com/alibaba-damo-academy/FunCodec\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoidful%2Fcodec-superb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvoidful%2Fcodec-superb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoidful%2Fcodec-superb/lists"}