https://github.com/haoheliu/semanticodec-inference
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space.
https://github.com/haoheliu/semanticodec-inference
Last synced: about 2 months ago
JSON representation
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space.
- Host: GitHub
- URL: https://github.com/haoheliu/semanticodec-inference
- Owner: haoheliu
- License: mit
- Created: 2024-05-04T16:56:43.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-07T20:52:21.000Z (3 months ago)
- Last Synced: 2025-04-09T19:16:27.707Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 1.97 MB
- Stars: 194
- Watchers: 5
- Forks: 15
- Open Issues: 8
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[](https://arxiv.org/abs/2405.00233) [](https://haoheliu.github.io/SemantiCodec/)
# SemantiCodec
Ultra-low bitrate neural audio codec with a better semantic in the latent space.**Highlight**
- Bitrate: 0.31 kbps - 1.40 kbps
- Token rate: 25, 50, or 100 per second
- cpu, cuda, and mps are supported# Usage
## Installation
```bash
pip install git+https://github.com/haoheliu/SemantiCodec-inference.git
```## Encoding and decoding
**Checkpoints will be automatically downloaded when you initialize the SemantiCodec with the following code.**
```python
from semanticodec import SemantiCodecsemanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=16384)
filepath = "test/test.wav" # audio with arbitrary length
tokens = semanticodec.encode(filepath)
waveform = semanticodec.decode(tokens)# Save the reconstruction file
import soundfile as sf
sf.write("output.wav", waveform[0,0], 16000)
```## Other Settings
```python
from semanticodec import SemantiCodec###############Choose one of the following######################
semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=32768) # 1.40 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=32768) # 0.70 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=32768) # 0.35 kbpssemanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=16384) # 1.35 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=16384) # 0.68 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=16384) # 0.34 kbpssemanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=8192) # 1.30 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=8192) # 0.65 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=8192) # 0.33 kbpssemanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=4096) # 1.25 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=4096) # 0.63 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=4096) # 0.31 kbps
#####################################filepath = "test/test.wav"
tokens = semanticodec.encode(filepath)
waveform = semanticodec.decode(tokens)import soundfile as sf
sf.write("output.wav", waveform[0,0], 16000)
```If you are interested in reusing the same evaluation pipeline and data in the paper, please refer to this [zenodo repo](https://zenodo.org/records/11047204).
## Citation
If you find this repo helpful, please consider citing in the following format:```bibtex
@ARTICLE{semanticodec2024,
author={Liu, Haohe and Xu, Xuenan and Yuan, Yi and Wu, Mengyue and Wang, Wenwu and Plumbley, Mark D.},
journal={IEEE Journal of Selected Topics in Signal Processing},
title={SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound},
year={2024},
volume={18},
number={8},
pages={1448-1461},
doi={10.1109/JSTSP.2024.3506286}
}
```