https://github.com/lucadellalib/discrete-wavlm-codec
A neural speech codec based on discrete WavLM representations
- Host: GitHub
- URL: https://github.com/lucadellalib/discrete-wavlm-codec
- Owner: lucadellalib
- License: apache-2.0
- Created: 2024-08-28T00:54:11.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2024-08-28T04:23:32.000Z (3 months ago)
- Last Synced: 2024-09-27T20:03:30.949Z (about 2 months ago)
- Topics: clustering, codec, hifi-gan, k-means, neural-speech-coding, pytorch, quantization, self-supervised-learning, speech-synthesis, token-extraction, wavlm
- Language: Python
- Homepage: https://huggingface.co/lucadellalib/discrete-wavlm-codec
- Size: 403 KB
- Stars: 15
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Discrete WavLM Codec
A speech codec obtained by quantizing WavLM representations via K-means clustering (see https://arxiv.org/abs/2312.09747).
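
To illustrate the quantization idea, here is a minimal, self-contained sketch of K-means tokenization in plain Python. It assumes toy 2-D vectors stand in for WavLM frame features. The real codec clusters high-dimensional WavLM layer outputs and then decodes tokens back to audio. The helper names (`fit_kmeans`, `to_tokens`, `to_quantized_feats`) are hypothetical and not part of this repository's API. Each frame's token is the index of its nearest centroid, and decoding a token simply looks that centroid up, so the quantized features lie on a fixed codebook.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `frame`; this index is the token."""
    return min(range(len(centroids)), key=lambda k: squared_distance(frame, centroids[k]))


def fit_kmeans(frames: Sequence[Vector], num_clusters: int, num_iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's algorithm (hypothetical helper): returns `num_clusters` centroids."""
    rng = random.Random(seed)
    # Initialize centroids with randomly chosen frames
    centroids = [list(f) for f in rng.sample(list(frames), num_clusters)]
    for _ in range(num_iters):
        # Assignment step: group frames by their nearest centroid
        clusters: List[List[Vector]] = [[] for _ in range(num_clusters)]
        for frame in frames:
            clusters[nearest(frame, centroids)].append(list(frame))
        # Update step: move each centroid to the mean of its cluster
        for k, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                dim = len(members[0])
                centroids[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def to_tokens(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Quantize: map each frame to the index of its nearest centroid."""
    return [nearest(f, centroids) for f in frames]


def to_quantized_feats(toks: Sequence[int], centroids: Sequence[Vector]) -> List[Vector]:
    """Dequantize: replace each token with its centroid vector."""
    return [list(centroids[t]) for t in toks]


# Toy "features": two well-separated groups of 2-D frames
frames = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids = fit_kmeans(frames, num_clusters=2)
toks = to_tokens(frames, centroids)
qfeats = to_quantized_feats(toks, centroids)
print(toks)
```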

---------------------------------------------------------------------------------------------------------
## 🛠️ Installation
First of all, install [Python 3.8 or later](https://www.python.org). Open a terminal and run:
```
pip install huggingface-hub safetensors speechbrain torch torchaudio transformers
```

---------------------------------------------------------------------------------------------------------
## ▶️ Quickstart
We use `torch.hub` to make loading the model easy (no need to clone the repository):
```python
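import torch
import torchaudio

# Load the pretrained codec from Torch Hub (layer_ids=[6] selects the WavLM layer to use)
dwavlm = torch.hub.load("lucadellalib/discrete-wavlm-codec", "discrete_wavlm_large", layer_ids=[6])
dwavlm.eval().requires_grad_(False)

# Load an audio file (replace the placeholder path) and resample it to the codec's sample rate
sig, sample_rate = torchaudio.load("<path-to-audio-file>")
sig = torchaudio.functional.resample(sig, sample_rate, dwavlm.sample_rate)

# Encode: waveform -> continuous WavLM features -> discrete tokens
feats = dwavlm.sig_to_feats(sig)
toks = dwavlm.feats_to_toks(feats)

# Decode: tokens -> quantized features -> reconstructed features -> waveform
qfeats = dwavlm.toks_to_qfeats(toks)
rec_feats = dwavlm.qfeats_to_feats(qfeats)
rec_sig = dwavlm.feats_to_sig(rec_feats)

torchaudio.save("reconstruction.wav", rec_sig[:, 0], dwavlm.sample_rate)
```

---------------------------------------------------------------------------------------------------------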
## 📧 Contact
[[email protected]](mailto:[email protected])

---------------------------------------------------------------------------------------------------------