https://github.com/lucadellalib/discrete-wavlm-codec

A neural speech codec based on discrete WavLM representations

Host: GitHub
URL: https://github.com/lucadellalib/discrete-wavlm-codec
Owner: lucadellalib
License: apache-2.0
Created: 2024-08-28T00:54:11.000Z (10 months ago)
Default Branch: master
Last Pushed: 2024-08-28T04:23:32.000Z (10 months ago)
Last Synced: 2024-09-27T20:03:30.949Z (9 months ago)
Topics: clustering, codec, hifi-gan, k-means, neural-speech-coding, pytorch, quantization, self-supervised-learning, speech-synthesis, token-extraction, wavlm
Language: Python
Homepage: https://huggingface.co/lucadellalib/discrete-wavlm-codec
Size: 403 KB
Stars: 15
Watchers: 3
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


# Discrete WavLM Codec

A speech codec obtained by quantizing WavLM representations via K-means clustering (see https://arxiv.org/abs/2312.09747).
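To illustrate the quantization idea (presumably what `feats_to_toks` and `toks_to_qfeats` in the quickstart do), here is a minimal pure-Python sketch of K-means quantization. Lloyd's algorithm fits the centroids, each feature frame is replaced by the index of its nearest centroid (a discrete token), and each token is mapped back to its centroid (a quantized feature). The helper names (`fit_kmeans`, `assign_tokens`, `lookup_centroids`) and the toy 2-D frames are hypothetical. The actual codec clusters high-dimensional WavLM layer outputs and may fit its K-means differently.

```python
import random
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. a WavLM frame)


def squared_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_tokens(feats: Sequence[Frame], centroids: Sequence[Frame]) -> List[int]:
    """Quantize: replace each frame by the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(frame, centroids[k]))
        for frame in feats
    ]


def lookup_centroids(toks: Sequence[int], centroids: Sequence[Frame]) -> List[List[float]]:
    """Map each token back to its centroid vector (the quantized feature)."""
    return [list(centroids[t]) for t in toks]


def fit_kmeans(
    feats: Sequence[Frame], num_clusters: int, num_iters: int = 20, seed: int = 0
) -> List[List[float]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    rng = random.Random(seed)
    # Initialize the centroids with randomly chosen frames
    centroids = [list(f) for f in rng.sample(list(feats), num_clusters)]
    for _ in range(num_iters):
        toks = assign_tokens(feats, centroids)
        for k in range(num_clusters):
            members = [f for f, t in zip(feats, toks) if t == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


if __name__ == "__main__":
    # Toy 2-D "frames" forming two well-separated groups
    feats = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
    centroids = fit_kmeans(feats, num_clusters=2)
    toks = assign_tokens(feats, centroids)      # discrete tokens
    qfeats = lookup_centroids(toks, centroids)  # quantized features
    print(toks, qfeats)
```

In the codec, the frames would be WavLM hidden states (1024-dimensional for WavLM Large) and the number of clusters much larger, so a practical implementation would vectorize the distance computation in PyTorch rather than loop in Python.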

---------------------------------------------------------------------------------------------------------

## 🛠️ Installation

First of all, install [Python 3.8 or later](https://www.python.org). Open a terminal and run:

```
pip install huggingface-hub safetensors speechbrain torch torchaudio transformers
```
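To verify the installation, the following stdlib-only sketch reports whether the running interpreter meets the Python 3.8 requirement and whether each package installed by the command above can be found. The helper name `check_environment` is hypothetical and not part of the codec. It only checks that the modules are importable, not their versions.

```python
import importlib.util
import sys

# Import names of the packages installed by the pip command above
# (the pip package "huggingface-hub" is imported as "huggingface_hub").
REQUIRED_MODULES = ["huggingface_hub", "safetensors", "speechbrain", "torch", "torchaudio", "transformers"]


def check_environment(min_python=(3, 8)):
    """Return a dict mapping each requirement to whether it is satisfied."""
    report = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for name in REQUIRED_MODULES:
        # find_spec returns None when the top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```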

---------------------------------------------------------------------------------------------------------

## ▶️ Quickstart

We use `torch.hub` to make loading the model easy (no need to clone the repository):

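```python
import torch
import torchaudio

# Load the pretrained codec via torch.hub; `layer_ids` selects the WavLM layer(s) to quantize
dwavlm = torch.hub.load("lucadellalib/discrete-wavlm-codec", "discrete_wavlm_large", layer_ids=[6])
# Inference only: switch to eval mode and freeze all parameters
dwavlm.eval().requires_grad_(False)

# Placeholder path: replace with the audio file you want to encode
sig, sample_rate = torchaudio.load("path/to/audio.wav")
# Resample to the sampling rate expected by the model
sig = torchaudio.functional.resample(sig, sample_rate, dwavlm.sample_rate)

# Encode: waveform -> continuous WavLM features -> discrete tokens (K-means cluster indices)
feats = dwavlm.sig_to_feats(sig)
toks = dwavlm.feats_to_toks(feats)

# Decode: tokens -> quantized features -> reconstructed features -> waveform
qfeats = dwavlm.toks_to_qfeats(toks)
rec_feats = dwavlm.qfeats_to_feats(qfeats)
rec_sig = dwavlm.feats_to_sig(rec_feats)

torchaudio.save("reconstruction.wav", rec_sig[:, 0], dwavlm.sample_rate)
```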
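The number of tokens grows linearly with the input length. WavLM's convolutional feature encoder (the same seven-layer design as wav2vec 2.0, with kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2) downsamples 16 kHz audio by a factor of 320, i.e. roughly one frame every 20 ms. The sketch below estimates how many frames `sig_to_feats` should produce, and hence how many tokens `toks` should contain per selected layer, assuming the resampled signal is fed to WavLM without extra padding. The function name `num_wavlm_frames` is hypothetical, and the actual count may differ if the codec pads or trims the input.

```python
# WavLM / wav2vec 2.0 convolutional feature encoder: (kernel_size, stride) per layer
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_wavlm_frames(num_samples: int) -> int:
    """Number of output frames for an unpadded input of `num_samples` samples."""
    length = num_samples
    for kernel_size, stride in CONV_LAYERS:
        # Output length of an unpadded 1-D convolution
        length = (length - kernel_size) // stride + 1
    return max(length, 0)


if __name__ == "__main__":
    sample_rate = 16000  # WavLM operates on 16 kHz audio
    for seconds in (1, 5, 10):
        print(f"{seconds} s -> {num_wavlm_frames(seconds * sample_rate)} frames")
```

Running it prints 49, 249 and 499 frames for 1, 5 and 10 seconds of audio, i.e. about 50 frames per second.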

---------------------------------------------------------------------------------------------------------

## 📧 Contact

[[email protected]](mailto:[email protected])

---------------------------------------------------------------------------------------------------------
