https://github.com/lucadellalib/focalcodec
A low-bitrate single-codebook 16 kHz speech codec based on focal modulation
- Host: GitHub
- URL: https://github.com/lucadellalib/focalcodec
- Owner: lucadellalib
- License: apache-2.0
- Created: 2025-02-03T09:03:40.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-10T01:44:40.000Z (8 months ago)
- Last Synced: 2025-02-10T02:29:15.410Z (8 months ago)
- Topics: codec, deep-learning, focal-modulation, neural-speech-coding, pytorch, speech-synthesis, vector-quantization, vocos, wavlm
- Language: Python
- Homepage: https://lucadellalib.github.io/focalcodec-web/
- Size: 7.16 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ⚡ FocalCodec

A low-bitrate single-codebook 16 kHz speech codec based on [focal modulation](https://arxiv.org/abs/2203.11926).
- 📜 **Preprint**: https://arxiv.org/abs/2502.04465
- 🌐 **Project Page**: https://lucadellalib.github.io/focalcodec-web/
- 🔊 **Downstream Tasks**: https://github.com/lucadellalib/audiocodecs
---------------------------------------------------------------------------------------------------------
## 📌 Available Checkpoints
| Checkpoint | Token Rate (Hz) | Bitrate (kbps) | Dataset |
|:---------------------------------------------------------------------------------------:|:---------------:|:--------------:|:-----------:|
| [lucadellalib/focalcodec_50hz](https://huggingface.co/lucadellalib/focalcodec_50hz) | 50.0 | 0.65 | LibriTTS960 |
| [lucadellalib/focalcodec_25hz](https://huggingface.co/lucadellalib/focalcodec_25hz) | 25.0 | 0.33 | LibriTTS960 |
| [lucadellalib/focalcodec_12_5hz](https://huggingface.co/lucadellalib/focalcodec_12_5hz) | 12.5 | 0.16 | LibriTTS960 |

---------------------------------------------------------------------------------------------------------
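The bitrates in the checkpoint table follow from the token rate multiplied by the number of bits per token. The sketch below reproduces them under the assumption of 13 bits per token (a codebook of 2^13 = 8192 entries); that figure is inferred from the table rather than stated in this README, and `bitrate_bps` is a hypothetical helper, not part of the FocalCodec API.

```python
import math

# Assumption: 8192 codebook entries (13 bits per token), inferred from the
# checkpoint table; it is not stated in this README.
ASSUMED_CODEBOOK_SIZE = 8192


def bitrate_bps(token_rate_hz: float, codebook_size: int = ASSUMED_CODEBOOK_SIZE) -> float:
    """Bitrate of a single-codebook codec: tokens per second times bits per token."""
    bits_per_token = math.log2(codebook_size)
    return token_rate_hz * bits_per_token


# Print the bitrate for each token rate listed in the table
for rate in (50.0, 25.0, 12.5):
    print(f"{rate:>5} Hz -> {bitrate_bps(rate):.1f} bps")
```

Under this assumption the three checkpoints come out at 650, 325, and 162.5 bits per second, which matches the 0.65, 0.33, and 0.16 kbps in the table up to rounding.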
## 🛠️️ Installation
First of all, install [Python 3.8 or later](https://www.python.org). Then, open a terminal and run:
```bash
pip install huggingface-hub safetensors soundfile torch torchaudio
```

---------------------------------------------------------------------------------------------------------
## ▶️ Quickstart
**NOTE**: the `audio-samples` directory contains audio samples that you can download and use to test the codec.
You can easily load the model using `torch.hub` without cloning the repository:
```python
import torch
import torchaudio

# Load FocalCodec model
config = "lucadellalib/focalcodec_50hz"
codec = torch.hub.load(
    "lucadellalib/focalcodec", "focalcodec", config=config, force_reload=True
)
codec.eval().requires_grad_(False)

# Load and preprocess the input audio
audio_file = "audio-samples/librispeech-dev-clean/251-118436-0003.wav"
sig, sample_rate = torchaudio.load(audio_file)
sig = torchaudio.functional.resample(sig, sample_rate, codec.sample_rate)

# Encode audio into tokens
toks = codec.sig_to_toks(sig)  # Shape: (batch, time)
print(toks.shape)
print(toks)

# Convert tokens to their corresponding binary spherical codes
codes = codec.toks_to_codes(toks)  # Shape: (batch, time, log2 codebook_size)
print(codes.shape)
print(codes)

# Decode tokens back into a waveform
rec_sig = codec.toks_to_sig(toks)

# Save the reconstructed audio
rec_sig = torchaudio.functional.resample(rec_sig, codec.sample_rate, sample_rate)
torchaudio.save("reconstruction.wav", rec_sig, sample_rate)
```

Alternatively, you can install FocalCodec as a standard Python package using `pip`:
```bash
pip install focalcodec@git+https://github.com/lucadellalib/focalcodec.git@main#egg=focalcodec
```

Once installed, you can import it in your scripts:

```python
import focalcodec

config = "lucadellalib/focalcodec_50hz"
codec = focalcodec.FocalCodec.from_pretrained(config)
```

Check the code documentation for more details on model usage and available configurations.

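To get a feel for the `(batch, time)` shape that `sig_to_toks` returns in the Quickstart, the sketch below estimates how many tokens a clip produces at a given token rate. `expected_num_tokens` is a hypothetical helper, not part of FocalCodec, and the length the model actually returns may differ by a frame or so depending on how it pads the input.

```python
import math


def expected_num_tokens(
    num_samples: int, sample_rate: int = 16000, token_rate_hz: float = 50.0
) -> int:
    """Approximate token count: clip duration in seconds times tokens per second."""
    duration_s = num_samples / sample_rate
    # Rounding up is an assumption; the model's own padding decides the exact length.
    return math.ceil(duration_s * token_rate_hz)


# A 3-second clip at 16 kHz, for each published token rate
for rate in (50.0, 25.0, 12.5):
    print(rate, expected_num_tokens(3 * 16000, token_rate_hz=rate))
```

Comparing this estimate with `toks.shape[-1]` is a quick sanity check that the audio was resampled to `codec.sample_rate` before encoding.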
---------------------------------------------------------------------------------------------------------
## 🎤 Running the Demo Script
Clone or download and extract the repository, navigate to its root directory, open a terminal and run:

**Speech Resynthesis**
```bash
python demo.py \
    --input_file audio-samples/librispeech-dev-clean/251-118436-0003.wav \
    --output_file reconstruction.wav
```

**Voice Conversion**
```bash
python demo.py \
    --input_file audio-samples/librispeech-dev-clean/251-118436-0003.wav \
    --output_file reconstruction.wav \
    --reference_files audio-samples/librispeech-dev-clean/84
```

---------------------------------------------------------------------------------------------------------
## @ Citing
```bibtex
@article{dellalibera2025focalcodec,
    title   = {{FocalCodec}: Low-Bitrate Speech Coding via Focal Modulation Networks},
    author  = {Luca {Della Libera} and Francesco Paissan and Cem Subakan and Mirco Ravanelli},
    journal = {arXiv preprint arXiv:2502.04465},
    year    = {2025},
}
```

---------------------------------------------------------------------------------------------------------
## 📧 Contact
[luca.dellalib@gmail.com](mailto:luca.dellalib@gmail.com)

---------------------------------------------------------------------------------------------------------