https://github.com/sony/sbctm
Schrödinger bridge consistency trajectory models for speech enhancement
- Host: GitHub
- URL: https://github.com/sony/sbctm
- Owner: sony
- License: other
- Created: 2025-05-09T04:02:57.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-07-17T03:18:26.000Z (3 months ago)
- Last Synced: 2025-07-30T23:51:21.742Z (3 months ago)
- Language: Python
- Homepage:
- Size: 104 KB
- Stars: 6
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement
## Description
This repository is the official PyTorch implementation of "Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement".
- [Paper](https://arxiv.org/abs/2507.11925) (arXiv; accepted at WASPAA 2025)
- [Pretrained model](https://osf.io/download/f293c/?view_only=e406b105dd274657b7b33cea9dc764af) (trained on the VoiceBank-DEMAND dataset downsampled to 16 kHz)
- [Demo](https://shuichiron.github.io/sbctm_demo.html) (audio samples)

Contact: Shuichiro.Nishigori@sony.com
## Installation
- Create a new virtual environment with Python 3.11.
- Install the package dependencies via `pip install -r requirements.txt`.
- Let pip resolve the dependencies for you. If you encounter any issues, please check `requirements_version.txt` for the exact versions we used.
- If using W&B logging (default):
- Set up a [wandb.ai](https://wandb.ai/) account
- Log in via `wandb login` before running our code.
- If not using W&B logging:
- Pass the option `--nolog` to `train.py`.
  - Your logs will be stored as local CSVLogger logs in `lightning_logs/` (see the sketch below).
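For a quick look at a `--nolog` run, the local CSV logs can be read back with pandas. This is a minimal sketch, not part of the repository, assuming the default Lightning CSVLogger layout (`lightning_logs/version_*/metrics.csv`); the available columns depend on what `train.py` actually logs:

```python
# Sketch: inspect local CSVLogger output after training with --nolog.
# Assumes the default Lightning layout lightning_logs/version_*/metrics.csv;
# the column names depend on what the training script logs.
from pathlib import Path
import pandas as pd

runs = sorted(Path("lightning_logs").glob("version_*"))
if not runs:
    raise SystemExit("No lightning_logs/version_* directories found.")

metrics = pd.read_csv(runs[-1] / "metrics.csv")  # latest run (lexical order)
print(metrics.columns.tolist())                  # which metrics were logged
print(metrics.tail())                            # last few logged rows
```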
## Training
Training is performed using `train.py`, for example with the following command:
```bash
python train.py --teacher_path <teacher_checkpoint> --base_dir <data_dir> --batch_size <batch_size> --max_epochs <max_epochs>
```
Main arguments:
- teacher_path: Path to the checkpoint used as the teacher model
- base_dir: Path to the directory containing subdirectories `train/` and `valid/`, with the same filenames present in both (`.wav` files); see the sanity-check sketch after this list
- batch_size: Batch size for training (integer value)
- max_epochs: The number of epochs

**Note:**
- In our paper, we additionally set `--backbone ncsnpp-ctm_v2 --sde sbve --opt radam --ctm_lr 8e-5 --ctm_rec_w 1e-3 --ctm_psq_w 5e-4 --dsm_rec_w 1e-3 --dsm_psq_w 5e-4` on top of the arguments mentioned above.
- If you encounter GPU memory issues, try the `--nf` option with a value less than 128.
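Before launching a long training run, it can help to confirm that `base_dir` matches the layout described above. The following is a minimal sanity-check sketch, not part of the repository; it assumes only what this README states (`train/` and `valid/` subdirectories of `.wav` files with matching filenames) and should be adapted if your data uses additional subdirectories:

```python
# Sketch (not part of the repository): sanity-check the base_dir layout
# described above before starting a long training run.
from pathlib import Path
import sys

if len(sys.argv) < 2:
    raise SystemExit("usage: python check_base_dir.py <base_dir>")
base_dir = Path(sys.argv[1])  # the same directory you would pass to --base_dir

def wav_names(split: str) -> set[str]:
    """Collect relative paths of all .wav files under base_dir/<split>/."""
    root = base_dir / split
    return {str(p.relative_to(root)) for p in root.rglob("*.wav")}

train, valid = wav_names("train"), wav_names("valid")
print(f"train: {len(train)} .wav files, valid: {len(valid)} .wav files")

# Report filenames that appear in one split but not in the other.
only_train, only_valid = train - valid, valid - train
if only_train or only_valid:
    print("only in train/:", sorted(only_train)[:5])
    print("only in valid/:", sorted(only_valid)[:5])
```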
## Inference (Enhancement)
Inference is performed using `enhancement.py`, for example with the following command:
```bash
python enhancement.py --test_dir <noisy_dir> --enhanced_dir <output_dir> --ckpt <checkpoint> --N <num_steps>
```
Main arguments:
- test_dir: Path to the directory containing the noisy speech data
- enhanced_dir: Path to the directory for inference output (see the evaluation sketch below)
- ckpt: Path to the checkpoint to be used
- N: The number of reverse diffusion steps (integer value)
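When clean reference files are available, the enhanced output can be scored offline. The sketch below is not part of the repository; it assumes 16 kHz `.wav` files with matching filenames in a hypothetical `clean/` directory and in the directory passed to `--enhanced_dir`, and that the `pesq` and `soundfile` packages are installed:

```python
# Sketch (not part of the repository): score enhanced files against clean
# references with wideband PESQ. Assumes 16 kHz .wav files with matching
# filenames in both directories.
from pathlib import Path
import soundfile as sf
from pesq import pesq

clean_dir = Path("clean")        # hypothetical directory of clean references
enhanced_dir = Path("enhanced")  # directory passed to --enhanced_dir

scores = []
for enhanced_path in sorted(enhanced_dir.glob("*.wav")):
    ref, fs = sf.read(clean_dir / enhanced_path.name)
    deg, _ = sf.read(enhanced_path)
    n = min(len(ref), len(deg))          # guard against small length mismatches
    scores.append(pesq(fs, ref[:n], deg[:n], "wb"))  # wideband PESQ at 16 kHz

if scores:
    print(f"mean PESQ over {len(scores)} files: {sum(scores) / len(scores):.2f}")
```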
## Citation
We kindly ask you to cite our paper in your publications when using any of our research or code:
```bib
@inproceedings{sbctm2025,
author={S. Nishigori and K. Saito and N. Murata and M. Hirano and S. Takahashi and Y. Mitsufuji},
title={{Schr\"odinger Bridge Consistency Trajectory Models for Speech Enhancement}},
year={2025},
booktitle={WASPAA 2025}
}
```

## References
Part of the code is borrowed from the following repositories, and we would like to thank their authors for their contributions.
> https://github.com/sp-uhh/sgmse
> https://github.com/sony/soundctm
## License
This repository is primarily licensed under the MIT License.
Some portions are derived from the work by Signal Processing (SP), Universität Hamburg.
Some files or components are derived from projects released under the Apache License 2.0.
See the `LICENSE` file for full details.