https://github.com/funcwj/deep-clustering
deep clustering method for single-channel speech separation
- Host: GitHub
- URL: https://github.com/funcwj/deep-clustering
- Owner: funcwj
- Created: 2018-06-14T15:50:16.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-06-21T21:19:45.000Z (over 2 years ago)
- Last Synced: 2024-08-02T07:14:11.344Z (6 months ago)
- Topics: pytorch, speech-separation
- Language: Python
- Size: 23.4 KB
- Stars: 109
- Watchers: 6
- Forks: 35
- Open Issues: 3
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-speech-enhancement
README
## Deep clustering for single-channel speech separation
An implementation of "Deep Clustering: Discriminative Embeddings for Segmentation and Separation".
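The core of the method is an embedding network trained with an affinity-based objective: every time-frequency bin is mapped to a unit-norm embedding, and embeddings of bins dominated by the same speaker are pulled together. Below is a minimal PyTorch sketch of that loss, written from the paper's formulation; it is an illustration only and does not necessarily match how this repository implements it (the tensor shapes and the `vad_mask` argument are assumptions).

```python
import torch

def deep_clustering_loss(embedding, targets, vad_mask=None):
    """Affinity-based deep clustering objective (Hershey et al., 2016).

    embedding: (B, T*F, D) unit-norm embeddings per time-frequency bin
    targets:   (B, T*F, C) one-hot speaker assignments (ideal binary mask)
    vad_mask:  (B, T*F)    optional 0/1 mask that drops silent bins
    """
    if vad_mask is not None:
        # Zero out silent time-frequency bins so they do not affect the loss.
        embedding = embedding * vad_mask.unsqueeze(-1)
        targets = targets * vad_mask.unsqueeze(-1)

    # ||VV^T - YY^T||_F^2 expanded into three small Gram matrices,
    # so the (T*F x T*F) affinity matrix is never formed explicitly.
    vt_v = torch.matmul(embedding.transpose(1, 2), embedding)  # (B, D, D)
    vt_y = torch.matmul(embedding.transpose(1, 2), targets)    # (B, D, C)
    yt_y = torch.matmul(targets.transpose(1, 2), targets)      # (B, C, C)

    loss = (vt_v.pow(2).sum((1, 2))
            - 2 * vt_y.pow(2).sum((1, 2))
            + yt_y.pow(2).sum((1, 2)))
    return loss.mean()
```

Expanding ||VV^T - YY^T||_F^2 this way keeps the memory cost at O(D^2 + C^2) per utterance instead of O((T*F)^2).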
### Requirements
see [requirements.txt](requirements.txt)
### Usage
1. Configure the experiment in a `.yaml` file, for example `conf/train.yaml`
2. Training:
```shell
python ./train_dcnet.py --config conf/train.yaml --num-epoches 20 > train.log 2>&1 &
```
3. Inference (a helper for building the `egs.scp` file is sketched after these steps):
```shell
python ./separate.py --num-spks 2 $mdl_dir/train.yaml $mdl_dir/final.pkl egs.scp
```
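The `egs.scp` argument is a Kaldi-style script file mapping utterance keys to wav paths (see the Q & A below). A minimal sketch for generating one from a directory of mixture wavs follows; the directory path and the key naming (file stem as key) are assumptions to adapt to your own layout.

```python
from pathlib import Path

def write_scp(wav_dir, scp_path):
    """Write a Kaldi-style .scp file: one "<key> <path>" pair per line,
    using the wav file stem as the key (assumed naming convention)."""
    wav_dir, scp_path = Path(wav_dir), Path(scp_path)
    with scp_path.open("w") as fp:
        for wav in sorted(wav_dir.glob("*.wav")):
            fp.write(f"{wav.stem} {wav.resolve()}\n")

if __name__ == "__main__":
    # Hypothetical test-mixture directory; produces the egs.scp passed to separate.py.
    write_scp("/home/data/test_mix", "egs.scp")
```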
### Experiments
| Config | Epoch | FM | FF | MM | FF/MM | AVG |
| :-------: | :---: | :---: | :--: | :--: | :---: | :--: |
| [config-1](conf/1.config.yaml) | 25 | 11.42 | 6.85 | 7.88 | 7.36 | 9.54 |

### Q & A
1. What is the format of the `.scp` file?
The `wav.scp` file follows the definition used in the Kaldi toolkit. Each line contains a `key value` pair, where the key is a unique string indexing an audio file and the value is the path to that file. For example:
```
mix-utt-00001 /home/data/train/mix-utt-00001.wav
...
mix-utt-XXXXX /home/data/train/mix-utt-XXXXX.wav
```
2. How to prepare the training dataset?
The original paper uses MATLAB scripts from [create-speaker-mixtures.zip](http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip) to simulate the two- and three-speaker datasets. You can also use your own data sources (e.g., LibriSpeech, TIMIT) to create mixtures, keeping the clean sources as well; a minimal mixing sketch follows.
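As a rough illustration of that step, here is a Python sketch that mixes two clean utterances at a chosen SNR and keeps the scaled sources as training targets. It stands in for the MERL MATLAB scripts rather than reproducing them; the file naming, the SNR convention, and the `soundfile`/`numpy` dependencies are assumptions.

```python
import numpy as np
import soundfile as sf

def mix_pair(src_a, src_b, out_prefix, snr_db=0.0):
    """Mix two clean utterances at a given SNR (src_a relative to src_b),
    truncate to the shorter one, and keep the scaled clean sources
    alongside the mixture (they are needed as training targets)."""
    a, sr = sf.read(src_a)
    b, sr_b = sf.read(src_b)
    assert sr == sr_b, "sources must share a sample rate"
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    # Scale src_a so that 10*log10(P_a / P_b) == snr_db.
    gain = np.sqrt(10 ** (snr_db / 10) * np.sum(b ** 2) / (np.sum(a ** 2) + 1e-8))
    a = gain * a
    mix = a + b
    # Rescale everything together if the mixture would clip.
    peak = np.max(np.abs(mix)) + 1e-8
    if peak > 1.0:
        a, b, mix = a / peak, b / peak, mix / peak
    sf.write(f"{out_prefix}-mix.wav", mix, sr)
    sf.write(f"{out_prefix}-spk1.wav", a, sr)
    sf.write(f"{out_prefix}-spk2.wav", b, sr)
```

Pairs should come from different speakers; the same loop that calls `mix_pair` can also emit the corresponding `wav.scp` entries for the mixtures and sources.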
### Reference
1. Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation[C]//Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016: 31-35.
2. Isik Y, Roux J L, Chen Z, et al. Single-channel multi-speaker separation using deep clustering[J]. arXiv preprint arXiv:1607.02173, 2016.