https://github.com/funcwj/setk
Tools for Speech Enhancement integrated with Kaldi
- Host: GitHub
- URL: https://github.com/funcwj/setk
- Owner: funcwj
- License: apache-2.0
- Created: 2018-03-04T11:24:40.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-07-06T22:59:55.000Z (over 1 year ago)
- Last Synced: 2024-08-02T07:18:36.662Z (6 months ago)
- Topics: beamforming, kaldi, rir-generator, speech, speech-enhancement, speech-separation, time-frequency-masking
- Language: Python
- Homepage:
- Size: 36.3 MB
- Stars: 392
- Watchers: 22
- Forks: 92
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Speech-Enhancement - SETK
- awesome-speech-enhancement - [Code
README
## SETK: Speech Enhancement Tools integrated with Kaldi
Here are some speech enhancement/separation tools integrated with [Kaldi](https://github.com/kaldi-asr/kaldi). I use them for front-end data processing.
### Python Scripts
* Supervised (mask-based) adaptive beamformer (GEVD/MVDR/MCWF...)
* Data conversion among MATLAB, Numpy and Kaldi
* Data visualization (TF-mask, spatial/spectral features, beam pattern...)
* Unified data and IO handlers for Kaldi's scripts, archives, wave and numpy's ndarray...
* Unsupervised mask estimation (CGMM/CACGMM)
* Spatial/Spectral feature computation
* DS (delay and sum) beamformer, SD (super-directive) beamformer
* AuxIVA, WPE & WPD, FB (Fixed Beamformer)
* Mask computation (iam, irm, ibm, psm, crm)
* RIR simulation (1D/2D arrays)
* Single channel speech separation (TF spectral masking)
* Si-SDR/SDR/WER evaluation
* Pywebrtc vad wrapper
* Mask-based source localization
* Noise suppression
* Data simulation
* ...

Please check out the following instructions for usage of the scripts.
* [Adaptive Beamformer](doc/adaptive_beamformer)
* [Fixed Beamformer](doc/fixed_beamformer)
* [Sound Source Localization](doc/ssl)
* [Spectral Feature](doc/spectral_feature)
* [Spatial Feature](doc/spatial_feature)
* [VAD](doc/vad)
* [Noise Suppression](doc/ns)
* [Steer Vector](doc/steer_vector)
* [Room Impulse Response](doc/rir)
* [Spatial Clustering](doc/spatial_clustering)
* [WPE & WPD](doc/wpe)
* [Time-frequency Mask](doc/tf_mask)
* [Format Transform](doc/format_transform)
* [Data Simulation](doc/data_simu)

### Kaldi Commands
* Compute time-frequency masks (ibm, irm, etc.)
* Compute phase & magnitude spectrogram & complex STFT
* Separate target component using input masks
* Wave reconstruction from enhanced spectral features
* Complex matrix/vector class
* MVDR/GEVD beamformer (depends on T-F masks; not very stable)
* Fixed beamformer
* Compute angular spectrogram based on SRP-PHAT
* RIR generator (with reference to [RIR-Generator](https://github.com/ehabets/RIR-Generator))

To build the sources, you first need to compile [Kaldi](https://github.com/kaldi-asr/kaldi) with the `--shared` flag and patch `matrix/matrix-common.h` so that `MatrixTransposeType` is defined as follows:
```c++
typedef enum {
kTrans = 112, // CblasTrans
kNoTrans = 111, // CblasNoTrans
kConjTrans = 113, // CblasConjTrans
kConjNoTrans = 114 // CblasConjNoTrans
} MatrixTransposeType;
```

Then run
```bash
mkdir build
cd build
export KALDI_ROOT=/path/to/kaldi/root
export OPENFST_ROOT=/path/to/openfst/root
# on UNIX, Kaldi needs to be compiled with OpenBLAS
export OPENBLAS_ROOT=/path/to/openblas/root
cmake ..
make -j
```

***I now mainly work on the [sptk](scripts) package; Kaldi-based development has stopped.***
For developers (who want to make commits or PRs), please remember to set up [pre-commit](https://pre-commit.com) for code style formatting.
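
As a rough illustration of the mask types listed under Python Scripts (iam, irm, ibm, psm), here is a minimal NumPy sketch. It is not SETK's implementation: the function name `compute_masks` is hypothetical, the inputs are random complex arrays standing in for real STFTs, and the IRM uses the magnitude-ratio variant, which is only one of several common definitions. See [Time-frequency Mask](doc/tf_mask) for the actual scripts.

```python
import numpy as np

EPS = np.finfo(np.float32).eps  # guards against division by zero


def compute_masks(speech_stft, noise_stft):
    """Compute common TF-masks from complex STFTs of speech and noise.

    Hypothetical helper for illustration only; both inputs must share
    the same (frames, bins) shape. The mixture is assumed to be additive.
    """
    mix_stft = speech_stft + noise_stft
    s_mag = np.abs(speech_stft)
    n_mag = np.abs(noise_stft)
    y_mag = np.abs(mix_stft)

    # Ideal binary mask: 1 where speech dominates noise, else 0
    ibm = (s_mag > n_mag).astype(np.float32)
    # Ideal ratio mask (magnitude-ratio variant, an assumption)
    irm = s_mag / (s_mag + n_mag + EPS)
    # Ideal amplitude mask: speech magnitude over mixture magnitude
    iam = s_mag / (y_mag + EPS)
    # Phase-sensitive mask: IAM scaled by cosine of the phase difference
    psm = iam * np.cos(np.angle(speech_stft) - np.angle(mix_stft))
    return {"ibm": ibm, "irm": irm, "iam": iam, "psm": psm}


# Synthetic data standing in for real STFTs (100 frames, 257 bins)
rng = np.random.default_rng(0)
shape = (100, 257)
speech = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

masks = compute_masks(speech, noise)
for name, mask in masks.items():
    print(name, mask.shape, float(mask.min()), float(mask.max()))
```

The IBM and IRM are bounded in [0, 1], while the IAM and PSM are unbounded and are often clipped before being used as training targets.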