https://github.com/funcwj/setk

Tools for Speech Enhancement integrated with Kaldi

Host: GitHub
URL: https://github.com/funcwj/setk
Owner: funcwj
License: apache-2.0
Created: 2018-03-04T11:24:40.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2023-07-06T22:59:55.000Z (over 2 years ago)
Last Synced: 2024-11-02T20:31:58.358Z (about 1 year ago)
Topics: beamforming, kaldi, rir-generator, speech, speech-enhancement, speech-separation, time-frequency-masking
Language: Python
Homepage:
Size: 36.3 MB
Stars: 396
Watchers: 22
Forks: 92
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

## SETK: Speech Enhancement Tools integrated with Kaldi

This repository collects speech enhancement/separation tools integrated with [Kaldi](https://github.com/kaldi-asr/kaldi). I use them for front-end data processing.

### Python Scripts

* Supervised (mask-based) adaptive beamformer (GEVD/MVDR/MCWF...)

* Data conversion among MATLAB, NumPy and Kaldi

* Data visualization (TF-mask, spatial/spectral features, beam pattern...)

* Unified data and I/O handlers for Kaldi scripts and archives, wave files, and NumPy ndarrays...

* Unsupervised mask estimation (CGMM/CACGMM)

* Spatial/Spectral feature computation

* DS (delay-and-sum) beamformer, SD (super-directive) beamformer

* AuxIVA, WPE & WPD, FB (Fixed Beamformer)

* Mask computation (iam, irm, ibm, psm, crm)

* RIR simulation (1D/2D arrays)

* Single-channel speech separation (TF spectral masking)

* Si-SDR/SDR/WER evaluation

* Pywebrtc VAD wrapper

* Mask-based source localization

* Noise suppression

* Data simulation

* ...
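As an illustration of the Si-SDR evaluation item listed above, here is a minimal NumPy sketch of the usual scale-invariant SDR definition. It is not the repository's implementation; the function name `si_sdr` and the `eps` parameter are assumptions made for this example.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals (hypothetical helper)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the DC offset so that it does not count as signal or distortion
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the scaled reference counts as distortion
    residual = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps))
```

Because the reference is rescaled by the projection coefficient, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is the property that distinguishes Si-SDR from plain SDR.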

See the following instructions for how to use the scripts.

* [Adaptive Beamformer](doc/adaptive_beamformer)

* [Fixed Beamformer](doc/fixed_beamformer)

* [Sound Source Localization](doc/ssl)

* [Spectral Feature](doc/spectral_feature)

* [Spatial Feature](doc/spatial_feature)

* [VAD](doc/vad)

* [Noise Suppression](doc/ns)

* [Steer Vector](doc/steer_vector)

* [Room Impulse Response](doc/rir)

* [Spatial Clustering](doc/spatial_clustering)

* [WPE & WPD](doc/wpe)

* [Time-frequency Mask](doc/tf_mask)

* [Format Transform](doc/format_transform)

* [Data Simulation](doc/data_simu)
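To give a feel for what the fixed beamformer and steer vector documents cover, the following is a rough NumPy sketch of a far-field steering vector for a uniform linear array and a delay-and-sum beamformer built from it. The function names, the array layouts `(M, F, T)` and `(F, M)`, the sound speed, and the sign convention of the delays are all assumptions made for this example; the scripts in this repository may use different conventions.

```python
import numpy as np

SOUND_SPEED = 340.0  # m/s, assumed value


def linear_array_steer_vector(num_mics, spacing, doa_deg, freqs):
    """Far-field steering vector of a uniform linear array, shape (F, M)."""
    # Delay (in seconds) of each microphone relative to the first one
    delays = np.arange(num_mics) * spacing * np.cos(np.deg2rad(doa_deg)) / SOUND_SPEED
    # One unit-modulus phase term per frequency bin and microphone
    return np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])


def delay_and_sum(mix_stft, steer_vector):
    """Apply w = d / M per frequency: mix_stft (M, F, T) -> output (F, T)."""
    weights = steer_vector / steer_vector.shape[1]
    # y[f, t] = sum_m conj(w[f, m]) * x[m, f, t]
    return np.einsum("fm,mft->ft", weights.conj(), mix_stft)
```

With these weights, a plane wave arriving exactly from the steered direction passes with unit gain, while signals from other directions add up incoherently across microphones.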

### Kaldi Commands

* Compute time-frequency masks (ibm, irm, etc.)

* Compute phase & magnitude spectrogram & complex STFT

* Separate target component using input masks

* Wave reconstruction from enhanced spectral features

* Complex matrix/vector class

* MVDR/GEVD beamformer (depends on the T-F mask; not very stable)

* Fixed beamformer

* Compute angular spectrogram based on SRP-PHAT

* RIR generator (with reference to [RIR-Generator](https://github.com/ehabets/RIR-Generator))
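For readers unfamiliar with the mask names used above (ibm, irm, iam, psm, crm), the sketch below shows one common set of definitions in NumPy, given the complex STFTs of the clean speech and the noise, plus how a mask is applied to separate the target component. This is only an illustration: the function names and the `EPS` constant are assumptions, other variants of the IRM exist (for example, power-based with a square root), and neither the Kaldi commands nor the Python scripts in this repository are guaranteed to use exactly these formulas.

```python
import numpy as np

EPS = np.finfo(np.float32).eps  # small constant to avoid division by zero


def compute_masks(speech_stft, noise_stft):
    """Return a dict of T-F masks computed from complex STFTs of equal shape."""
    mix_stft = speech_stft + noise_stft
    s_mag = np.abs(speech_stft)
    n_mag = np.abs(noise_stft)
    y_mag = np.abs(mix_stft)
    # Phase difference between the clean speech and the mixture
    phase_diff = np.angle(speech_stft) - np.angle(mix_stft)
    return {
        # Ideal binary mask: 1 where speech dominates noise
        "ibm": (s_mag > n_mag).astype(np.float32),
        # Ideal ratio mask (magnitude variant), bounded in [0, 1]
        "irm": s_mag / (s_mag + n_mag + EPS),
        # Ideal amplitude mask: may exceed 1
        "iam": s_mag / (y_mag + EPS),
        # Phase-sensitive mask: amplitude mask scaled by cos of the phase error
        "psm": s_mag * np.cos(phase_diff) / (y_mag + EPS),
        # Complex ratio mask: recovers the speech STFT exactly when applied
        "crm": speech_stft / (mix_stft + EPS),
    }


def apply_mask(mix_stft, mask):
    """Separate the target by element-wise masking of the mixture STFT."""
    return mask * mix_stft
```

The real-valued masks keep the mixture phase when applied, so only the complex ratio mask can reconstruct the clean STFT exactly; the phase-sensitive mask equals the real part of the complex ratio mask.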

To build the sources, first compile [Kaldi](https://github.com/kaldi-asr/kaldi) with the `--shared` flag and patch `matrix/matrix-common.h` as follows:

```c++
typedef enum {
    kTrans          = 112,  // CblasTrans
    kNoTrans        = 111,  // CblasNoTrans
    kConjTrans      = 113,  // CblasConjTrans
    kConjNoTrans    = 114   // CblasConjNoTrans
} MatrixTransposeType;
```

Then run

```bash
mkdir build
cd build
export KALDI_ROOT=/path/to/kaldi/root
export OPENFST_ROOT=/path/to/openfst/root
# On UNIX, Kaldi must be compiled with OpenBLAS
export OPENBLAS_ROOT=/path/to/openblas/root
cmake ..
make -j
```

***I now mainly work on the [sptk](scripts) package; Kaldi-based development has stopped.***

For developers (who want to make commits or PRs), please remember to set up [pre-commit](https://pre-commit.com) for code style formatting.
