Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Speech-Enhancement
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
https://github.com/nanahou/Awesome-Speech-Enhancement
-
Tools
-
Coming soon...
- SETK
- pyAudioAnalysis
- LPS - Log-power spectrum, magnitude spectrum, log-magnitude spectrum, and cepstral mean and variance normalization (CMVN).
- MFCC
- pyroomacoustics
- gpuRIR
- rir_simulator_python
- audiomentations
- Beamformer - Mask-based adaptive beamformers (MVDR, GEVD, MCWF).
- Time-frequency Mask - Computes time-frequency masks (PSM, IRM, IBM, IAM, ...) for use as neural network training labels.
- SSL
- Data format
- Data simulation
- RIR simulation
- SDR - Signal-to-distortion ratio.
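The LPS and Time-frequency Mask entries above are often used together: log-power-spectrum features (with CMVN) as the network input and a T-F mask as the training label. Below is a minimal NumPy sketch of that pairing. It is not the API of any tool listed here. It assumes additive noise, so the noisy STFT equals the sum of the clean and noise STFTs (the STFT is linear). The function names, the 512-point Hann window with a 128-sample hop, the 0 dB IBM threshold and the square-root IRM are all illustrative choices.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT, shape (frames, n_fft // 2 + 1); assumes len(x) >= n_fft."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def lps_cmvn(spec, eps=1e-8):
    """Log-power spectrum, normalized to zero mean / unit variance per frequency bin."""
    lps = np.log(np.abs(spec) ** 2 + eps)
    return (lps - lps.mean(axis=0)) / (lps.std(axis=0) + eps)

def tf_masks(clean_spec, noise_spec, eps=1e-8):
    """IBM, IRM, IAM and PSM targets for an additive mixture noisy = clean + noise."""
    noisy_spec = clean_spec + noise_spec
    s_pow, n_pow = np.abs(clean_spec) ** 2, np.abs(noise_spec) ** 2
    ibm = (s_pow > n_pow).astype(np.float32)                 # 1 where local SNR > 0 dB
    irm = np.sqrt(s_pow / (s_pow + n_pow + eps))             # ideal ratio mask
    iam = np.abs(clean_spec) / (np.abs(noisy_spec) + eps)    # ideal amplitude mask
    psm = iam * np.cos(np.angle(clean_spec) - np.angle(noisy_spec))  # phase-sensitive mask
    return {"IBM": ibm, "IRM": irm, "IAM": iam, "PSM": psm}

# Toy example: a 440 Hz tone in white noise, 1 s at 16 kHz.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = 0.1 * rng.standard_normal(16000)
S, N = stft(clean), stft(noise)
feats = lps_cmvn(S + N)   # network input
masks = tf_masks(S, N)    # candidate training labels
```

IBM and IRM stay in [0, 1]. IAM is unbounded above, and PSM can also be negative, so both are often clipped to a fixed range before being used as targets.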
-
-
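For the SDR entry, here is a minimal sketch of two common forms. The first is the plain energy-ratio SDR. The second is the scale-invariant SDR (SI-SDR), which first projects the estimate onto the reference, so rescaling the output does not change the score. Neither is the full BSS Eval SDR, which also allows a short distortion filter on the reference; for that definition, a package such as mir_eval (`mir_eval.separation.bss_eval_sources`) can be used instead. The function names below are hypothetical.

```python
import numpy as np

def sdr(reference, estimate, eps=1e-12):
    """Plain energy-ratio SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    error = reference - estimate
    return 10 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB: project the (zero-mean) estimate onto the reference."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference          # part of the estimate explained by the reference
    residual = estimate - target        # everything else counts as distortion
    return 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))

# Toy example: a reference tone and a noisy estimate of it.
rng = np.random.default_rng(0)
ref = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
est = ref + 0.1 * rng.standard_normal(16000)
```

Halving `est` lowers the plain SDR but leaves SI-SDR unchanged. That is the usual reason SI-SDR is preferred when a model's output gain is arbitrary.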
Publications
-
Overview
- Nana Hou, [Hao Shi](https://www.linkedin.com/in/hao-shi-29300b1b2/) (Tianjin University), [Chenglin Xu](https://www.linkedin.com/in/xuchenglin28/) (National University of Singapore), Chen Weiguang (Hunan University).
-
SOTA results
-
Coming soon...
- MDPhD
- T-GSA
- CNN-GAN - - | 0.93 |
- WaveUnet
- WaveNet - - | 3.62 | 3.24 | 2.98 | -- | -- |
- U-net
- TasNet
- RHRnet
- Dataset by University of Edinburgh ("F" is frequency-domain and "T" is time-domain.)
- WaveUnet
- WaveNet - - | 3.62 | 3.24 | 2.98 | -- | -- |
- TasNet
- RHRnet
- DFL - - | 3.86 | 3.33 | 3.22 | -- | -- |
- SDR-PRSQ
- T-GSA
- SEGAN
- MDPhD
- Complex U-net
- MSE-GAN - - | 0.93 |
-
-
Learning materials
-
Datasets
-
Coming soon...
- TIMIT
- VCTK
- WSJ0 - - | 149 | English | $1500 | The WSJ database was generated from a machine-readable corpus of Wall Street Journal news text. |
- REVERB - | 8K+ | English | Free | This corpus is from the REVERB 2014 challenge. The challenge assumes a single stationary distant-talking speaker whose utterances are captured with 1-channel (1ch), 2-channel (2ch) or 8-channel (8ch) microphone arrays in reverberant meeting rooms. It contains both real recordings and simulated data, part of which simulates the real recordings. |
- LibriSpeech - A large-scale (1000 hours) corpus of read English speech.
- CHiME series - - | -- | English | Free | The databases are published by the CHiME Speech Separation and Recognition Challenge. |
- DEMAND - Recordings of real-world noise in a variety of settings.
- 115 Noise - The N100 noises were collected by Guoning Hu; the other 15 home-made noise types were collected by USTC.
- NoiseX-92
- RIR_Noises - | Free | A database of simulated and real room impulse responses, isotropic noises and point-source noises. All audio files are sampled at 16 kHz with 16-bit precision. It includes all the room impulse responses (RIRs) and noises used in the paper "A Study on Data Augmentation of Reverberant Speech for Robust Speech Recognition", submitted to ICASSP 2017: the real RIRs and isotropic noises from the RWCP sound scene database, the 2014 REVERB challenge database and the Aachen impulse response database (AIR); the simulated RIRs generated by the paper's authors; and the point-source noises extracted from the MUSAN corpus. |
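The speech, noise and RIR corpora above are usually combined to simulate training data. Clean speech (e.g. from TIMIT, VCTK or WSJ0) is convolved with a room impulse response (e.g. from RIR_Noises), then mixed with a noise clip (e.g. from DEMAND or NoiseX-92) at a chosen SNR. Below is a minimal NumPy sketch of that recipe. The helper names are hypothetical. The exponentially decaying random RIR is only a toy stand-in for a measured or simulated one.

```python
import numpy as np

def add_reverb(clean, rir):
    """Convolve with a room impulse response, truncated to the input length."""
    return np.convolve(clean, rir)[: len(clean)]

def mix_at_snr(speech, noise, snr_db, rng=None):
    """Scale a noise clip so the speech-to-noise power ratio equals snr_db (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    if len(noise) < len(speech):                            # loop short noise clips
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = rng.integers(0, len(noise) - len(speech) + 1)   # random noise segment
    noise = noise[start:start + len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise, scale * noise

# Toy example: 1 s tone at 16 kHz, a synthetic 0.3 s RIR, white noise at 5 dB SNR.
rng = np.random.default_rng(0)
fs = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
t = np.arange(int(0.3 * fs)) / fs
rir = rng.standard_normal(t.size) * np.exp(-t / 0.05)   # toy RIR: decaying noise tail
rir[0] = 1.0                                            # direct path
reverberant = add_reverb(clean, rir)
noisy, scaled_noise = mix_at_snr(reverberant, rng.standard_normal(8000), snr_db=5, rng=rng)
```

Here the SNR is measured against the reverberant speech. The training target can be either the dry clean signal or the reverberant one, depending on whether dereverberation is part of the task.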
-
-
Applications
-
Coming soon...
-