https://github.com/wenet-e2e/nn-singal-processing-papers
List of NN-based signal processing papers
- Host: GitHub
- URL: https://github.com/wenet-e2e/nn-singal-processing-papers
- Owner: wenet-e2e
- License: apache-2.0
- Created: 2023-06-02T14:06:22.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-05T05:48:05.000Z (about 3 years ago)
- Last Synced: 2025-10-23T04:56:03.147Z (8 months ago)
- Size: 8.79 KB
- Stars: 21
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# nn-singal-processing-papers
List of NN-based signal processing papers
## Adaptive Noise Suppression (Speech Enhancement)
### Time-Frequency Domain
- DCUnet: [Phase-aware speech enhancement with Deep Complex U-Net](https://openreview.net/pdf?id=SkeRTsAcYm) (SNU, ICLR, 2019)
- DCCRN
  - DCCRN: [DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement](https://arxiv.org/pdf/2008.00264.pdf) (NWPU, 2020)
  - DCCRN+: [DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement](https://arxiv.org/pdf/2106.08672.pdf) (NWPU, 2021)
  - S-DCCRN: [S-DCCRN: Super Wide Band DCCRN with Learnable Complex Feature for Speech Enhancement](https://arxiv.org/pdf/2111.08387.pdf) (NWPU, 2022)
  - Spatial-DCCRN: [Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement](https://arxiv.org/pdf/2210.08802.pdf) (NWPU, 2022)
- DesNet: [DESNet: A Multi-Channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation](https://arxiv.org/pdf/2011.02131.pdf) (NWPU, 2020)
- PHASEN: [PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network](https://arxiv.org/pdf/1911.04697.pdf) (USTC, AAAI, 2020)
- DPCRN: [DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement](https://arxiv.org/pdf/2107.05429.pdf) (NJU, Interspeech, 2021)
- BSRNN: [High Fidelity Speech Enhancement with Band-split RNN](https://arxiv.org/pdf/2212.00406.pdf) (Tencent, 2022)
- UFormer: [Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation](https://arxiv.org/pdf/2111.06015.pdf) (NWPU, 2022)
### Time Domain
- WaveUnet: [Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation](https://arxiv.org/pdf/1806.03185.pdf) (QMUL, ISMIR, 2018)
- TCNN: [TCNN: Temporal Convolutional Neural Network for Real-Time Speech Enhancement in the Time Domain](https://web.cse.ohio-state.edu/~wang.77/papers/Pandey-Wang1.icassp19.pdf) (Ohio, ICASSP, 2019)
- DP-SARNN: [Dual-path Self-Attention RNN for Real-Time Speech Enhancement](https://arxiv.org/pdf/2010.12713.pdf) (Ohio, 2021)
## Acoustic Echo Cancellation
- wRLS-DFSMN: [Weighted Recursive Least Square Filter and Neural Network based Residual Echo Suppression for the AEC-Challenge](https://arxiv.org/pdf/2102.08551.pdf) (Alibaba, ICASSP, 2021)
- GCCRN: [Acoustic Echo Cancellation using Deep Complex Neural Network with Nonlinear Magnitude Compression and Phase Information](https://www.isca-speech.org/archive/pdfs/interspeech_2021/peng21f_interspeech.pdf) (IACAS, Interspeech, 2021)
## Automatic Gain Control
## Speech Separation
TODO: add important models from ESPnet and Asteroid.
### Single Channel
### Multiple Channel
## Joint Optimization
- [Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions](https://www.isca-speech.org/archive_v0/Interspeech_2019/pdfs/2651.pdf) (Ohio, Interspeech, 2019)
- [Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet](https://arxiv.org/pdf/2102.05245.pdf) (Amazon, ICASSP, 2021)
- NN3A: [NN3A: Neural Network supported Acoustic Echo Cancellation, Noise Suppression and Automatic Gain Control for Real-Time Communications](https://arxiv.org/pdf/2110.08437.pdf) (Alibaba, ICASSP, 2022)
## Masking
- IBM: [On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis](https://web.cse.ohio-state.edu/~wang.77/papers/Wang05.pdf) (Ohio, 2005)
- IRM: [Ideal ratio mask estimation using deep neural networks for robust speech recognition](https://web.cse.ohio-state.edu/~wang.77/papers/Narayanan-Wang2.icassp13.pdf) (Ohio, 2013)
- PSM: [Phase-Sensitive and Recognition-Boosted Speech Separation using Deep Recurrent Neural Networks](https://www.researchgate.net/profile/Jonathan-Le-Roux/publication/308836384_Phase-Sensitive_and_Recognition-Boosted_Speech_Separation_using_Deep_Recurrent_Neural_Networks/links/58bde44545851591c5e9badb/Phase-Sensitive-and-Recognition-Boosted-Speech-Separation-using-Deep-Recurrent-Neural-Networks.pdf) (MERL, ICASSP, 2015)
- CRM: [Complex Ratio Masking for Monaural Speech Separation](http://web.cse.ohio-state.edu/~wang.77/papers/WWW.taslp16.pdf) (Ohio, 2015)
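
For quick orientation, the four masks listed above can be sketched for a single time-frequency bin as follows. This is a minimal illustration using commonly cited textbook forms, not code from any of the papers: the function name `ideal_masks`, the parameters `lc_db`, `beta`, and `eps`, and the example values are all assumptions made for this sketch, and individual papers may define or compress the masks differently.

```python
def ideal_masks(s: complex, n: complex, lc_db: float = 0.0,
                beta: float = 0.5, eps: float = 1e-12) -> dict:
    """Compute oracle masks for one T-F bin (hypothetical helper).

    s, n  : complex STFT coefficients of clean speech and noise (assumed known)
    lc_db : local SNR criterion in dB used by the binary mask (assumed default 0 dB)
    beta  : exponent of the ratio mask (0.5 is a common choice, assumed here)
    eps   : small constant to avoid division by zero
    """
    y = s + n                      # noisy mixture in the STFT domain
    s_pow = abs(s) ** 2
    n_pow = abs(n) ** 2

    # IBM: 1 when the local SNR exceeds the criterion, else 0.
    ibm = 1.0 if s_pow > n_pow * 10.0 ** (lc_db / 10.0) else 0.0

    # IRM: soft gain in [0, 1] from the speech-to-total energy ratio.
    irm = (s_pow / (s_pow + n_pow + eps)) ** beta

    # CRM: complex mask chosen so that crm * y recovers s.
    crm = s * y.conjugate() / (abs(y) ** 2 + eps)

    # PSM: |s|/|y| * cos(phase difference), which equals the real part of s / y.
    psm = crm.real

    return {"IBM": ibm, "IRM": irm, "PSM": psm, "CRM": crm}


# Example bin with made-up values: speech 1+1j, noise 0.5-0.5j.
masks = ideal_masks(s=1.0 + 1.0j, n=0.5 - 0.5j)
recovered = masks["CRM"] * (1.5 + 0.5j)  # applying the CRM to the mixture
```

In practice the same expressions are applied element-wise over a whole spectrogram, and a network is trained to predict the mask from the noisy input alone.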