{"id":27149021,"url":"https://github.com/mahshid1378/dccrn","last_synced_at":"2025-04-08T12:35:14.833Z","repository":{"id":284358410,"uuid":"954675668","full_name":"mahshid1378/DCCRN","owner":"mahshid1378","description":"DCCRN with various loss functions","archived":false,"fork":false,"pushed_at":"2025-03-25T13:19:53.000Z","size":4128,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T14:22:46.975Z","etag":null,"topics":["deep-learning","loss-functions","perceptron","pythorch","speech-enhancement"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mahshid1378.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-25T12:55:26.000Z","updated_at":"2025-03-25T13:19:57.000Z","dependencies_parsed_at":"2025-03-25T14:32:58.484Z","dependency_job_id":null,"html_url":"https://github.com/mahshid1378/DCCRN","commit_stats":null,"previous_names":["mahshid1378/dccrn"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2FDCCRN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2FDCCRN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2FDCCRN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2FDCCRN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mahshid1378","downlo
ad_url":"https://codeload.github.com/mahshid1378/DCCRN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247844069,"owners_count":21005573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","loss-functions","perceptron","pythorch","speech-enhancement"],"created_at":"2025-04-08T12:35:14.255Z","updated_at":"2025-04-08T12:35:14.827Z","avatar_url":"https://github.com/mahshid1378.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DCCRN with various loss functions\n\nDCCRN (Deep Complex Convolutional Recurrent Network) is a deep neural network proposed in [[1]](https://arxiv.org/abs/2008.00264). This repository applies DCCRN with various loss functions. Our original paper can be found [here](https://www.jask.or.kr/articles/xml/ABxn/), and you can check test samples [here](https://github.com/seorim0/DCCRN-with-various-loss-functions/tree/main/samples/0dB). The test samples were chosen at random; we uploaded samples for the SI-SNR and SI-SNR+LMS models.    \n\u003cbr\u003e   \n   \n![DCCRN_final_revision](https://user-images.githubusercontent.com/55497506/105969652-d39f6b80-60cb-11eb-805c-0f204405ef37.png)\n\u003e Source of the figure: [paper](https://www.jask.or.kr/articles/xml/ABxn/)   \n\u003cbr\u003e\n\n\n\n# Loss functions\nWe use two base loss functions and two perceptual loss functions.\n\n\u003e Base loss\n  1. 
MSE: Mean Squared Error   \n  ![image](https://user-images.githubusercontent.com/55497506/106714015-97758900-663e-11eb-9593-6ecfd4266a41.png)\n  \u003cbr\u003e\n\n  2. SI-SNR: Scale-Invariant Source-to-Noise Ratio   \n  ![image](https://user-images.githubusercontent.com/55497506/106714206-da376100-663e-11eb-94c6-77f6588616b9.png)\n  \u003cbr\u003e\n\n\u003e Perceptual loss\n  1. LMS: Log Mel Spectra   \n  ![image](https://user-images.githubusercontent.com/55497506/106714238-e58a8c80-663e-11eb-8601-58bb020a2d3b.png)\n  \u003cbr\u003e\n\n  2. PMSQE: Perceptual Metric for Speech Quality Evaluation   \n  ![image](https://user-images.githubusercontent.com/55497506/106714147-c855be00-663e-11eb-8a8d-a9d5aba1325d.png)\n  \u003cbr\u003e   \n\nWe combined the 2 base loss functions with the 2 perceptual loss functions. The ratio of the coupling constants was determined experimentally. For example, MSE, a base loss function, has an initial magnitude of about 0.001 to 0.002, whereas LMS starts at about 0.1 to 0.2 and PMSQE at about 0.8 to 1.3. Therefore, to bring the two terms to a similar magnitude when combining them, a smaller coefficient was applied to the perceptual loss term. The coupling constant ratio thus reflects the dynamic range of the two terms rather than their sensitivity. In the course of the experiments we also judged the base loss function to be the more important term, so we varied the coefficients so that the ratio of the two terms' dynamic ranges, including the coupling constant, ranged from 1:1 to 10:1 (base:perceptual).   \n \u003cbr\u003e\n \n# Requirements\n\u003e This repository was tested on Ubuntu 20.04.\n* Python 3.7+\n* CUDA 10.1+\n* cuDNN 7+\n* PyTorch 1.7+\n\u003cbr\u003e\n\n\u003e Library\n* tqdm\n* asteroid\n* scipy\n* matplotlib\n* tensorboardX\n* pesq\n* pystoi\n\n# Prepare data\nThe training and validation data have the following three dimensions.   
\n```[Batch size, 2(input \u0026 target), wav length]```   \n\u003cbr\u003e   \nThe test data has the following dimensions.   \n```[noise type, dB classes, Batch size, 2(input \u0026 target), wav length]```   \nWe use 2 types of noise, seen and unseen, and 7 dB classes from -10 dB to 20 dB.\n\n\u003cbr\u003e\nWav files longer than 3 seconds are cut to 3 seconds, and wav files shorter than 3 seconds are zero-padded to 3 seconds.   \nThe sampling frequency is 16 kHz.\n\n\u003c!--# Use pretrained models\nIf you want to test the model described in the [paper](), you can change chkpt_model path in ```config.py``` like ```'SI-SNR/'```  \n\u003cbr\u003e\nWe have uploaded 3 models trained with each loss function, SI-SNR, SI-SNR + LMS and SI-SNR + PMSQE.--\u003e   \n\n# Comparative performance evaluation\n**Objective evaluation**   \n\u003cbr\u003e   \nWe evaluate the outputs with PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility).   \n![t1](https://user-images.githubusercontent.com/55497506/108797149-e1aeb200-75cd-11eb-8ea4-3db00da21991.png)   \n\u003cbr\u003e   \n\n![t2](https://user-images.githubusercontent.com/55497506/108797168-eb381a00-75cd-11eb-94ba-1d3a1016fb6e.png)   \n\u003cbr\u003e   \n\n**Spectrogram**   \n\n![image](https://user-images.githubusercontent.com/55497506/108705017-1a0fab00-7550-11eb-962a-9f0b218371a8.png)   \n\u003e Source of the figure: [paper]()   \n\nSpectrograms of (a) clean speech, (b) noisy speech at 0 dB SNR, and speech estimated using (c) MSE and PMSQE, (d) SI-SNR, (e) SI-SNR and PMSQE, (f) SI-SNR and LMS. 
\n\n# References\n**DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement**   \nYanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie   \n[[arXiv]](https://arxiv.org/abs/2008.00264)\n\n\n# Paper\n**Performance comparison evaluation of speech enhancement using various loss function.**   \nSeo-Rim Hwang, Joon Byun, Young-Cheul Park   \n[[paper]](https://www.jask.or.kr/articles/xml/ABxn/)   \n   \n   \n# Note   \n* ~~I'm trying to the codes more clearly.~~\n* ~~It's still in the editing phase. Please refer to the existing code.~~\n* [cleanup and upgrade version code]\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahshid1378%2Fdccrn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmahshid1378%2Fdccrn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahshid1378%2Fdccrn/lists"}