{"id":18356039,"url":"https://github.com/audiolabs/torch-pesq","last_synced_at":"2025-10-07T18:36:34.209Z","repository":{"id":63294080,"uuid":"488510213","full_name":"audiolabs/torch-pesq","owner":"audiolabs","description":"PyTorch implementation of the Perceptual Evaluation of Speech Quality for wideband audio","archived":false,"fork":false,"pushed_at":"2023-07-14T08:04:43.000Z","size":5886,"stargazers_count":179,"open_issues_count":3,"forks_count":15,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-30T07:07:52.337Z","etag":null,"topics":["perceptual-losses","pesq","python3","pytorch","speech-enhancement"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/audiolabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-04T08:32:22.000Z","updated_at":"2025-03-27T07:23:49.000Z","dependencies_parsed_at":"2025-01-07T22:18:28.306Z","dependency_job_id":null,"html_url":"https://github.com/audiolabs/torch-pesq","commit_stats":{"total_commits":50,"total_committers":3,"mean_commits":"16.666666666666668","dds":"0.18000000000000005","last_synced_commit":"db5228024181be3df153c878380ee66e4d0bdfca"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audiolabs%2Ftorch-pesq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audiolabs%2Ftorch-pesq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audiolabs%2Ftorch-p
esq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audiolabs%2Ftorch-pesq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/audiolabs","download_url":"https://codeload.github.com/audiolabs/torch-pesq/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247451652,"owners_count":20940939,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["perceptual-losses","pesq","python3","pytorch","speech-enhancement"],"created_at":"2024-11-05T22:08:47.899Z","updated_at":"2025-10-07T18:36:34.123Z","avatar_url":"https://github.com/audiolabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Loss function inspired by the PESQ score\n\n![Testing badge](https://github.com/audiolabs/torch-pesq/actions/workflows/test.yaml/badge.svg) \n![Linting badge](https://github.com/audiolabs/torch-pesq/actions/workflows/black.yaml/badge.svg) \n![Docs badge](https://github.com/audiolabs/torch-pesq/actions/workflows/docs.yaml/badge.svg)\n\nImplementation of the widely used Perceptual Evaluation of Speech Quality (PESQ) score as a torch loss function. Used alone, the PESQ loss does not perform well for noise suppression; combine it with the scale-invariant [SDR](https://arxiv.org/abs/1811.02508) instead. 
For more information, see [1] and [2].\n\n## Installation\n\nTo install the package, run:\n```bash\n$ pip install torch-pesq\n```\n\n## Usage\n\n```python\nimport torch\nfrom torch_pesq import PesqLoss\n\n# Placeholder signals for illustration only; the shape (batch, samples)\n# is an assumption. Replace them with real reference and degraded audio.\nreference = torch.rand(1, 44100)\ndegraded = torch.rand(1, 44100, requires_grad=True)\n\npesq = PesqLoss(0.5,\n    sample_rate=44100,\n)\n\nmos = pesq.mos(reference, degraded)\nloss = pesq(reference, degraded)\n\nprint(mos, loss)\nloss.backward()\n```\n\n## Comparison to reference implementation\n\nThe following figures use samples from the VCTK [5] speech and DEMAND [6] noise datasets with varying mixing factors. They illustrate the correlation and maximum error between the reference and torch implementations:\n\n![Correlation](https://raw.githubusercontent.com/audiolabs/torch-pesq/main/figures/compare_reference.png)\n\nThe differences result from the missing time-alignment implementation and from level alignment done with IIR filtering instead of frequency weighting. These differences are minor and should not be significant when the implementation is used as a loss function. Two outliers may degrade results; further investigation is needed to find the source of the difference.\n\n## Validation improvements when used as loss function\n\nValidation results for fullband noise suppression:\n - Noise estimator: Recurrent [SRU](https://github.com/asappresearch/sru) with soft masking. 8 layers with a width of 512 result in ~1586k parameters in the unpruned model.\n - STFT for signal coding: 512 window length, 50% overlap, Hamming window\n - Mel filterbank with 32 Mel features\n\nThe baseline system uses an L1 time-domain loss. Combining the PESQ loss function with the scale-invariant [SDR](https://arxiv.org/abs/1811.02508) gives an improvement of ~0.1 MOS for PESQ and slight improvements in speech distortion, as well as a more stable training progression. Horizontal lines indicate the score of noisy speech.\n\n![Validation comparison](https://raw.githubusercontent.com/audiolabs/torch-pesq/main/figures/validation.svg)\n\n## Relevant references\n1. 
[End-to-End Multi-Task Denoising for joint SDR and PESQ Optimization](https://arxiv.org/abs/1901.09146)\n2. [A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality](https://ieeexplore.ieee.org/document/8468124)\n3. [P.862 : Perceptual evaluation of speech quality (PESQ)](https://www.itu.int/rec/T-REC-P.862)\n4. [Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs](https://ieeexplore.ieee.org/document/941023)\n5. [CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit](https://datashare.ed.ac.uk/handle/10283/2950)\n6. [The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings](https://asa.scitation.org/doi/abs/10.1121/1.4799597)\n\n[1]: https://arxiv.org/abs/1901.09146\n[2]: https://ieeexplore.ieee.org/document/8468124\n[3]: https://www.itu.int/rec/T-REC-P.862\n[4]: https://ieeexplore.ieee.org/document/941023\n[5]: https://datashare.ed.ac.uk/handle/10283/2950\n[6]: https://asa.scitation.org/doi/abs/10.1121/1.4799597\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faudiolabs%2Ftorch-pesq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faudiolabs%2Ftorch-pesq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faudiolabs%2Ftorch-pesq/lists"}