{"id":22677241,"url":"https://github.com/daemon/pytorch-pcen","last_synced_at":"2025-10-26T08:31:42.047Z","repository":{"id":37754181,"uuid":"153182027","full_name":"daemon/pytorch-pcen","owner":"daemon","description":"PyTorch reimplementation of per-channel energy normalization for audio.","archived":false,"fork":false,"pushed_at":"2019-03-29T23:37:21.000Z","size":11,"stargazers_count":83,"open_issues_count":1,"forks_count":15,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-08-07T04:07:08.328Z","etag":null,"topics":["audio","pytorch","speech"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daemon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-15T21:09:36.000Z","updated_at":"2023-06-01T12:56:47.000Z","dependencies_parsed_at":"2022-09-16T10:51:48.736Z","dependency_job_id":null,"html_url":"https://github.com/daemon/pytorch-pcen","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daemon%2Fpytorch-pcen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daemon%2Fpytorch-pcen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daemon%2Fpytorch-pcen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daemon%2Fpytorch-pcen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daemon","download_url":"https://codeload.github.com/daemon/pytorch-pcen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://gith
ub.com","kind":"github","repositories_count":228939544,"owners_count":17994933,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","pytorch","speech"],"created_at":"2024-12-09T17:59:39.734Z","updated_at":"2025-10-26T08:31:41.974Z","avatar_url":"https://github.com/daemon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyTorch-PCEN\nEfficient PyTorch reimplementation of [per-channel energy normalization](https://arxiv.org/pdf/1607.05666.pdf) with Mel \nspectrogram features.\n\n## Overview\n\nRobustness to loudness differences in near- and far-field conditions is critical in high-quality speech recognition applications. \nSpectrogram energies differ significantly between, say, shouting at arm's length and whispering from a distance. \nThis can worsen model quality, since the model itself would need to be robust across a wide range of input energies. The \nlog-compression step in the popular log-Mel transform partially addresses this issue by reducing the dynamic range of audio; \nhowever, it ignores per-channel energy differences and is static by definition.\n\n[Per-channel energy normalization](https://arxiv.org/pdf/1607.05666.pdf) is one solution to these problems. \nIt provides a per-channel, trainable front-end in place of the log compression, greatly improving model robustness in keyword spotting systems while remaining resource-efficient and easy to implement.\n\n## Installation and Usage\n1. PyTorch and NumPy are required. LibROSA and matplotlib are required only for the example.\n2. 
To install via pip, run `pip install git+https://github.com/daemon/pytorch-pcen`. Otherwise, clone this repository and run `python setup.py install`.\n3. To run the example in the module, place a 16 kHz WAV file named `yes.wav` in the current directory. Then run `python -m pcen.pcen`.\n\nThe following is a self-contained example of using a streaming PCEN layer:\n```python\nimport pcen\nimport torch\n\n# 40-dimensional features, 30-millisecond window, 10-millisecond shift; trainable is false by default\ntransform = pcen.StreamingPCENTransform(n_mels=40, n_fft=480, hop_length=160, trainable=True)\naudio = torch.empty(1, 16000).normal_(0, 0.1) # Gaussian noise\n\n# 1600 is an arbitrary chunk size; chunking is unnecessary here but demonstrates streaming use\nstreaming_chunks = audio.split(1600, 1)\npcen_chunks = [transform(chunk) for chunk in streaming_chunks] # Transform each chunk\ntransform.reset() # Reset the persistent streaming state\npcen_ = torch.cat(pcen_chunks, 1)\n```\n\n## Citation\nWang, Yuxuan, Pascal Getreuer, Thad Hughes, Richard F. Lyon, and Rif A. Saurous. [Trainable frontend for robust and far-field keyword spotting](https://arxiv.org/pdf/1607.05666.pdf). In _Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on_, pp. 5670-5674. IEEE, 2017.\n```tex\n@inproceedings{wang2017trainable,\n  title={Trainable frontend for robust and far-field keyword spotting},\n  author={Wang, Yuxuan and Getreuer, Pascal and Hughes, Thad and Lyon, Richard F and Saurous, Rif A},\n  booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on},\n  pages={5670--5674},\n  year={2017},\n  organization={IEEE}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaemon%2Fpytorch-pcen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaemon%2Fpytorch-pcen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaemon%2Fpytorch-pcen/lists"}