{"id":30762862,"url":"https://github.com/interactiveaudiolab/penn","last_synced_at":"2025-09-04T15:50:06.660Z","repository":{"id":65548047,"uuid":"328298062","full_name":"interactiveaudiolab/penn","owner":"interactiveaudiolab","description":"Pitch Estimating Neural Networks (PENN)","archived":false,"fork":false,"pushed_at":"2025-04-02T17:03:42.000Z","size":76824,"stargazers_count":257,"open_issues_count":1,"forks_count":24,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-07-11T10:14:56.545Z","etag":null,"topics":["frequency","music","periodicity","pitch","speech","voicing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/interactiveaudiolab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-10T03:50:25.000Z","updated_at":"2025-06-30T07:38:14.000Z","dependencies_parsed_at":"2023-12-22T23:27:04.847Z","dependency_job_id":"fa522c57-6a90-49fd-99b5-fd431b6b77bb","html_url":"https://github.com/interactiveaudiolab/penn","commit_stats":{"total_commits":203,"total_committers":4,"mean_commits":50.75,"dds":0.5862068965517242,"last_synced_commit":"962af14f29947b33030223d6d8637ab2b475ff24"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/interactiveaudiolab/penn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fpenn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fpenn/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fpenn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fpenn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/interactiveaudiolab","download_url":"https://codeload.github.com/interactiveaudiolab/penn/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Fpenn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273633436,"owners_count":25140775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["frequency","music","periodicity","pitch","speech","voicing"],"created_at":"2025-09-04T15:49:59.658Z","updated_at":"2025-09-04T15:50:06.642Z","avatar_url":"https://github.com/interactiveaudiolab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003ePitch-Estimating Neural Networks (PENN)\u003c/h1\u003e\n\u003cdiv 
align=\"center\"\u003e\n\n[![PyPI](https://img.shields.io/pypi/v/penn.svg)](https://pypi.python.org/pypi/penn)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Downloads](https://static.pepy.tech/badge/penn)](https://pepy.tech/project/penn)\n\n\u003c/div\u003e\n\nTraining, evaluation, and inference of neural pitch and periodicity estimators in PyTorch. Includes the original code for the paper [\"Cross-domain Neural Pitch and Periodicity Estimation\"](https://arxiv.org/abs/2301.12258).\n\n\n## Table of contents\n\n- [Installation](#installation)\n- [Inference](#inference)\n    * [Application programming interface](#application-programming-interface)\n        * [`penn.from_audio`](#pennfrom_audio)\n        * [`penn.from_file`](#pennfrom_file)\n        * [`penn.from_file_to_file`](#pennfrom_file_to_file)\n        * [`penn.from_files_to_files`](#pennfrom_files_to_files)\n    * [Command-line interface](#command-line-interface)\n- [Training](#training)\n    * [Download](#download)\n    * [Preprocess](#preprocess)\n    * [Partition](#partition)\n    * [Train](#train)\n    * [Monitor](#monitor)\n- [Evaluation](#evaluation)\n    * [Evaluate](#evaluate)\n    * [Plot](#plot)\n- [Citation](#citation)\n\n\n## Installation\n\nIf you want to perform pitch estimation using a pretrained FCNF0++ model, run\n`pip install penn`\n\nIf you want to train or use your own models, run\n`pip install penn[train]`\n\n\n## Inference\n\nPerform inference using FCNF0++\n\n```\nimport penn\nimport torchaudio\n\n# Load audio\naudio, sample_rate = torchaudio.load('test/assets/gershwin.wav')\n\n# Here we'll use a 10 millisecond hopsize\nhopsize = .01\n\n# Provide a sensible frequency range given your domain and model\nfmin = 30.\nfmax = 1000.\n\n# Choose a gpu index to use for inference. 
Set to None to use cpu.\ngpu = 0\n\n# If you are using a gpu, pick a batch size that doesn't cause memory errors\n# on your gpu\nbatch_size = 2048\n\n# Select a checkpoint to use for inference. Selecting None will\n# download and use FCNF0++ pretrained on MDB-stem-synth and PTDB\ncheckpoint = None\n\n# Centers frames at hopsize / 2, 3 * hopsize / 2, 5 * hopsize / 2, ...\ncenter = 'half-hop'\n\n# (Optional) Linearly interpolate unvoiced regions below periodicity threshold\ninterp_unvoiced_at = .065\n\n# (Optional) Select a decoding method. One of ['argmax', 'pyin', 'viterbi'].\ndecoder = 'viterbi'\n\n# Infer pitch and periodicity\npitch, periodicity = penn.from_audio(\n    audio,\n    sample_rate,\n    hopsize=hopsize,\n    fmin=fmin,\n    fmax=fmax,\n    checkpoint=checkpoint,\n    batch_size=batch_size,\n    center=center,\n    decoder=decoder,\n    interp_unvoiced_at=interp_unvoiced_at,\n    gpu=gpu)\n```\n\nNote that pitch estimation is performed independently on each frame of audio. Then, a _decoding_ step occurs, which may or may not be computed independently on each frame. Most often, Viterbi decoding is used (as in, e.g., PYIN and CREPE). However, Viterbi decoding is slow. We made a fast Viterbi decoder called [torbi](https://github.com/maxrmorrison/torbi), which [we are working on adding to PyTorch](https://github.com/pytorch/pytorch/issues/121160). Until `torbi` is integrated into PyTorch (or otherwise made pip-installable), it is recommended to use the `dev` branch of `penn`, which uses `torbi` decoding by default, but is not pip-installable. Our paper [_Fine-Grained and Interpretable Neural Speech Editing_](https://www.maxrmorrison.com/sites/promonet/) introduces and demonstrates the efficacy of `torbi` for pitch decoding. 
\n\n\n### Application programming interface\n\n#### `penn.from_audio`\n\n```\ndef from_audio(\n    audio: torch.Tensor,\n    sample_rate: int = penn.SAMPLE_RATE,\n    hopsize: float = penn.HOPSIZE_SECONDS,\n    fmin: float = penn.FMIN,\n    fmax: float = penn.FMAX,\n    checkpoint: Optional[Path] = None,\n    batch_size: Optional[int] = None,\n    center: str = 'half-window',\n    decoder: str = penn.DECODER,\n    interp_unvoiced_at: Optional[float] = None,\n    gpu: Optional[int] = None\n) -\u003e Tuple[torch.Tensor, torch.Tensor]:\n\"\"\"Perform pitch and periodicity estimation\n\nArgs:\n    audio: The audio to extract pitch and periodicity from\n    sample_rate: The audio sample rate\n    hopsize: The hopsize in seconds\n    fmin: The minimum allowable frequency in Hz\n    fmax: The maximum allowable frequency in Hz\n    checkpoint: The checkpoint file\n    batch_size: The number of frames per batch\n    center: Padding options. One of ['half-window', 'half-hop', 'zero'].\n    interp_unvoiced_at: Specifies voicing threshold for interpolation\n    gpu: The index of the gpu to run inference on\n\nReturns:\n    pitch: torch.tensor(\n        shape=(1, int(samples // penn.seconds_to_sample(hopsize))))\n    periodicity: torch.tensor(\n        shape=(1, int(samples // penn.seconds_to_sample(hopsize))))\n\"\"\"\n```\n\n\n#### `penn.from_file`\n\n```\ndef from_file(\n    file: Path,\n    hopsize: float = penn.HOPSIZE_SECONDS,\n    fmin: float = penn.FMIN,\n    fmax: float = penn.FMAX,\n    checkpoint: Optional[Path] = None,\n    batch_size: Optional[int] = None,\n    center: str = 'half-window',\n    decoder: str = penn.DECODER,\n    interp_unvoiced_at: Optional[float] = None,\n    gpu: Optional[int] = None\n) -\u003e Tuple[torch.Tensor, torch.Tensor]:\n\"\"\"Perform pitch and periodicity estimation from audio on disk\n\nArgs:\n    file: The audio file\n    hopsize: The hopsize in seconds\n    fmin: The minimum allowable frequency in Hz\n    fmax: The maximum allowable 
frequency in Hz\n    checkpoint: The checkpoint file\n    batch_size: The number of frames per batch\n    center: Padding options. One of ['half-window', 'half-hop', 'zero'].\n    interp_unvoiced_at: Specifies voicing threshold for interpolation\n    gpu: The index of the gpu to run inference on\n\nReturns:\n    pitch: torch.tensor(shape=(1, int(samples // hopsize)))\n    periodicity: torch.tensor(shape=(1, int(samples // hopsize)))\n\"\"\"\n```\n\n\n#### `penn.from_file_to_file`\n\n```\ndef from_file_to_file(\n    file: Path,\n    output_prefix: Optional[Path] = None,\n    hopsize: float = penn.HOPSIZE_SECONDS,\n    fmin: float = penn.FMIN,\n    fmax: float = penn.FMAX,\n    checkpoint: Optional[Path] = None,\n    batch_size: Optional[int] = None,\n    center: str = 'half-window',\n    decoder: str = penn.DECODER,\n    interp_unvoiced_at: Optional[float] = None,\n    gpu: Optional[int] = None\n) -\u003e None:\n\"\"\"Perform pitch and periodicity estimation from audio on disk and save\n\nArgs:\n    file: The audio file\n    output_prefix: The file to save pitch and periodicity without extension\n    hopsize: The hopsize in seconds\n    fmin: The minimum allowable frequency in Hz\n    fmax: The maximum allowable frequency in Hz\n    checkpoint: The checkpoint file\n    batch_size: The number of frames per batch\n    center: Padding options. 
One of ['half-window', 'half-hop', 'zero'].\n    interp_unvoiced_at: Specifies voicing threshold for interpolation\n    gpu: The index of the gpu to run inference on\n\"\"\"\n```\n\n\n#### `penn.from_files_to_files`\n\n```\ndef from_files_to_files(\n    files: List[Path],\n    output_prefixes: Optional[List[Path]] = None,\n    hopsize: float = penn.HOPSIZE_SECONDS,\n    fmin: float = penn.FMIN,\n    fmax: float = penn.FMAX,\n    checkpoint: Optional[Path] = None,\n    batch_size: Optional[int] = None,\n    center: str = 'half-window',\n    decoder: str = penn.DECODER,\n    interp_unvoiced_at: Optional[float] = None,\n    num_workers: int = penn.NUM_WORKERS,\n    gpu: Optional[int] = None\n) -\u003e None:\n\"\"\"Perform pitch and periodicity estimation from files on disk and save\n\nArgs:\n    files: The audio files\n    output_prefixes: Files to save pitch and periodicity without extension\n    hopsize: The hopsize in seconds\n    fmin: The minimum allowable frequency in Hz\n    fmax: The maximum allowable frequency in Hz\n    checkpoint: The checkpoint file\n    batch_size: The number of frames per batch\n    center: Padding options. 
One of ['half-window', 'half-hop', 'zero'].\n    interp_unvoiced_at: Specifies voicing threshold for interpolation\n    num_workers: Number of CPU threads for async data I/O\n    gpu: The index of the gpu to run inference on\n\"\"\"\n```\n\n\n### Command-line interface\n\n```\npython -m penn\n    --files FILES [FILES ...]\n    [-h]\n    [--config CONFIG]\n    [--output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]]\n    [--hopsize HOPSIZE]\n    [--fmin FMIN]\n    [--fmax FMAX]\n    [--checkpoint CHECKPOINT]\n    [--batch_size BATCH_SIZE]\n    [--center {half-window,half-hop,zero}]\n    [--decoder {argmax,pyin,viterbi}]\n    [--interp_unvoiced_at INTERP_UNVOICED_AT]\n    [--num_workers NUM_WORKERS]\n    [--gpu GPU]\n\nrequired arguments:\n    --files FILES [FILES ...]\n        The audio files to process\n\noptional arguments:\n    -h, --help\n        show this help message and exit\n    --config CONFIG\n        The configuration file. Defaults to using FCNF0++.\n    --output_prefixes OUTPUT_PREFIXES [OUTPUT_PREFIXES ...]\n        The files to save pitch and periodicity without extension.\n        Defaults to files without extensions.\n    --hopsize HOPSIZE\n        The hopsize in seconds. Defaults to 0.01 seconds.\n    --fmin FMIN\n        The minimum frequency allowed in Hz. Defaults to 31.0 Hz.\n    --fmax FMAX\n        The maximum frequency allowed in Hz. Defaults to 1984.0 Hz.\n    --checkpoint CHECKPOINT\n        The model checkpoint file. Defaults to ./penn/assets/checkpoints/fcnf0++.pt.\n    --batch_size BATCH_SIZE\n        The number of frames per batch. Defaults to 2048.\n    --center {half-window,half-hop,zero}\n        Padding options\n    --decoder {argmax,pyin,viterbi}\n        Posteriorgram decoder\n    --interp_unvoiced_at INTERP_UNVOICED_AT\n        Specifies voicing threshold for interpolation. 
Defaults to 0.1625.\n    --num_workers NUM_WORKERS\n        Number of CPU threads for async data I/O\n    --gpu GPU\n        The index of the gpu to perform inference on. Defaults to CPU.\n```\n\n\n## Training\n\n### Download\n\n`python -m penn.data.download`\n\nDownloads and uncompresses the `mdb` and `ptdb` datasets used for training.\n\n\n### Preprocess\n\n`python -m penn.data.preprocess --config \u003cconfig\u003e`\n\nConverts each dataset to a common format on disk ready for training. You\ncan optionally pass a configuration file to override the default configuration.\n\n\n### Partition\n\n`python -m penn.partition`\n\nGenerates `train`, `valid`, and `test` partitions for `mdb` and `ptdb`.\nPartitioning is deterministic given the same random seed. You do not need to\nrun this step, as the original partitions are saved in\n`penn/assets/partitions`.\n\n\n### Train\n\n`python -m penn.train --config \u003cconfig\u003e --gpu \u003cgpu\u003e`\n\nTrains a model according to a given configuration on the `mdb` and `ptdb`\ndatasets.\n\n\n### Monitor\n\nYou can monitor training via `tensorboard`.\n\n```\ntensorboard --logdir runs/ --port \u003cport\u003e --load_fast true\n```\n\nTo use the `torchutil` notification system to receive notifications for long\njobs (download, preprocess, train, and evaluate), set the\n`PYTORCH_NOTIFICATION_URL` environment variable to a supported webhook as\nexplained in [the Apprise documentation](https://pypi.org/project/apprise/).\n\n\n## Evaluation\n\n### Evaluate\n\n```\npython -m penn.evaluate \\\n    --config \u003cconfig\u003e \\\n    --checkpoint \u003ccheckpoint\u003e \\\n    --gpu \u003cgpu\u003e\n```\n\nEvaluate a model. 
`\u003ccheckpoint\u003e` is the checkpoint file to evaluate and `\u003cgpu\u003e`\nis the GPU index.\n\n\n### Plot\n\n```\npython -m penn.plot.density \\\n    --config \u003cconfig\u003e \\\n    --true_datasets \u003ctrue_datasets\u003e \\\n    --inference_datasets \u003cinference_datasets\u003e \\\n    --output_file \u003coutput_file\u003e \\\n    --checkpoint \u003ccheckpoint\u003e \\\n    --gpu \u003cgpu\u003e\n```\n\nPlot the data distribution and inferred distribution for a given dataset and\nsave to a jpg file.\n\n```\npython -m penn.plot.logits \\\n    --config \u003cconfig\u003e \\\n    --audio_file \u003caudio_file\u003e \\\n    --output_file \u003coutput_file\u003e \\\n    --checkpoint \u003ccheckpoint\u003e \\\n    --gpu \u003cgpu\u003e\n```\n\nPlot the pitch posteriorgram of an audio file and save to a jpg file.\n\n```\npython -m penn.plot.threshold \\\n    --names \u003cnames\u003e \\\n    --evaluations \u003cevaluations\u003e \\\n    --output_file \u003coutput_file\u003e\n```\n\nPlot the periodicity performance (voiced/unvoiced F1) over mdb and ptdb as a\nfunction of the voiced/unvoiced threshold. `names` are the plot labels to give\neach evaluation. `evaluations` are the names of the evaluations to plot.\n\n\n## Citation\n\n### IEEE\nM. Morrison, C. Hsieh, N. Pruyne, and B. Pardo, \"Cross-domain Neural Pitch and Periodicity Estimation,\" arXiv preprint arXiv:2301.12258, 2023.\n\n\n### BibTeX\n\n```\n@inproceedings{morrison2023cross,\n    title={Cross-domain Neural Pitch and Periodicity Estimation},\n    author={Morrison, Max and Hsieh, Caedon and Pruyne, Nathan and Pardo, Bryan},\n    booktitle={arXiv preprint arXiv:2301.12258},\n    year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractiveaudiolab%2Fpenn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finteractiveaudiolab%2Fpenn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractiveaudiolab%2Fpenn/lists"}