{"id":15472045,"url":"https://github.com/pganssle/python-norm-estimate","last_synced_at":"2025-03-18T11:14:16.411Z","repository":{"id":27370245,"uuid":"30845762","full_name":"pganssle/python-norm-estimate","owner":"pganssle","description":"Python scripts demonstrating subsampling methods in estimating volume normalization factors for podcasts.","archived":false,"fork":false,"pushed_at":"2015-02-16T21:50:39.000Z","size":404,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-24T17:34:43.099Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pganssle.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-15T23:05:12.000Z","updated_at":"2020-06-02T20:19:03.000Z","dependencies_parsed_at":"2022-07-24T15:01:57.026Z","dependency_job_id":null,"html_url":"https://github.com/pganssle/python-norm-estimate","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pganssle%2Fpython-norm-estimate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pganssle%2Fpython-norm-estimate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pganssle%2Fpython-norm-estimate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pganssle%2Fpython-norm-estimate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pganssle","download_url":"https://codeload.github.com/pganssle/p
ython-norm-estimate/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244207746,"owners_count":20416109,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T02:24:47.702Z","updated_at":"2025-03-18T11:14:16.386Z","avatar_url":"https://github.com/pganssle.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Podcasts are often mastered at fairly different, and even inconsistent levels. This is fairly easy\nto fix if you know the RMS level of each episode, but it can be resource-intensive and time\nconsuming to read an entire episode (which is often an hour or more of audio) just to calculate\nthe total RMS. The resource consumption problem is compounded by the fact that often podcasts are\nloaded directly onto phones, which are generally low-power devices.\n\nHowever, in most cases, mastering problems are consistent throughout a podcast episode - it's rare\nthat the audio levels vary dramatically within a given episode. 
As such, you only really need to\nprocess as much of the episode as it takes to get close to the true RMS of the entire episode.\nIf you aren't concerned about intermittent clipping, this can be done using a very small subset of\nthe episode.\n\nThis repository explores how to optimally sample the episode so as to get as close as possible to\nthe true RMS in the smallest number of samples, using several different sampling techniques.\n\nThe demo uses the following podcasts, of various mastering qualities:\u003csup\u003e\u003ca href=\"#ref1\"\u003e1\u003c/a\u003e\u003c/sup\u003e\n* [Serial, Season 1 Episode 7: *The Opposite of the Prosecution*](http://serialpodcast.org/season-one/7/the-opposite-of-the-prosecution) (`serial-episode-07.mp3`, 32m 34s)\n* [Seattle Library Podcast - October 28, 2014: *Polio Then and Now: From Salk’s Game-Changing Vaccine to Today’s Resurgence*](http://www.spl.org/Audio/14_10_28_Rob_Lin.mp3) (`seattle-library-podcast-2014-10-28.mp3`, 1h 17m 02s)\n* [Spy Museum Podcast - Author Debriefing: *Good Hunting, An American Spymaster's Story*](http://www.spymuseum.org/multimedia/spycast/episode/author-debriefing-good-hunting-an-american-spymasters-story/) (`spy-museum-podcast-2014-11-07.mp3`, 1h 07m 42s)\n* [Federalist Society Events Audio - *Defining Regulatory Crimes*](http://www.fed-soc.org/multimedia/detail/defining-regulatory-crimes-event-audiovideo) (`federalist-society-events-2014-06-11.mp3`, 1h 19m 06s)\n\n### Sequential subset\n\nTo get an estimate of how many samples it takes to get close to the true value, I can calculate the\ntrue loudness of the file to start with, then read through it until I get close enough to the\ntrue value. 
Retrieving samples from the start, middle and end of each file, until I get to within\n0.2 dB of the true value gives me:\n\n\n    \u003e\u003e\u003e python run-demo.py -ds -sh -uc -uc -ms 100000000\n    serial-episode-07.mp3\n    True dBFS: -19.037004\n    Start number of samples: 23846 (-19.236950 dB)\n    Middle number of samples: 1187 (-18.840281 dB)\n    End number of samples: 131014 (-19.236404 dB)\n\n    seattle-library-podcast-2014-10-28.mp3\n    True dBFS: -34.131281\n    Start number of samples: 10215 (-34.331054 dB)\n    Middle number of samples: 86860 (-33.931294 dB)\n    End number of samples: 248471 (-34.331107 dB)\n\n    spy-museum-podcast-2014-11-07.mp3\n    True dBFS: -21.969535\n    Start number of samples: 7350369 (-22.169532 dB)\n    Middle number of samples: 78544 (-22.167233 dB)\n    End number of samples: 3105373 (-22.169480 dB)\n\n    federalist-society-events-2014-06-11.mp3\n    True dBFS: -25.349533\n    Start number of samples: 26262143 (-25.549532 dB)\n    Middle number of samples: 96944106 (-25.549533 dB)\n    End number of samples: 78138 (-25.545350 dB)\n\n\nIt seems like in many cases, if you're going to pick a place and start sampling, the middle is the\nbest place to start - but the Federalist Society event podcast is a notable exception to this - with\nthe middle being the absolute *worst* place to start.  
If you are willing to accept a larger error \nmargin, the \"start\" line gets close pretty quickly, but the \"middle\" line hits a long quiet period that\nwill certainly take a while to average out:\n\n\u003cdiv align=\"center\" style=\"width:750px\"\u003e\n\u003ca href=\"https://raw.githubusercontent.com/pganssle/python-norm-estimate/master/outputs/sequential/federalist-society-events-2014-06-11%20%28Until%20Close%29.png\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/pganssle/python-norm-estimate/master/outputs/sequential/federalist-society-events-2014-06-11%20%28Until%20Close%29.png\" width=\"750px\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\nIn any case, even the worst-case scenario here (the Federalist Society podcast, starting in the middle) \nstill takes only about 36 minutes of audio to get us the data we need - which isn't great, but it's still \nless than 50% of the total duration of the episode, so it's a start.\n\n### Random subset\n\nIf we choose samples at random rather than sequentially, it's very unlikely that\nwe'll take *too many* samples from any given quiet or loud region, so we should be able to get\nclose to the true value much more quickly. 
Repeating the above test using random subsampling (and\nbecause it's not the same results every time, repeating *that* test 150 times), you get:\n\n\n    \u003e\u003e\u003e python run-demo.py -dr -rr 150 -uc --min-samples 100\n    serial-episode-07.mp3\n    True dBFS: -19.037004\n    Final dBFS: -18.857259\n    Mean number of samples: 534.286667 (24.230688 ms)\n    Min number of samples: 101.000000 (4.580499 ms)\n    Max number of samples: 7666.000000 (347.664399 ms)\n\n    seattle-library-podcast-2014-10-28.mp3\n    True dBFS: -34.131281\n    Final dBFS: -33.936553\n    Mean number of samples: 1517.386667 (34.407861 ms)\n    Min number of samples: 101.000000 (2.290249 ms)\n    Max number of samples: 66808.000000 (1514.920635 ms)\n\n    spy-museum-podcast-2014-11-07.mp3\n    True dBFS: -21.969535\n    Final dBFS: -21.773998\n    Mean number of samples: 1000.406667 (22.684958 ms)\n    Min number of samples: 101.000000 (2.290249 ms)\n    Max number of samples: 31924.000000 (723.900227 ms)\n\n    federalist-society-events-2014-06-11.mp3\n    True dBFS: -25.349533\n    Final dBFS: -25.152401\n    Mean number of samples: 2160.993333 (49.002116 ms)\n    Min number of samples: 101.000000 (2.290249 ms)\n    Max number of samples: 38802.000000 (879.863946 ms)\n\n\nSo it looks like the absolute worst case performance here is about 1.5 seconds - considerably less \nthan the 36 minutes from the sequential test! Of course, we can't assume that we know the true value \nof the loudness, since that's what we're trying to measure, so we have a question of when to stop \ntaking random samples. For the moment, we can sidestep that problem by just choosing a fixed \nduration worth of random samples and seeing how close that gets us to the true value. 
Sampling 2 \nseconds' worth of audio (44,100 or 88,200 samples, depending on the file's sample rate) from each of these files 150 times gives the \nfollowing results:\n\n\n    \u003e\u003e\u003e python run-demo.py -dr -rr 150 -d 2.0 \n    serial-episode-07.mp3\n    True dBFS: -19.037004\n    Mean dBFS: -19.040694 (Error: 0.003690)\n    Min dBFS: -19.162484 (Error: 0.125480)\n    Max dBFS: -18.903390 (Error: 0.133614)\n    Standard Deviation:  0.051996\n\n    seattle-library-podcast-2014-10-28.mp3\n    True dBFS: -34.131281\n    Mean dBFS: -34.115192 (Error: 0.016089)\n    Min dBFS: -34.311679 (Error: 0.180398)\n    Max dBFS: -33.937149 (Error: 0.194132)\n    Standard Deviation:  0.074995\n\n    spy-museum-podcast-2014-11-07.mp3\n    True dBFS: -21.969535\n    Mean dBFS: -21.961811 (Error: 0.007724)\n    Min dBFS: -22.102569 (Error: 0.133034)\n    Max dBFS: -21.791989 (Error: 0.177546)\n    Standard Deviation:  0.053758\n\n    federalist-society-events-2014-06-11.mp3\n    True dBFS: -25.349533\n    Mean dBFS: -25.344582 (Error: 0.004951)\n    Min dBFS: -25.604302 (Error: 0.254768)\n    Max dBFS: -25.089123 (Error: 0.260410)\n    Standard Deviation:  0.087513\n\n\nUsing [mp3Gain](https://en.wikipedia.org/wiki/MP3Gain) on these same files and setting the target\n\"normal\" volume to be the value at which Serial is mastered, mp3Gain recommends a volume boost of\n15.1 dB, 3 dB and 6 dB for the other three files, respectively - almost exactly what this script would recommend. As you can\nsee, even the most naive method of random sampling needs only 0.04-0.1% of the total length of the file\nto ascertain the overall volume level with enough accuracy to significantly improve the quality of\none's listening experience!\n\n#### Footnotes\n1. 
\u003ca name=\"ref1\" /\u003eI picked one from the Top 10 in iTunes, since that's probably mastered well and \n   normalized optimally, then I tried to fill out the rest with live or \"event\" type podcasts, which \n   tend to be more unevenly mastered in my experience.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpganssle%2Fpython-norm-estimate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpganssle%2Fpython-norm-estimate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpganssle%2Fpython-norm-estimate/lists"}