{"id":17093540,"url":"https://github.com/nuniz/blind_rt60","last_synced_at":"2025-11-04T12:01:40.080Z","repository":{"id":209871247,"uuid":"716287963","full_name":"nuniz/blind_rt60","owner":"nuniz","description":"Algorithm for blind estimation of reverberation time","archived":false,"fork":false,"pushed_at":"2024-06-06T17:06:05.000Z","size":237,"stargazers_count":19,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T22:44:20.955Z","etag":null,"topics":["acoustics","audio","blind-rt60","echo","python","reverb","reverberation","rir","room-acoustics","room-impulse-response","rt60","sabine","sound"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nuniz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-11-08T20:27:45.000Z","updated_at":"2025-04-10T10:04:48.000Z","dependencies_parsed_at":"2023-11-29T15:52:30.180Z","dependency_job_id":"6fc22952-a8b7-4ec0-ab90-426d018eed0e","html_url":"https://github.com/nuniz/blind_rt60","commit_stats":null,"previous_names":["nuniz/blind_rt60"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuniz%2Fblind_rt60","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuniz%2Fblind_rt60/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuniz%2Fblind_rt60/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuniz%2Fblind_rt60/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/nuniz","download_url":"https://codeload.github.com/nuniz/blind_rt60/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248643048,"owners_count":21138353,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acoustics","audio","blind-rt60","echo","python","reverb","reverberation","rir","room-acoustics","room-impulse-response","rt60","sabine","sound"],"created_at":"2024-10-14T14:07:28.893Z","updated_at":"2025-11-04T12:01:40.025Z","avatar_url":"https://github.com/nuniz.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BlindRT60\nThe Blind RT60 Estimation module, based on Ratnam et al.'s paper [1], estimates the reverberation time (RT60) for input audio signals in Python.\nThe evaluation of the BlindRT60 class involves using a speech utterance from the NOIZEUS database [2], which is a repository of a noisy speech corpus. \n\n[1] Ratnam, Rama \u0026 Jones, Douglas \u0026 Wheeler, Bruce \u0026 O'Brien, William \u0026 Lansing, Charissa \u0026 Feng, Albert. (2003). Blind estimation of reverberation time. The Journal of the Acoustical Society of America. 114. 2877-92. 10.1121/1.1616578. \n\n[2] Hu, Y. and Loizou, P. (2007). 
“Subjective evaluation and comparison of speech enhancement algorithms,” Speech Communication, 49, 588-601.\n\n## Installation\n```\npip install blind_rt60\n```\n\n## Basic Usage\n```\nimport matplotlib.pyplot as plt\nfrom scipy.io import wavfile\n\nfrom blind_rt60 import BlindRT60\n\n# Create an instance of the BlindRT60 estimator\nestimator = BlindRT60()\n\n# Load your audio signal (x) and its sampling frequency (fs);\n# replace the placeholder path with your own WAV file\nfs, x = wavfile.read(\"path/to/audio/file.wav\")\n\n# Estimate the RT60\nrt60_estimate = estimator(x, fs)\n\n# Visualize the results\nfig = estimator.visualize(x, fs)\nplt.show()\n```\n\n## Evaluation\nThe primary functionality of the BlindRT60 class was tested by simulating scenarios with different decay rates for generated decaying chirp signals and speech. The tests use the pyroomacoustics library to create a simulated room with a source and microphones, which allows a comparison between the RT60 estimated by BlindRT60 and the RT60 calculated by the Schroeder method.\n\nThe figure below illustrates blind RT60 estimation, showing a speech trace, the continuous decay rate estimate, and a histogram of estimated decay rates. \n\u003cbr/\u003e\n![RT60](supplementary_material/graphs/BlindRT60.png)\n\n\n## Parameters\nThe BlindRT60 class accepts various parameters that allow customization of the estimation process. 
Here are the key parameters:\n\n* fs: Sample rate of the audio signal.\n* framelen: Length of each analysis frame in seconds.\n* hop: Hop size between analysis frames in seconds.\n* percentile: Pre-specified percentile value for RT60 estimation.\n* a_init: Initial value for the decay rate parameter.\n* sigma2_init: Initial value for the signal variance parameter.\n* max_itr: Maximum number of iterations for convergence.\n* max_err: Maximum error for convergence.\n* a_range: Range of valid values for the decay rate parameter.\n* bisected_itr: Number of iterations for the bisection method.\n* sigma2_range: Range of valid values for the signal variance parameter.\n* verbose: Enable verbose output for each iteration.\n\n# Contributions\nContributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request on the GitHub repository.\n\n# License\nThis project is licensed under the MIT License. See the LICENSE file for more information.\n\n# Contact\nFor any inquiries or questions, please contact zoreasaf@gmail.com.\n\n\n# Notes\n\n## Model of Sound Decay\nWe assume that the reverberant tail of a decaying sound $y$ is the product of a fine structure $x$ that is a random process and an envelope $a$ that is deterministic. 
The samples $x\\left[ n \\right]$ are independent and identically distributed random variables drawn from the normal distribution $N\\left( {0,\\sigma } \\right)$.\n\u003cbr/\u003e\nThe model for room decay then suggests that the observations $y$ are specified by $y\\left( n \\right) = x\\left( n \\right) \\cdot a\\left( n \\right)$.\nDue to the time-varying term $a\\left( n \\right)$, the samples $y\\left( n \\right)$ are independent but not identically distributed, and their probability density function is $N\\left( {0,\\sigma  \\cdot a\\left( n \\right)} \\right)$.\n\u003cbr/\u003e\nFor each estimation interval, the likelihood function of $y$ is\n$$L\\left( {y;a,\\sigma } \\right) = \\frac{1}{{\\prod\\limits_{n = 0}^{N - 1} {a\\left( n \\right)} }} \\cdot {\\left( {\\frac{1}{{2\\pi {\\sigma ^2}}}} \\right)^{N/2}} \\cdot \\exp \\left( { - \\frac{{\\sum\\limits_{n = 0}^{N - 1} {{{\\left( {\\frac{{y\\left( n \\right)}}{{a\\left( n \\right)}}} \\right)}^2}} }}{{2{\\sigma ^2}}}} \\right)$$\nThis model has N+1 parameters: $a\\left[ 0 \\right], \\ldots ,a\\left[ {N - 1} \\right]$ and $\\sigma$.\n\u003cbr/\u003e\nDescribing $a[n]$ as a damped free decay, $a\\left[ n \\right] = \\exp \\left( { - \\frac{n}{\\tau }} \\right) \\buildrel \\Delta \\over = {a^n}$, reduces the model to the two parameters $a$ and $\\sigma$, and the likelihood becomes\n$$L\\left( {y;a,\\sigma } \\right) = {\\left( {\\frac{1}{{2\\pi {a^{N - 1}}{\\sigma ^2}}}} \\right)^{N/2}} \\cdot \\exp \\left( { - \\frac{{\\sum\\limits_{n = 0}^{N - 1} {{a^{ - 2n}}y{{\\left( n \\right)}^2}} }}{{2{\\sigma ^2}}}} \\right)$$\n\n## Maximum Likelihood Estimator\n### Equations\nGiven the likelihood function, the parameters $a$ and $\\sigma$ can be estimated using a maximum-likelihood approach,\n$$\\frac{{\\partial \\ln L\\left( {y;a,\\sigma } \\right)}}{{\\partial a}} = {a^{ - 1}}\\left( {\\frac{1}{{{\\sigma ^2}}}\\sum\\limits_{n = 0}^{N - 1} {n \\cdot {a^{ - 2n}}y{{\\left( n \\right)}^2} - \\frac{{N\\left( {N - 1} \\right)}}{2}} } \\right)$$\n$$\\frac{{{\\partial ^2}\\ln L\\left( {y;a,\\sigma } \\right)}}{{\\partial {a^2}}} = \\frac{{N\\left( {N - 1} \\right)}}{2}{a^{ - 2}} + \\frac{1}{{{\\sigma 
^2}}}\\sum\\limits_{n = 0}^{N - 1} {n\\left( { - 1 - 2n} \\right) \\cdot {a^{ - 2n - 2}}y{{\\left( n \\right)}^2}} $$\n$$\\frac{{\\partial \\ln L\\left( {y;a,\\sigma } \\right)}}{{\\partial \\sigma }} =  - \\frac{N}{\\sigma } + \\frac{1}{{{\\sigma ^3}}}\\sum\\limits_{n = 0}^{N - 1} {{a^{ - 2n}}y{{\\left( n \\right)}^2}} $$\n\u003cbr/\u003e\n* The geometric ratio $a$ is highly compressive: in practice its values are expected to be close to 1, whereas $\\sigma$ spans a broad range. \n* Given the shape of the gradient $\\frac{{\\partial \\ln L\\left( {y;a,\\sigma } \\right)}}{{\\partial a}}$, starting from an initial value smaller than $a$ requires the root-solving strategy to descend the gradient quickly enough.\n\n### Solution\n* The equations $\\frac{{\\partial \\ln L\\left( {y;a,\\sigma } \\right)}}{{\\partial a}} = 0$ and $\\frac{{\\partial \\ln L\\left( {y;a,\\sigma } \\right)}}{{\\partial \\sigma }} = 0$ are solved with an iterative numerical approach.\n* Estimating $a^*$:\n\t1. The search interval was bisected until the root was bracketed.\n\t2. The Newton–Raphson method was then applied to refine the root, ${a_{n + 1}} = {a_n} - \\frac{{\\frac{{\\partial \\ln L\\left( {y;{a_n},\\sigma } \\right)}}{{\\partial a}}}}{{\\frac{{{\\partial ^2}\\ln L\\left( {y;{a_n},\\sigma } \\right)}}{{\\partial {a^2}}}}}$.\n* Estimating $\\sigma$:\n\t$${\\sigma ^2} = \\frac{1}{N}\\sum\\limits_{n = 0}^{N - 1} {{a^{ - 2n}}y{{\\left( n \\right)}^2}}$$\n\n## Strategy for Assigning the Correct Decay Rate\n\nThe model will fail for (1) estimation frames that do not fall within a region of free decay, and (2) sounds with a gradual rather than rapid offset.\n\n* In the first case, the damping of sound in a room cannot occur at a rate faster than the free decay. A robust strategy would be to select a threshold value of $a$ such that the left tail of the probability density function of $a^*$, $p(a^*)$, contains a pre-specified percentile $\\gamma$.\n* In the second case, $p(a^*)$ is likely to be multimodal. 
The strategy then is to select the first dominant peak.\n* For a unimodal symmetric distribution with $\\gamma = 0.5$ the filter will track the peak value, i.e., the median. In connected speech, where peaks cannot be clearly discriminated or the distribution is multimodal, $\\gamma$ should be chosen based on the statistics of gap durations.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnuniz%2Fblind_rt60","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnuniz%2Fblind_rt60","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnuniz%2Fblind_rt60/lists"}