{"id":13532699,"url":"https://github.com/jameslyons/python_speech_features","last_synced_at":"2025-04-01T21:30:51.648Z","repository":{"id":11523267,"uuid":"14005795","full_name":"jameslyons/python_speech_features","owner":"jameslyons","description":"This library provides common speech features for ASR including MFCCs and filterbank energies.","archived":false,"fork":false,"pushed_at":"2021-10-20T10:08:48.000Z","size":221,"stargazers_count":2392,"open_issues_count":25,"forks_count":615,"subscribers_count":86,"default_branch":"master","last_synced_at":"2025-03-22T03:34:52.412Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jameslyons.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-10-31T02:42:08.000Z","updated_at":"2025-03-19T17:48:01.000Z","dependencies_parsed_at":"2022-08-07T06:16:24.775Z","dependency_job_id":null,"html_url":"https://github.com/jameslyons/python_speech_features","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jameslyons%2Fpython_speech_features","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jameslyons%2Fpython_speech_features/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jameslyons%2Fpython_speech_features/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jameslyons%2Fpython_speech_features/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jameslyons","download_url":"http
s://codeload.github.com/jameslyons/python_speech_features/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246712948,"owners_count":20821822,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T07:01:13.017Z","updated_at":"2025-04-01T21:30:51.263Z","avatar_url":"https://github.com/jameslyons.png","language":"Python","readme":"======================\npython_speech_features\n======================\n\nThis library provides common speech features for ASR, including MFCCs and filterbank energies.\nIf you are not sure what MFCCs are and would like to know more, have a look at this \n`MFCC tutorial \u003chttp://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/\u003e`_.\n\n`Project Documentation \u003chttp://python-speech-features.readthedocs.org/en/latest/\u003e`_\n\nTo cite, please use: James Lyons et al. (2020, January 14). jameslyons/python_speech_features: release v0.6.1 (Version 0.6.1). Zenodo. 
http://doi.org/10.5281/zenodo.3607820\n\nInstallation\n============\n\nThis `project is on PyPI \u003chttps://pypi.python.org/pypi/python_speech_features\u003e`_.\n\nTo install from PyPI:: \n\n\tpip install python_speech_features\n\n\t\nFrom this repository::\n\n\tgit clone https://github.com/jameslyons/python_speech_features\n\tcd python_speech_features\n\tpython setup.py develop\n\n\nUsage\n=====\n\nSupported features:\n\n- Mel Frequency Cepstral Coefficients\n- Filterbank Energies\n- Log Filterbank Energies\n- Spectral Subband Centroids\n\n`Example use \u003cexample.py\u003e`_\n\nFrom here you can, for example, write the features to a file.\n\n\nMFCC Features\n=============\n\nThe default parameters should work fairly well for most cases. \nIf you want to change the MFCC parameters, the following parameters are supported::\n\n\tdef mfcc(signal,samplerate=16000,winlen=0.025,winstep=0.01,numcep=13,\n\t         nfilt=26,nfft=512,lowfreq=0,highfreq=None,preemph=0.97,\n\t         ceplifter=22,appendEnergy=True)\n\n=============\t===========\nParameter \t\tDescription\n=============\t===========\nsignal\t\t\tthe audio signal from which to compute features. Should be an N*1 array\nsamplerate \t\tthe samplerate of the signal we are working with.\nwinlen \t\t\tthe length of the analysis window in seconds. Default is 0.025s (25 milliseconds)\nwinstep \t\tthe step between successive windows in seconds. Default is 0.01s (10 milliseconds)\nnumcep\t\t\tthe number of cepstral coefficients to return, default 13\nnfilt\t\t\tthe number of filters in the filterbank, default 26.\nnfft\t\t\tthe FFT size. Default is 512\nlowfreq\t\t\tlowest band edge of mel filters. In Hz, default is 0\nhighfreq\t\thighest band edge of mel filters. In Hz, default is samplerate/2\npreemph\t\t\tapply preemphasis filter with preemph as coefficient. 0 is no filter. Default is 0.97\nceplifter\t\tapply a lifter to final cepstral coefficients. 0 is no lifter. 
Default is 22\nappendEnergy\tif this is true, the zeroth cepstral coefficient is replaced with the log of the total frame energy.\nreturns\t\t\tA numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.\n=============\t===========\n\n\nFilterbank Features\n===================\n\nThese features are raw filterbank energies. \nFor most applications you will want the logarithm of these features.\nThe default parameters should work fairly well for most cases. \nIf you want to change the fbank parameters, the following parameters are supported::\n\n\tdef fbank(signal,samplerate=16000,winlen=0.025,winstep=0.01,\n\t          nfilt=26,nfft=512,lowfreq=0,highfreq=None,preemph=0.97)\n\n=============\t===========\nParameter \t\tDescription\n=============\t===========\nsignal\t\t\tthe audio signal from which to compute features. Should be an N*1 array\nsamplerate\t\tthe samplerate of the signal we are working with\nwinlen\t\t\tthe length of the analysis window in seconds. Default is 0.025s (25 milliseconds)\nwinstep\t\t\tthe step between successive windows in seconds. Default is 0.01s (10 milliseconds)\nnfilt\t\t\tthe number of filters in the filterbank, default 26.\nnfft\t\t\tthe FFT size. Default is 512.\nlowfreq\t\t\tlowest band edge of mel filters. In Hz, default is 0\nhighfreq\t\thighest band edge of mel filters. In Hz, default is samplerate/2\npreemph\t\t\tapply preemphasis filter with preemph as coefficient. 0 is no filter. Default is 0.97\nreturns\t\t\tA numpy array of size (NUMFRAMES by nfilt) containing features. Each row holds 1 feature vector. 
The second return value is the energy in each frame (total energy, unwindowed)\n=============\t===========\n\n\nReference\n=========\nsample english.wav obtained from::\n\n\twget http://voyager.jpl.nasa.gov/spacecraft/audio/english.au\n\tsox english.au -e signed-integer english.wav\n","funding_links":[],"categories":["Software","Tools","音频处理","Python","Audio Related Packages","Feature Extraction"],"sub_categories":["Audio feature extraction","Coming soon...","BSS/ICA method","SSL+","Audio"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjameslyons%2Fpython_speech_features","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjameslyons%2Fpython_speech_features","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjameslyons%2Fpython_speech_features/lists"}