{"id":13419323,"url":"https://github.com/jsingh811/pyAudioProcessing","last_synced_at":"2025-03-15T05:30:56.083Z","repository":{"id":35074032,"uuid":"197088356","full_name":"jsingh811/pyAudioProcessing","owner":"jsingh811","description":"Audio feature extraction and classification","archived":false,"fork":false,"pushed_at":"2023-07-06T22:21:16.000Z","size":24063,"stargazers_count":222,"open_issues_count":9,"forks_count":39,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-25T09:14:29.809Z","etag":null,"topics":["audio-data","audio-files","chroma-features","classifier","classifier-options","classify","classify-audio","classify-audio-samples","feature-extraction","gfcc","gfcc-extractor","gfcc-features","hyperparameter-tuning","mfcc","mfcc-extractor","mfcc-features","pyaudioprocessing","spectral-features","wav-files"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jsingh811.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-16T00:03:20.000Z","updated_at":"2024-11-20T08:55:48.000Z","dependencies_parsed_at":"2024-10-26T16:04:40.478Z","dependency_job_id":"589d2734-2b3f-4ecc-928b-b3f8710988a4","html_url":"https://github.com/jsingh811/pyAudioProcessing","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsingh811%2FpyAudioProcessing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsin
gh811%2FpyAudioProcessing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsingh811%2FpyAudioProcessing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsingh811%2FpyAudioProcessing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jsingh811","download_url":"https://codeload.github.com/jsingh811/pyAudioProcessing/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243690112,"owners_count":20331726,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-data","audio-files","chroma-features","classifier","classifier-options","classify","classify-audio","classify-audio-samples","feature-extraction","gfcc","gfcc-extractor","gfcc-features","hyperparameter-tuning","mfcc","mfcc-extractor","mfcc-features","pyaudioprocessing","spectral-features","wav-files"],"created_at":"2024-07-30T22:01:14.357Z","updated_at":"2025-03-15T05:30:56.077Z","avatar_url":"https://github.com/jsingh811.png","language":"Python","readme":"# pyAudioProcessing\n\n![pyaudioprocessing](https://user-images.githubusercontent.com/16875926/131924198-e34abe7e-12d8-41f9-926d-db199734dcaa.png)\n\nA Python based library for processing audio data into features (GFCC, MFCC, spectral, chroma) and building Machine Learning models.  \nThis was initially written using `Python 3.7`, and updated several times using `Python 3.8` and `Python 3.9`, and has been tested to work with Python \u003e= 3.6, \u003c3.10.  \n\n## Getting Started  \n\n1. 
One way to install pyAudioProcessing and its dependencies is from PyPI using pip\n```\npip install pyAudioProcessing\n```  \nTo upgrade to the latest version of pyAudioProcessing, the following pip command can be used.  \n```\npip install -U pyAudioProcessing\n```  \n\n2. Or, you could also clone the project and get it set up  \n\n```\ngit clone git@github.com:jsingh811/pyAudioProcessing.git\ncd pyAudioProcessing\npip install -e .\n```\nYou can also get the requirements by running\n\n```\npip install -r requirements/requirements.txt\n```\n\n\n## Contents  \n[Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring)  \n[Feature and Classifier model options](https://github.com/jsingh811/pyAudioProcessing#options)  \n[Pre-trained models](https://github.com/jsingh811/pyAudioProcessing#classifying-with-pre-trained-models)  \n[Extracting numerical features from audio](https://github.com/jsingh811/pyAudioProcessing#extracting-features-from-audios)  \n[Building custom classification models](https://github.com/jsingh811/pyAudioProcessing#training-and-classifying-audio-files)  \n[Audio cleaning](https://github.com/jsingh811/pyAudioProcessing#audio-cleaning)  \n[Audio format conversion](https://github.com/jsingh811/pyAudioProcessing#audio-format-conversion)  \n[Audio visualization](https://github.com/jsingh811/pyAudioProcessing#audio-visualization)  \n\nPlease refer to the [Wiki](https://github.com/jsingh811/pyAudioProcessing/wiki) for more details.    \n\n## Citation\n\nUsing pyAudioProcessing in your research? Please cite as follows.\n\n```\nSingh, J. (2022). pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling. In Proceedings of the Python in Science Conference. Python in Science Conference. SciPy. 
https://doi.org/10.25080/majora-212e5952-017\n```\n\nBibtex\n```\n@InProceedings{ jyotika_singh-proc-scipy-2022,\n  author    = { {J}yotika {S}ingh },\n  title     = { py{A}udio{P}rocessing: {A}udio {P}rocessing, {F}eature {E}xtraction, and {M}achine {L}earning {M}odeling },\n  booktitle = { {P}roceedings of the 21st {P}ython in {S}cience {C}onference },\n  pages     = { 152 - 158 },\n  year      = { 2022 },\n  doi       = { 10.25080/majora-212e5952-017 }\n}\n```\n\nTo cite the software version\n\n```\nJyotika Singh. (2021, July 22). jsingh811/pyAudioProcessing: Audio processing, feature extraction and classification (Version v1.2.0). Zenodo. http://doi.org/10.5281/zenodo.5121041\n```\n[![DOI](https://zenodo.org/badge/197088356.svg)](https://zenodo.org/badge/latestdoi/197088356)\n\n\nBibtex\n```\n@software{jyotika_singh_2021_5121041,\n  author       = {Jyotika Singh},\n  title        = {{jsingh811/pyAudioProcessing: Audio processing,\n                   feature extraction and classification}},\n  month        = jul,\n  year         = 2021,\n  publisher    = {Zenodo},\n  version      = {v1.2.0},\n  doi          = {10.5281/zenodo.5121041},\n  url          = {https://doi.org/10.5281/zenodo.5121041}\n}\n```\n\n\n## Options\n\n### Feature options  \n\nYou can choose between the features `gfcc`, `mfcc`, `spectral`, `chroma`, or any combination of those (for example `gfcc,mfcc,spectral,chroma`) to extract from your audio files for classification, or simply to save the extracted features for other uses.  \n\n### Classifier options   \n\nYou can choose between `svm`, `svm_rbf`, `randomforest`, `logisticregression`, `knn`, `gradientboosting` and `extratrees`.    \nHyperparameter tuning via grid search is included in the code for each classifier.  
\n\n\n## Training and Testing Data structuring  (Optional)\n\nThe library works with data structured as described in this section, or alternatively with an input dictionary object specifying the location paths of the audio files.\n\nLet's say you have 2 classes that you have training data for (music and speech), and you want to use pyAudioProcessing to train a model using available feature options. Save each class as a directory and all the training audio .wav files under the respective class directories. Example:  \n\n```bash\n.\n├── training_data\n├── music\n│   ├── music_sample1.wav\n│   ├── music_sample2.wav\n│   ├── music_sample3.wav\n│   ├── music_sample4.wav\n├── speech\n│   ├── speech_sample1.wav\n│   ├── speech_sample2.wav\n│   ├── speech_sample3.wav\n│   ├── speech_sample4.wav\n```  \n\nSimilarly, for any test data (with known labels) you want to pass through the classifier, structure it similarly as  \n\n```bash\n.\n├── testing_data\n├── music\n│   ├── music_sample5.wav\n│   ├── music_sample6.wav\n├── speech\n│   ├── speech_sample5.wav\n│   ├── speech_sample6.wav\n```  \nIf you want to classify audio samples without any known labels, structure the data similarly as  \n\n```bash\n.\n├── data\n├── unknown\n│   ├── sample1.wav\n│   ├── sample2.wav\n```  \n\n## Classifying with Pre-trained Models\n\nThere are three models that have been pre-trained and provided in this project. They are as follows.\n\n`music genre`: Contains a pre-trained SVM classifier to classify audio into 10 music genres - blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock. This classifier was trained using MFCC, GFCC, spectral, and chroma features.\n\n`musicVSspeech`: Contains a pre-trained SVM classifier that classifies audio into two possible classes - music and speech. 
This classifier was trained using MFCC, spectral, and chroma features.\n\n`musicVSspeechVSbirds`: Contains a pre-trained SVM classifier that classifies audio into three possible classes - music, speech and birds. This classifier was trained using GFCC, spectral, and chroma features.\n\nThere are three ways to specify the data you want to classify.  \n\n1. Classifying a single audio file specified by input `file`.\n\n```\nfrom pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre\n\n# musicVSspeech classification\nresults_music_speech = classify_ms(file=\"/Users/xyz/Documents/audio.wav\")\n\n# musicVSspeechVSbirds classification\nresults_music_speech_birds = classify_msb(file=\"/Users/xyz/Documents/audio.wav\")\n\n# music genre classification\nresults_music_genre = classify_genre(file=\"/Users/xyz/Documents/audio.wav\")\n```\n\n2. Using `file_names` specifying locations of audios as follows.\n\n```\n# {\"audios_1\" : [\u003cpath to audio\u003e, \u003cpath to audio\u003e, ...], \"audios_2\": [\u003cpath to audio\u003e, ...],}\n\n# Examples.  
\n\nfile_names = {\n\t\"music\" : [\"/Users/abc/Documents/opera.wav\", \"/Users/abc/Downloads/song.wav\"],\n\t\"birds\": [ \"/Users/abc/Documents/b1.wav\", \"/Users/abc/Documents/b2.wav\", \"/Users/abc/Desktop/birdsound.wav\"]\n}\n\nfile_names = {\n\t\"audios\" : [\"/Users/abc/Documents/opera.wav\", \"/Users/abc/Downloads/song.wav\", \"/Users/abc/Documents/b1.wav\", \"/Users/abc/Documents/b2.wav\", \"/Users/abc/Desktop/birdsound.wav\"]\n}\n```  \n\nThe following commands in Python can be used to classify your data.\n\n```\nfrom pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre\n\n# musicVSspeech classification\nresults_music_speech = classify_ms(file_names=file_names)\n\n# musicVSspeechVSbirds classification\nresults_music_speech_birds = classify_msb(file_names=file_names)\n\n# music genre classification\nresults_music_genre = classify_genre(file_names=file_names)\n```\n\n3. Using data structured as specified in [structuring guidelines](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) and passing the parent folder path as `folder_path` input.  \n\nThe following commands in Python can be used to classify your data.\n\n```\nfrom pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre\n\n# musicVSspeech classification\nresults_music_speech = classify_ms(folder_path=\"../data\")\n\n# musicVSspeechVSbirds classification\nresults_music_speech_birds = classify_msb(folder_path=\"../data\")\n\n# music genre classification\nresults_music_genre = classify_genre(folder_path=\"../data\")\n```\n\n\nSample results look like  \n```\n{'../data/music': {'beatles.wav': {'probabilities': [0.8899067858599712, 0.011922234412695229, 0.0981709797273336], 'classes': ['music', 'speech', 'birds']}, ...}\n```\n\n## Training and Classifying Audio files  \n\nAudio data can be trained, tested and classified using pyAudioProcessing. 
Please see [feature options](https://github.com/jsingh811/pyAudioProcessing#feature-options) and [classifier model options](https://github.com/jsingh811/pyAudioProcessing#classifier-options) for more information.   \n\nA sample dataset of spoken instances of \"london\" and \"boston\" can be found [here](https://drive.google.com/drive/folders/1AayPvvgZh4Jvi6LYDR7YS_ar7l3gEtAy?usp=sharing).\n\n### Examples  \n\nCode example of using `gfcc,spectral,chroma` features and the `svm` classifier.\n\nThere are 2 ways to pass the training data in. \n\n1. Using locations of files in a dictionary format as the input `file_names`.  \n\n2. Passing in a `folder_path` containing sub-folders and audio files. Please refer to the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) to use your own data instead.   \n\n```\nfrom pyAudioProcessing.run_classification import classify, train\n\n# Training\ntrain(\n\tfile_names={\n\t\t\"music\": [\u003cpath to audio\u003e, \u003cpath to audio\u003e, ..],\n\t\t\"speech\": [\u003cpath to audio\u003e, \u003cpath to audio\u003e, ..]\n\t},\n\tfeature_names=[\"gfcc\", \"spectral\", \"chroma\"],\n\tclassifier=\"svm\",\n\tclassifier_name=\"svm_test_clf\"\n)\n\n```\nOr, to use a directory containing audios organized as in [structuring guidelines](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring), the following can be used\n```\ntrain(\n\tfolder_path=\"../data\", # path to dir\n\tfeature_names=[\"gfcc\", \"spectral\", \"chroma\"],\n\tclassifier=\"svm\",\n\tclassifier_name=\"svm_test_clf\"\n)\n```\n\nThe above logs the files analyzed and the hyperparameter tuning results for recall, precision and F1 score, along with the final confusion matrix.\n\nTo classify audio samples with the classifier you created above,\n```\n# Classify a single file \n\nresults = classify(\n\tfile = \"\u003cpath to audio\u003e\",\n\tfeature_names=[\"gfcc\", 
\"spectral\", \"chroma\"],\n\tclassifier=\"svm\",\n\tclassifier_name=\"svm_test_clf\"\n)\n\n# Classify multiple files with known labels and locations\nresults = classify(\n\tfile_names={\n\t\t\"music\": [\u003cpath to audio\u003e, \u003cpath to audio\u003e, ..],\n\t\t\"speech\": [\u003cpath to audio\u003e, \u003cpath to audio\u003e, ..]\n\t},\n\tfeature_names=[\"mfcc\", \"gfcc\", \"spectral\", \"chroma\"],\n\tclassifier=\"svm\",\n\tclassifier_name=\"svm_test_clf\"\n)\n\n# or you can specify a folder path as described in the training section.\n```  \nThe above logs the filename where the classification results are saved along with the details about testing files and the classifier used if you pass in logfile=True into the function call.\n\n\nIf you cloned the project via git, the following command line example of training and classification with `gfcc,spectral,chroma` features and `svm` classifier can be used as well. Sample data can be found [here](https://github.com/jsingh811/pyAudioProcessing/tree/master/data_samples). Please refer to the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) to use your own data instead.   \n\nTraining:  \n```\npython pyAudioProcessing/run_classification.py -f \"data_samples/training\" -clf \"svm\" -clfname \"svm_clf\" -t \"train\" -feats \"gfcc,spectral,chroma\"\n```  \nClassifying:   \n\n```\npython pyAudioProcessing/run_classification.py -f \"data_samples/testing\" -clf \"svm\" -clfname \"svm_clf\" -t \"classify\" -feats \"gfcc,spectral,chroma\" -logfile \"../classifier_results\"\n```  \nClassification results get saved in `../classifier_results_svm_clf.json`.  \n\n## Extracting features from audios  \n\nThis feature lets the user extract aggregated data features calculated per audio file. See [feature options](https://github.com/jsingh811/pyAudioProcessing#feature-options) for more information on choices of features available.  
\n\n### Examples  \n\nA code example for performing `gfcc` and `mfcc` feature extraction can be found below. \n\n```\nfrom pyAudioProcessing.extract_features import get_features\n\n# Feature extraction of a single file\n\nfeatures = get_features(\n  file=\"\u003cpath to audio\u003e\",\n  feature_names=[\"gfcc\", \"mfcc\"]\n)\n\n# Feature extraction of multiple files\n\nfeatures = get_features(\n  file_names={\n    \"music\": [\u003cpath to audio\u003e, \u003cpath to audio\u003e, ..],\n    \"speech\": [\u003cpath to audio\u003e, \u003cpath to audio\u003e, ..]\n  },\n  feature_names=[\"gfcc\", \"mfcc\"]\n)\n\n# or if you have a dir with sub-folders and audios\n# features = get_features(folder_path=\"data_samples/testing\", feature_names=[\"gfcc\", \"mfcc\"])\n\n# features is a dictionary that will hold data of the following format\n\"\"\"\n{\n  music: {file1_path: {\"features\": \u003clist\u003e, \"feature_names\": \u003clist\u003e}, ...},\n  speech: {file1_path: {\"features\": \u003clist\u003e, \"feature_names\": \u003clist\u003e}, ...},\n  ...\n}\n\"\"\"\n```  \nTo save features in a json file,\n```\nfrom pyAudioProcessing import utils\nutils.write_to_json(\"audio_features.json\", features)\n```  \n\nIf you cloned the project via git, the following command line example for `gfcc` and `mfcc` feature extraction can be used as well. The features argument should be a comma-separated string, for example `gfcc,mfcc`.  \nTo use your own audio files for feature extraction, pass in the directory path containing .wav files as the `-f` argument. Please refer to the format of directory `data_samples/testing` or the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring).  \n\n```\npython pyAudioProcessing/extract_features.py -f \"data_samples/testing\" -feats \"gfcc,mfcc\"\n```  \nFeatures extracted get saved in `audio_features.json`.  
\n\n## Audio format conversion\n\nYou can convert your audio in `.mp4`, `.mp3`, `.m4a` and `.aac` to `.wav`. This will allow you to use audio feature generation and classification functionalities.\n\nIn order to convert your audios, the following code sample can be used.  \n\n```\nfrom pyAudioProcessing.convert_audio import convert_files_to_wav\n\n# dir_path is the path to the directory/folder on your machine containing audio files\ndir_path = \"data/mp4_files\"\n\n# simply change audio_format to \"mp3\", \"m4a\" or \"aac\" depending on the format\n# of audio that you are trying to convert to wav\nconvert_files_to_wav(dir_path, audio_format=\"mp4\")\n\n# the converted wav files will be saved in the same dir_path location.\n\n```\n\n\n## Audio cleaning\n\nTo remove low-activity regions from your audio clip, the following sample usage can be referred to.\n\n```\nfrom pyAudioProcessing import clean\n\nclean.remove_silence(\n    \u003cpath to wav file\u003e,\n    output_file=\u003cpath where you want to store cleaned wav file\u003e\n)\n```\n\n## Audio visualization\n\nTo see a time-domain view and the spectrogram of your audio, please refer to the following sample usage.\n\n```\nfrom pyAudioProcessing import plot\n\n# spectrogram plot\nplot.spectrogram(\n    \u003cpath to wav file\u003e,\n    show=True, # set to False if you do not want the plot to show\n    save_to_disk=True, # set to False if you do not want the plot to save\n    output_file=\u003cpath where you want to store spectrogram as a png\u003e\n)\n\n# time-series plot\nplot.time(\n    \u003cpath to wav file\u003e,\n    show=True, # set to False if you do not want the plot to show\n    save_to_disk=True, # set to False if you do not want the plot to save\n    output_file=\u003cpath where you want to store the plot as a png\u003e\n)\n```\n\n\n## Author  \n\nJyotika Singh  \nhttps://twitter.com/jyotikasingh_/\nhttps://www.linkedin.com/in/jyotikasingh/  
\n","funding_links":[],"categories":["Python","Audio Processing \u0026 I/O"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjsingh811%2FpyAudioProcessing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjsingh811%2FpyAudioProcessing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjsingh811%2FpyAudioProcessing/lists"}