{"id":24839157,"url":"https://github.com/braydenoneal/neural-audio-classification","last_synced_at":"2026-02-18T15:31:39.418Z","repository":{"id":238582705,"uuid":"796885054","full_name":"braydenoneal/neural-audio-classification","owner":"braydenoneal","description":"Audio classification using a neural network","archived":false,"fork":false,"pushed_at":"2024-12-10T18:58:44.000Z","size":1591,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-13T08:47:17.943Z","etag":null,"topics":["artificial-intelligence","audio-classification","neural-network","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/braydenoneal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-06T19:59:43.000Z","updated_at":"2024-12-10T18:58:47.000Z","dependencies_parsed_at":"2024-12-10T19:39:17.273Z","dependency_job_id":"2e01359c-3129-4150-aa31-6bf28dee52db","html_url":"https://github.com/braydenoneal/neural-audio-classification","commit_stats":null,"previous_names":["braydenoneal/neural-audio-classification"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/braydenoneal/neural-audio-classification","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/braydenoneal%2Fneural-audio-classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/braydenoneal%2Fneural-audio-classification/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/braydenoneal%2Fneural-audio-classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/braydenoneal%2Fneural-audio-classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/braydenoneal","download_url":"https://codeload.github.com/braydenoneal/neural-audio-classification/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/braydenoneal%2Fneural-audio-classification/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29583917,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T13:56:48.962Z","status":"ssl_error","status_checked_at":"2026-02-18T13:54:34.145Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","audio-classification","neural-network","python","pytorch"],"created_at":"2025-01-31T06:36:52.407Z","updated_at":"2026-02-18T15:31:39.399Z","avatar_url":"https://github.com/braydenoneal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Neural Audio Classification\n\nThe goal of this project is to classify audio using a neural network.\n\n## Preparation\n\nFor this project, a neural network will be trained to classify which digit (0-9) is 
being spoken in an audio recording.\n\nBefore training a neural network, a dataset of spoken digit audio recordings must be obtained and converted into a\nformat that a neural network can use.\n\n### Dataset\n\nThis project uses the [Audio MNIST](https://github.com/soerenab/AudioMNIST) dataset, which contains 30,000 audio samples \nof spoken digits from 60 different speakers.\n\nHaving a large selection of audio recordings from many different speakers helps the neural network stay accurate\nacross different voices.\n\n### Image Conversion\n\nBecause the neural network used for this project takes images as input, the audio dataset will need to be converted to\nimages.\n\nOne well-known method of converting audio data into an image is to use the \n[Fast Fourier Transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform), which decomposes a wave signal into\nits discrete frequencies.\n\nDiscrete frequencies measured at discrete time intervals can represent the information in a sound wave. 
To create an image,\ntime can be mapped to the horizontal axis, frequency to the vertical axis, and the magnitude of each frequency to color.\n\nThis creates a [spectrogram](https://en.wikipedia.org/wiki/Spectrogram):\n\n![spectrogram.png](images/spectrogram.png)\n\nUsing three-channel RGB colors makes spectrograms easier for humans to read. However, only one channel is necessary for a\nneural network, so grayscale images will be used for this project.\n\nThe images have also been output at 100x100 pixels for a balance between speed and performance when training the neural \nnetwork.\n\n![spectrogram_grid.png](images/spectrogram_grid.png)\n\nThe [spectrogram.py](src/spectrogram.py) file converts each audio file from the dataset into a spectrogram image.\n\n## Training\n\nWith the dataset prepared and formatted, a neural network can be created and trained on the data to accurately\nclassify a digit given a spectrogram.\n\nFor this project, a convolutional neural network will be used.\n\n### Convolutional Neural Network\n\nA [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network) trains on data by using\nfilters, allowing the model to recognize abstract features of data.\n\nFeature recognition is useful for image classification because the model can train to recognize visual features, such as\nshape and color.\n\nThe following image is an example layout of the layers of a convolutional neural network:\n\n![convolutional_neural_network.png](images/convolutional_neural_network.png)\n\nThe [neural_network.py](src/neural_network.py) file trains a convolutional neural network on the audio images.\n\n### Results\n\nAfter training on the spectrograms for a couple of hours, the model reached 100% accuracy on \ntraining data and over 99.5% on testing data.\n\nOnce the model has finished training, it is saved as an external file.\n\n## Application\n\nWith the model saved to a file, it can be used to classify the digit of any 
spectrogram input image.\n\n### Recording Audio\n\nThe [record.py](src/record.py) file records microphone input for half a second and converts the audio into an image using\nthe same process as the dataset conversion.\n\n### Results\n\nThe [predict.py](src/predict.py) file uses the trained neural network file and the recorded audio image to predict which\ndigit was spoken in the audio recording.\n\nThe trained model used in this project was able to consistently predict the correct digit of an audio recording.\n\n## Conclusion\n\nThis project demonstrated the process of applying artificial intelligence to audio classification using a neural \nnetwork.\n\nWith different datasets and more computation time/speed, the ideas in this project could be expanded to include:\n\n* Broader speech recognition\n* Different types of sound (music, animals, etc.)\n* Recurrent classification (e.g., classifying a string of digits from a single audio file)\n* An end-user application for interfacing with the neural network\n* Audio input of varying length (without squeezing it into the same size)\n\nAll source code is available under the `src` directory.\n\nThe audio files, image files, and neural network model files have been excluded from this repository due to their large\nsize. However, the [instructions in the `src` directory](src/README.md) show how these files can be obtained and \ngenerated.\n\n## Sources\n\n* [https://github.com/soerenab/AudioMNIST](https://github.com/soerenab/AudioMNIST)\n* [https://stackoverflow.com/questions/57446015/scipy-spectrogram-with-logarithmic-frequency-axis](https://stackoverflow.com/questions/57446015/scipy-spectrogram-with-logarithmic-frequency-axis)\n* [https://www.geeksforgeeks.org/matplotlib-pyplot-yscale-in-python/](https://www.geeksforgeeks.org/matplotlib-pyplot-yscale-in-python/)\n* [https://wavbvkery.com/acoustic-drum-samples/](https://wavbvkery.com/acoustic-drum-samples/)\n* 
[https://stackoverflow.com/questions/44787437/how-to-convert-a-wav-file-to-a-spectrogram-in-python3](https://stackoverflow.com/questions/44787437/how-to-convert-a-wav-file-to-a-spectrogram-in-python3)\n* [https://en.wikipedia.org/wiki/Spectrogram](https://en.wikipedia.org/wiki/Spectrogram)\n* [https://stackoverflow.com/questions/19587118/iterating-through-directories-with-python](https://stackoverflow.com/questions/19587118/iterating-through-directories-with-python)\n* [https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.cwt.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.cwt.html)\n* [https://pywavelets.readthedocs.io/en/latest/ref/cwt.html](https://pywavelets.readthedocs.io/en/latest/ref/cwt.html)\n* [https://stackoverflow.com/questions/78040946/wavelet-artifacts-with-basic-continuous-pywavelet](https://stackoverflow.com/questions/78040946/wavelet-artifacts-with-basic-continuous-pywavelet)\n* [https://stackoverflow.com/questions/54179171/how-to-get-2d-wavelet-by-pywt-like-stft-from-a-wav-file](https://stackoverflow.com/questions/54179171/how-to-get-2d-wavelet-by-pywt-like-stft-from-a-wav-file)\n* [https://adityadutt.medium.com/audio-classification-using-wavelet-transform-and-deep-learning-f9f0978fa246](https://adityadutt.medium.com/audio-classification-using-wavelet-transform-and-deep-learning-f9f0978fa246)\n* [https://math.stackexchange.com/questions/279980/difference-between-fourier-transform-and-wavelets](https://math.stackexchange.com/questions/279980/difference-between-fourier-transform-and-wavelets)\n* [https://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/](https://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/)\n* [https://dsp.stackexchange.com/questions/72027/python-audio-analysis-which-spectrogram-should-i-use-and-why](https://dsp.stackexchange.com/questions/72027/python-audio-analysis-which-spectrogram-should-i-use-and-why)\n* 
[https://ccrma.stanford.edu/%7Eunjung/mylec/WTpart1.html](https://ccrma.stanford.edu/%7Eunjung/mylec/WTpart1.html)\n* [https://arxiv.org/pdf/1706.07156](https://arxiv.org/pdf/1706.07156)\n* [https://github.com/OverLordGoldDragon/ssqueezepy](https://github.com/OverLordGoldDragon/ssqueezepy)\n* [https://dsp.stackexchange.com/questions/71398/synchrosqueezing-wavelet-transform-explanation/71399#71399](https://dsp.stackexchange.com/questions/71398/synchrosqueezing-wavelet-transform-explanation/71399#71399)\n* [https://www.mathworks.com/help/wavelet/ug/time-frequency-analysis-and-continuous-wavelet-transform.html](https://www.mathworks.com/help/wavelet/ug/time-frequency-analysis-and-continuous-wavelet-transform.html)\n* [https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53](https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53)\n* [https://towardsdatascience.com/musical-genre-classification-with-convolutional-neural-networks-ff04f9601a74](https://towardsdatascience.com/musical-genre-classification-with-convolutional-neural-networks-ff04f9601a74)\n* [https://pyimagesearch.com/2021/07/19/pytorch-training-your-first-convolutional-neural-network-cnn/](https://pyimagesearch.com/2021/07/19/pytorch-training-your-first-convolutional-neural-network-cnn/)\n* [https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)\n* [https://stackoverflow.com/questions/66952664/how-do-i-predict-using-a-pytorch-model](https://stackoverflow.com/questions/66952664/how-do-i-predict-using-a-pytorch-model)\n* [https://stackoverflow.com/questions/72366098/how-do-i-predict-using-a-trained-image-to-emotion-pytorch-model](https://stackoverflow.com/questions/72366098/how-do-i-predict-using-a-trained-image-to-emotion-pytorch-model)\n* 
[https://stackoverflow.com/questions/68239580/making-predictions-on-new-images-using-a-cnn-in-pytorch](https://stackoverflow.com/questions/68239580/making-predictions-on-new-images-using-a-cnn-in-pytorch)\n* [https://developers.redhat.com/learning/learn:openshift-ai:how-create-pytorch-model/resource/resources:build-train-and-run-your-pytorch-model](https://developers.redhat.com/learning/learn:openshift-ai:how-create-pytorch-model/resource/resources:build-train-and-run-your-pytorch-model)\n* [https://stackoverflow.com/questions/73396203/how-to-use-trained-pytorch-model-for-prediction](https://stackoverflow.com/questions/73396203/how-to-use-trained-pytorch-model-for-prediction)\n* [https://discuss.pytorch.org/t/how-to-run-trained-model/21785/12](https://discuss.pytorch.org/t/how-to-run-trained-model/21785/12)\n* [https://pytorch.org/docs/stable/notes/serialization.html](https://pytorch.org/docs/stable/notes/serialization.html)\n* [https://discuss.pytorch.org/t/should-we-use-our-normalization-for-training-a-pretrained-model/34905](https://discuss.pytorch.org/t/should-we-use-our-normalization-for-training-a-pretrained-model/34905)\n* [https://discuss.pytorch.org/t/normalization-is-important/114156](https://discuss.pytorch.org/t/normalization-is-important/114156)\n* [https://pytorch.org/docs/stable/generated/torch.argmax.html](https://pytorch.org/docs/stable/generated/torch.argmax.html)\n* [https://discuss.pytorch.org/t/why-is-my-model-predicting-the-same-value-over-and-over-and-why-is-my-loss-negative/161819/14](https://discuss.pytorch.org/t/why-is-my-model-predicting-the-same-value-over-and-over-and-why-is-my-loss-negative/161819/14)\n* [https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)\n* [https://github.com/soerenab/AudioMNIST/tree/master](https://github.com/soerenab/AudioMNIST/tree/master)\n* 
[https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)\n* [https://medium.com/bitgrit-data-science-publication/building-an-image-classification-model-with-pytorch-from-scratch-f10452073212](https://medium.com/bitgrit-data-science-publication/building-an-image-classification-model-with-pytorch-from-scratch-f10452073212)\n* [https://pytorch.org/vision/stable/models.html](https://pytorch.org/vision/stable/models.html)\n* [https://github.com/braydenoneal/deep-learning-fork/tree/master/projects/convolutional-net#convolutionalnets](https://github.com/braydenoneal/deep-learning-fork/tree/master/projects/convolutional-net#convolutionalnets)\n* [https://datascience.stackexchange.com/questions/28339/convolutional-neural-networks-layer-sizes](https://datascience.stackexchange.com/questions/28339/convolutional-neural-networks-layer-sizes)\n* [https://www.kaggle.com/code/talmanr/cnn-with-pytorch-using-mel-features](https://www.kaggle.com/code/talmanr/cnn-with-pytorch-using-mel-features)\n* [https://medium.com/thecyphy/train-cnn-model-with-pytorch-21dafb918f48](https://medium.com/thecyphy/train-cnn-model-with-pytorch-21dafb918f48)\n* [https://stackoverflow.com/questions/62787260/keras-cnn-always-predicts-same-class](https://stackoverflow.com/questions/62787260/keras-cnn-always-predicts-same-class)\n* [https://karpathy.github.io/2019/04/25/recipe/](https://karpathy.github.io/2019/04/25/recipe/)\n* [https://stackoverflow.com/questions/73033021/pytorch-fine-tuned-cnn-model-giving-always-the-same-prediction-in-training-and-v](https://stackoverflow.com/questions/73033021/pytorch-fine-tuned-cnn-model-giving-always-the-same-prediction-in-training-and-v)\n* [https://stackoverflow.com/questions/62986273/training-works-but-prediction-produces-constant-values-cnn-with-pytorch](https://stackoverflow.com/questions/62986273/training-works-but-prediction-produces-constant-values-cnn-with-pytorch)\n* 
[https://stackoverflow.com/questions/66783997/cnn-classifier-only-guesses-one-thing-pytorch](https://stackoverflow.com/questions/66783997/cnn-classifier-only-guesses-one-thing-pytorch)\n* [https://pytorch.org/tutorials/beginner/saving_loading_models.html](https://pytorch.org/tutorials/beginner/saving_loading_models.html)\n* [https://stackoverflow.com/questions/42703500/how-do-i-save-a-trained-model-in-pytorch](https://stackoverflow.com/questions/42703500/how-do-i-save-a-trained-model-in-pytorch)\n* [https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_models_for_inference.html](https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_models_for_inference.html)\n* [https://wandb.ai/wandb/common-ml-errors/reports/How-to-Save-and-Load-Models-in-PyTorch--VmlldzozMjg0MTE](https://wandb.ai/wandb/common-ml-errors/reports/How-to-Save-and-Load-Models-in-PyTorch--VmlldzozMjg0MTE)\n* [https://pytorch.org/docs/stable/optim.html](https://pytorch.org/docs/stable/optim.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbraydenoneal%2Fneural-audio-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbraydenoneal%2Fneural-audio-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbraydenoneal%2Fneural-audio-classification/lists"}