{"id":13625385,"url":"https://github.com/balavenkatesh3322/audio-pretrained-model","last_synced_at":"2025-04-10T20:10:44.719Z","repository":{"id":52956622,"uuid":"280638951","full_name":"balavenkatesh3322/audio-pretrained-model","owner":"balavenkatesh3322","description":"A collection of Audio and Speech pre-trained models.","archived":false,"fork":false,"pushed_at":"2020-07-21T01:47:52.000Z","size":137,"stargazers_count":187,"open_issues_count":0,"forks_count":26,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-24T17:52:46.325Z","etag":null,"topics":["audio","audio-processing","caffe","keras","keras-models","keras-tensorflow","machine-learning","mxnet","neural-network","pre-trained","pre-trained-model","pre-training","python3","pytorch","pytorch-models","speech-recognition","speech-to-text","tensorflow","tensorflow-models"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/balavenkatesh3322.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-18T11:06:43.000Z","updated_at":"2025-03-24T13:43:58.000Z","dependencies_parsed_at":"2022-09-07T21:21:28.514Z","dependency_job_id":null,"html_url":"https://github.com/balavenkatesh3322/audio-pretrained-model","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/balavenkatesh3322%2Faudio-pretrained-model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/balavenkatesh3322%2Faudio-pretrained-model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/balavenkatesh3322%2Faudio-pretrained-model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/balavenkatesh3322%2Faudio-pretrained-model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/balavenkatesh3322","download_url":"https://codeload.github.com/balavenkatesh3322/audio-pretrained-model/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248288357,"owners_count":21078903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","audio-processing","caffe","keras","keras-models","keras-tensorflow","machine-learning","mxnet","neural-network","pre-trained","pre-trained-model","pre-training","python3","pytorch","pytorch-models","speech-recognition","speech-to-text","tensorflow","tensorflow-models"],"created_at":"2024-08-01T21:01:54.906Z","updated_at":"2025-04-10T20:10:44.680Z","avatar_url":"https://github.com/balavenkatesh3322.png","language":null,"funding_links":[],"categories":["Others","Other Pre-trained Models"],"sub_categories":[],"readme":"![Maintenance](https://img.shields.io/badge/Maintained%3F-YES-green.svg)\n![GitHub](https://img.shields.io/badge/Release-PROD-yellow.svg)\n![GitHub](https://img.shields.io/badge/Languages-MULTI-blue.svg)\n![GitHub](https://img.shields.io/badge/License-MIT-lightgrey.svg)\n\n# Audio and Speech Pre-trained Models\n\n![NLP logo](https://github.com/balavenkatesh3322/audio-pretrained-model/blob/master/logo.jpg)\n\n## What is pre-trained Model?\nA pre-trained model is a model created by some one else to solve a 
similar problem. Instead of building a model from scratch to solve a similar problem, we can use a model trained on another problem as a starting point. A pre-trained model may not be 100% accurate in your application.\n\n## Other Pre-trained Models\n* [NLP Pre-trained Models](https://github.com/balavenkatesh3322/NLP-pretrained-model)\n* [Computer Vision Pre-trained Models](https://github.com/balavenkatesh3322/CV-pretrained-model)\n\n### Framework\n\n* [Tensorflow](#tensorflow)\n* [Keras](#keras)\n* [PyTorch](#pytorch)\n* [MXNet](#mxnet)\n* [Caffe](#caffe)\n\n\n### Model visualization\nYou can visualize each model's network architecture with [Netron](https://github.com/lutzroeder/Netron).\n\n![NLP logo](https://github.com/balavenkatesh3322/NLP-pretrained-model/blob/master/netron.png)\n\n### Tensorflow \u003ca name=\"tensorflow\"/\u003e\n\n| Model Name | Description | Framework |\n|   :---:      |     :---:      |     :---:     |\n| [Wavenet]( https://github.com/ibab/tensorflow-wavenet)  | This is a TensorFlow implementation of the WaveNet generative neural network architecture for audio generation.     | `Tensorflow`\n| [Lip Reading]( https://github.com/astorfi/lip-reading-deeplearning)  | Cross Audio-Visual Recognition using 3D Architectures in TensorFlow     | `Tensorflow`\n| [MusicGenreClassification]( https://github.com/mlachmish/MusicGenreClassification)  | Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University.     | `Tensorflow`\n| [Audioset](https://github.com/tensorflow/models/tree/master/research/audioset)  | Models and supporting code for use with AudioSet.     | `Tensorflow`\n| [DeepSpeech]( https://github.com/tensorflow/models/tree/master/research/deep_speech)  | Automatic speech recognition.     
| `Tensorflow`\n\n\n\u003cdiv align=\"right\"\u003e\n    \u003cb\u003e\u003ca href=\"#framework\"\u003e↥ Back To Top\u003c/a\u003e\u003c/b\u003e\n\u003c/div\u003e\n\n***\n\n### Keras \u003ca name=\"keras\"/\u003e\n\n| Model Name | Description | Framework |\n|   :---:      |     :---:      |     :---:     |\n| [Ultrasound nerve segmentation]( https://github.com/jocicmarko/ultrasound-nerve-segmentation)  | This tutorial shows how to use the Keras library to build a deep neural network for ultrasound image nerve segmentation.     | `Keras`\n\n\u003cdiv align=\"right\"\u003e\n    \u003cb\u003e\u003ca href=\"#framework\"\u003e↥ Back To Top\u003c/a\u003e\u003c/b\u003e\n\u003c/div\u003e\n\n***\n\n### PyTorch \u003ca name=\"pytorch\"/\u003e\n\n| Model Name | Description | Framework |\n|   :---:      |     :---:      |     :---:     |\n| [espnet]( https://github.com/espnet/espnet)  | End-to-End Speech Processing Toolkit espnet.github.io/espnet     | `PyTorch`\n| [TTS]( https://github.com/mozilla/TTS)  | Deep learning for Text2Speech     | `PyTorch`\n| [Neural Sequence labeling model]( https://github.com/jiesutd/NCRFpp)  | Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation.     | `PyTorch`\n| [waveglow]( https://github.com/NVIDIA/waveglow)  | A Flow-based Generative Network for Speech Synthesis.     | `PyTorch`\n| [deepvoice3_pytorch]( https://github.com/r9y9/deepvoice3_pytorch)  | PyTorch implementation of convolutional networks-based text-to-speech synthesis models.     | `PyTorch`\n| [deepspeech2]( https://github.com/SeanNaren/deepspeech.pytorch)  | Implementation of DeepSpeech2 using Baidu Warp-CTC. Creates a network based on the DeepSpeech2 architecture, trained with the CTC loss function.     | `PyTorch`\n| [loop]( https://github.com/facebookarchive/loop)  | A method to generate speech across multiple speakers.    
| `PyTorch`\n| [audio]( https://github.com/pytorch/audio)  | Simple audio I/O for pytorch.     | `PyTorch`\n| [speech]( https://github.com/awni/speech)  | PyTorch ASR Implementation.     | `PyTorch`\n| [samplernn-pytorch]( https://github.com/deepsound-project/samplernn-pytorch)  | PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.     | `PyTorch`\n| [torch_waveglow]( https://github.com/npuichigo/waveglow)  | A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis.     | `PyTorch`\n\n\n\u003cdiv align=\"right\"\u003e\n    \u003cb\u003e\u003ca href=\"#framework\"\u003e↥ Back To Top\u003c/a\u003e\u003c/b\u003e\n\u003c/div\u003e\n\n***\n\n\n### MXNet \u003ca name=\"mxnet\"/\u003e\n\n| Model Name | Description | Framework |\n|   :---:      |     :---:      |     :---:     |\n| [deepspeech]( https://github.com/samsungsds-rnd/deepspeech.mxnet)  | This example based on DeepSpeech2 of Baidu helps you to build Speech-To-Text (STT) models at scale using     | `MXNet`\n| [mxnet-audio]( https://github.com/chen0040/mxnet-audio)  | Implementation of music genre classification, audio-to-vec, song recommender, and music search in mxnet.     | `MXNet`\n\n\n\u003cdiv align=\"right\"\u003e\n    \u003cb\u003e\u003ca href=\"#framework\"\u003e↥ Back To Top\u003c/a\u003e\u003c/b\u003e\n\u003c/div\u003e\n\n***\n\n### Caffe \u003ca name=\"caffe\"/\u003e\n\n| Model Name | Description | Framework |\n|   :---:      |     :---:      |     :---:     |\n| [Speech Recognition](https://github.com/pannous/caffe-speech-recognition)  | Speech Recognition with the caffe deep learning framework.     
| `Caffe`\n\n\u003cdiv align=\"right\"\u003e\n    \u003cb\u003e\u003ca href=\"#framework\"\u003e↥ Back To Top\u003c/a\u003e\u003c/b\u003e\n\u003c/div\u003e\n\n***\n\n## Contributions\nYour contributions are always welcome!!\nPlease have a look at contributing.md\n\n## License\n\n[MIT License](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbalavenkatesh3322%2Faudio-pretrained-model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbalavenkatesh3322%2Faudio-pretrained-model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbalavenkatesh3322%2Faudio-pretrained-model/lists"}