{"id":17352513,"url":"https://github.com/lucko515/speech-recognition-neural-network","last_synced_at":"2025-08-16T23:10:02.940Z","repository":{"id":104742862,"uuid":"100402905","full_name":"lucko515/speech-recognition-neural-network","owner":"lucko515","description":"This is the end-to-end Speech Recognition neural network, deployed in Keras. This was my final project for Artificial Intelligence Nanodegree @Udacity.","archived":false,"fork":false,"pushed_at":"2017-08-15T17:41:31.000Z","size":13270,"stargazers_count":190,"open_issues_count":7,"forks_count":85,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-07-19T11:28:09.263Z","etag":null,"topics":["aind","deep-learning","gru","lstm-neural-networks","recurrent-neural-networks","speech-recognition"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucko515.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-15T17:38:43.000Z","updated_at":"2025-03-28T04:07:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"568dfa10-3f6d-40ea-8cc4-6c7dfc1029de","html_url":"https://github.com/lucko515/speech-recognition-neural-network","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lucko515/speech-recognition-neural-network","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucko515%2Fspeech-recognition-neural-network","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucko515%2Fspeech-recognition-ne
ural-network/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucko515%2Fspeech-recognition-neural-network/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucko515%2Fspeech-recognition-neural-network/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucko515","download_url":"https://codeload.github.com/lucko515/speech-recognition-neural-network/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucko515%2Fspeech-recognition-neural-network/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270781393,"owners_count":24643820,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aind","deep-learning","gru","lstm-neural-networks","recurrent-neural-networks","speech-recognition"],"created_at":"2024-10-15T17:13:57.401Z","updated_at":"2025-08-16T23:10:02.901Z","avatar_url":"https://github.com/lucko515.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"[//]: # (Image References)\n\n[image1]: ./images/pipeline.png \"ASR Pipeline\"\n[image2]: ./images/select_kernel.png \"select aind-vui kernel\"\n\n## Project Overview\n\nIn this notebook, you will build a deep neural network that functions as 
part of an end-to-end automatic speech recognition (ASR) pipeline!  \n\n![ASR Pipeline][image1]\n\nWe begin by investigating the [LibriSpeech dataset](http://www.openslr.org/12/) that will be used to train and evaluate your models. Your algorithm will first convert any raw audio to feature representations that are commonly used for ASR. You will then move on to building neural networks that can map these audio features to transcribed text. After learning about the basic types of layers that are often used for deep learning-based approaches to ASR, you will engage in your own investigations by creating and testing your own state-of-the-art models. Throughout the notebook, we provide recommended research papers for additional reading and links to GitHub repositories with interesting implementations.\n\n## Project Instructions\n\n### Getting Started\n\n1. Clone the repository, and navigate to the downloaded folder.\n```\ngit clone https://github.com/udacity/AIND-VUI-Capstone.git\ncd AIND-VUI-Capstone\n```\n\n2. Create (and activate) a new environment with Python 3.5 and the `numpy` package.\n\n\t- __Linux__ or __Mac__: \n\t```\n\tconda create --name aind-vui python=3.5 numpy\n\tsource activate aind-vui\n\t```\n\t- __Windows__: \n\t```\n\tconda create --name aind-vui python=3.5 numpy scipy\n\tactivate aind-vui\n\t```\n\n3. Install TensorFlow.\n\t- Option 1: __To install TensorFlow with GPU support__, follow [the guide](https://www.tensorflow.org/install/) to install the necessary NVIDIA software on your system.  If you are using the Udacity AMI, you can skip this step and only need to install the `tensorflow-gpu` package:\n\t```\n\tpip install tensorflow-gpu==1.1.0\n\t```\n\t- Option 2: __To install TensorFlow with CPU support only__,\n\t```\n\tpip install tensorflow==1.1.0\n\t```\n\n4. Install a few pip packages.\n```\npip install -r requirements.txt\n```\n\n5. 
Switch [Keras backend](https://keras.io/backend/) to TensorFlow.\n\t- __Linux__ or __Mac__: \n\t```\n\tKERAS_BACKEND=tensorflow python -c \"from keras import backend\"\n\t```\n\t- __Windows__: \n\t```\n\tset KERAS_BACKEND=tensorflow\n\tpython -c \"from keras import backend\"\n\t```\n\n6. Obtain the `libav` package.\n\t- __Linux__: `sudo apt-get install libav-tools`\n\t- __Mac__: `brew install libav`\n\t- __Windows__: Browse to the [Libav website](https://libav.org/download/)\n\t\t- Scroll down to \"Windows Nightly and Release Builds\" and click on the appropriate link for your system (32-bit or 64-bit).\n\t\t- Click `nightly-gpl`.\n\t\t- Download most recent archive file.\n\t\t- Extract the file.  Move the `usr` directory to your C: drive.\n\t\t- Go back to your terminal window from above.\n\t```\n\trename C:\\usr avconv\n    set PATH=C:\\avconv\\bin;%PATH%\n\t```\n\n7. Obtain the appropriate subsets of the LibriSpeech dataset, and convert all flac files to wav format.\n\t- __Linux__ or __Mac__: \n\t```\n\twget http://www.openslr.org/resources/12/dev-clean.tar.gz\n\ttar -xzvf dev-clean.tar.gz\n\twget http://www.openslr.org/resources/12/test-clean.tar.gz\n\ttar -xzvf test-clean.tar.gz\n\tmv flac_to_wav.sh LibriSpeech\n\tcd LibriSpeech\n\t./flac_to_wav.sh\n\t```\n\t- __Windows__: Download two files ([file 1](http://www.openslr.org/resources/12/dev-clean.tar.gz) and [file 2](http://www.openslr.org/resources/12/test-clean.tar.gz)) via browser and save in the `AIND-VUI-Capstone` directory.  Extract them with an application that is compatible with `tar` and `gz` such as [7-zip](http://www.7-zip.org/) or [WinZip](http://www.winzip.com/). Convert the files from your terminal window.\n\t```\n\tmove flac_to_wav.sh LibriSpeech\n\tcd LibriSpeech\n\tpowershell ./flac_to_wav.sh\n\t```\n\n8. 
Create JSON files corresponding to the train and validation datasets.\n```\ncd ..\npython create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json\npython create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json\n```\n\n9. Create an [IPython kernel](http://ipython.readthedocs.io/en/stable/install/kernel_install.html) for the `aind-vui` environment.  Open the notebook.\n```\npython -m ipykernel install --user --name aind-vui --display-name \"aind-vui\"\njupyter notebook vui_notebook.ipynb\n```\n\n10. Before running code, change the kernel to match the `aind-vui` environment by using the drop-down menu.  Then, follow the instructions in the notebook.\n\n![select aind-vui kernel][image2]\n\n__NOTE:__ While some code has already been implemented to get you started, you will need to implement additional functionality to successfully answer all of the questions included in the notebook. __Unless requested, do not modify code that has already been included.__\n\n\n### Amazon Web Services\n\nIf you do not have access to a local GPU, you could use Amazon Web Services to launch an EC2 GPU instance.  Please refer to the [Udacity instructions](https://classroom.udacity.com/nanodegrees/nd889/parts/16cf5df5-73f0-4afa-93a9-de5974257236/modules/53b2a19e-4e29-4ae7-aaf2-33d195dbdeba/lessons/2df3b94c-4f09-476a-8397-e8841b147f84/project) for setting up a GPU instance for this project.\n\n\n### Evaluation\n\nYour project will be reviewed by a Udacity reviewer against the project [rubric](#rubric).  Review this rubric thoroughly, and self-evaluate your project before submission.  
All criteria found in the rubric must meet specifications for you to pass.\n\n\n### Project Submission\n\nWhen you are ready to submit your project, collect the following files and compress them into a single archive for upload:\n- The `vui_notebook.ipynb` file with fully functional code, all code cells executed and displaying output, and all questions answered.\n- An HTML or PDF export of the project notebook with the name `report.html` or `report.pdf`.\n- The `sample_models.py` file with all model architectures that were trained in the project Jupyter notebook.\n- The `results/` folder containing all HDF5 and pickle files corresponding to trained models.\n\nAlternatively, your submission could consist of the GitHub link to your repository.\n\n\n\u003ca id='rubric'\u003e\u003c/a\u003e\n## Project Rubric\n\n#### Files Submitted\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Submission Files      | The submission includes all required files.\t\t|\n\n#### STEP 2: Model 0: RNN\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Trained Model 0         \t\t| The submission trained the model for at least 20 epochs, and none of the loss values in `model_0.pickle` are undefined.  The trained weights for the model specified in `simple_rnn_model` are stored in `model_0.h5`.   \t|\n\n#### STEP 2: Model 1: RNN + TimeDistributed Dense\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Completed `rnn_model` Module         \t\t| The submission includes a `sample_models.py` file with a completed `rnn_model` module containing the correct architecture.   
\t|\n| Trained Model 1         \t\t| The submission trained the model for at least 20 epochs, and none of the loss values in `model_1.pickle` are undefined.  The trained weights for the model specified in `rnn_model` are stored in `model_1.h5`.   \t|\n\n#### STEP 2: Model 2: CNN + RNN + TimeDistributed Dense\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Completed `cnn_rnn_model` Module         \t\t| The submission includes a `sample_models.py` file with a completed `cnn_rnn_model` module containing the correct architecture.   \t|\n| Trained Model 2         \t\t| The submission trained the model for at least 20 epochs, and none of the loss values in `model_2.pickle` are undefined.  The trained weights for the model specified in `cnn_rnn_model` are stored in `model_2.h5`.   \t|\n\n#### STEP 2: Model 3: Deeper RNN + TimeDistributed Dense\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Completed `deep_rnn_model` Module         \t\t| The submission includes a `sample_models.py` file with a completed `deep_rnn_model` module containing the correct architecture.   \t|\n| Trained Model 3         \t\t| The submission trained the model for at least 20 epochs, and none of the loss values in `model_3.pickle` are undefined.  The trained weights for the model specified in `deep_rnn_model` are stored in `model_3.h5`.   
\t|\n\n#### STEP 2: Model 4: Bidirectional RNN + TimeDistributed Dense\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Completed `bidirectional_rnn_model` Module         \t\t| The submission includes a `sample_models.py` file with a completed `bidirectional_rnn_model` module containing the correct architecture.   \t|\n| Trained Model 4         \t\t| The submission trained the model for at least 20 epochs, and none of the loss values in `model_4.pickle` are undefined.  The trained weights for the model specified in `bidirectional_rnn_model` are stored in `model_4.h5`.   \t|\n\n#### STEP 2: Compare the Models\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Question 1         \t\t| The submission includes a detailed analysis of why different models might perform better than others.   \t|\n\n#### STEP 2: Final Model\n\n| Criteria       \t\t|     Meets Specifications\t        \t\t\t            | \n|:---------------------:|:---------------------------------------------------------:| \n| Completed `final_model` Module         \t\t| The submission includes a `sample_models.py` file with a completed `final_model` module containing a final architecture that is not identical to any of the previous architectures.   \t|\n| Trained Final Model        \t\t| The submission trained the model for at least 20 epochs, and none of the loss values in `model_end.pickle` are undefined.  The trained weights for the model specified in `final_model` are stored in `model_end.h5`.   \t|\n| Question 2         \t\t| The submission includes a detailed description of how the final model architecture was designed.   
\t|\n\n\n## Suggestions to Make your Project Stand Out!\n\n#### (1) Add a Language Model to the Decoder\n\nThe performance of the decoding step can be greatly enhanced by incorporating a language model.  Build your own language model from scratch, or leverage a repository or toolkit that you find online to improve your predictions.\n\n#### (2) Train on Bigger Data\n\nIn the project, you used some of the smaller downloads from the LibriSpeech corpus.  Try training your model on some larger datasets - instead of using `dev-clean.tar.gz`, download one of the larger training sets on the [website](http://www.openslr.org/12/).\n\n#### (3) Try out Different Audio Features\n\nIn this project, you had the choice to use _either_ spectrogram or MFCC features.  Take the time to test the performance of _both_ of these features.  For a special challenge, train a network that uses raw audio waveforms!\n\n## Special Thanks\n\nWe have borrowed the `create_desc_json.py` and `flac_to_wav.sh` files from the [ba-dls-deepspeech](https://github.com/baidu-research/ba-dls-deepspeech) repository, along with some functions used to generate spectrograms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucko515%2Fspeech-recognition-neural-network","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucko515%2Fspeech-recognition-neural-network","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucko515%2Fspeech-recognition-neural-network/lists"}