{"id":21625622,"url":"https://github.com/eljandoubi/dnn-speech-recognizer","last_synced_at":"2026-05-19T05:39:45.644Z","repository":{"id":180172210,"uuid":"550842071","full_name":"eljandoubi/DNN-Speech-Recognizer","owner":"eljandoubi","description":"Speech Recognizer basic models.","archived":false,"fork":false,"pushed_at":"2022-10-13T12:23:42.000Z","size":25809,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-13T08:12:02.851Z","etag":null,"topics":["natural-language-processing","speech-recognition","tenserflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eljandoubi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-13T12:18:45.000Z","updated_at":"2022-10-13T12:25:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"ecc9c48e-44cf-4f8d-8acb-3ce9745122c2","html_url":"https://github.com/eljandoubi/DNN-Speech-Recognizer","commit_stats":null,"previous_names":["eljandoubi/dnn-speech-recognizer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eljandoubi/DNN-Speech-Recognizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eljandoubi%2FDNN-Speech-Recognizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eljandoubi%2FDNN-Speech-Recognizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eljandoubi%2FDNN-Speech-Recognizer/releases","manifests_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/eljandoubi%2FDNN-Speech-Recognizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eljandoubi","download_url":"https://codeload.github.com/eljandoubi/DNN-Speech-Recognizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eljandoubi%2FDNN-Speech-Recognizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279014292,"owners_count":26085489,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["natural-language-processing","speech-recognition","tenserflow"],"created_at":"2024-11-25T01:09:51.066Z","updated_at":"2025-10-13T08:12:03.241Z","avatar_url":"https://github.com/eljandoubi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"[//]: # (Image References)\n\n[image1]: ./images/pipeline.png \"ASR Pipeline\"\n[image2]: ./images/select_kernel.png \"select aind-vui kernel\"\n\n## Project Overview\n\nIn this notebook, we will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline!  \n\n![ASR Pipeline][image1]\n\nWe begin by investigating the [LibriSpeech dataset](http://www.openslr.org/12/) that will be used to train and evaluate your models. 
Your algorithm will first convert any raw audio to feature representations that are commonly used for ASR. You will then move on to building neural networks that can map these audio features to transcribed text. After learning about the basic types of layers that are often used for deep learning-based approaches to ASR, you will engage in your own investigations by designing and testing your own models. Throughout the notebook, we provide recommended research papers for additional reading and links to GitHub repositories with interesting implementations.\n\n## Project Instructions\n\n### Amazon Web Services\n\nThis project requires GPU acceleration to run efficiently. Follow the Udacity classroom instructions for setting up a GPU instance and configuring this project: [link for AIND students](https://classroom.udacity.com/nanodegrees/nd889/parts/4550d1eb-a3e0-4e9b-9d3c-4f55aa6662b5/modules/c8419a1e-acd3-4463-9c01-a4c93f7c3b24/lessons/b27e9b6a-bb3b-4f3e-8993-bdfcb662a426/concepts/61c0743f-22f1-47db-a4d2-5616c25fc888)\n\n1. Follow the instructions in the Cloud Computing Setup lesson to create an EC2 instance. (The lesson includes all the required package and library installation instructions.)\n\n2. Obtain the appropriate subsets of the LibriSpeech dataset, and convert all flac files to wav format.\n```\nwget http://www.openslr.org/resources/12/dev-clean.tar.gz\ntar -xzvf dev-clean.tar.gz\nwget http://www.openslr.org/resources/12/test-clean.tar.gz\ntar -xzvf test-clean.tar.gz\nmv flac_to_wav.sh LibriSpeech\ncd LibriSpeech\n./flac_to_wav.sh\n```\n\n3. Create JSON files corresponding to the train and validation datasets.\n```\ncd ..\npython create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json\npython create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json\n```\n\n4. Start Jupyter:\n```\njupyter notebook --ip=0.0.0.0 --no-browser\n```\n\n5. 
Look at the output in the window, and find the line that looks like: `http://0.0.0.0:8888/?token=3156e...` Copy and paste the **complete** URL into the address bar of a web browser (Firefox, Safari, Chrome, etc.). Before navigating to the URL, replace 0.0.0.0 in the URL with the \"IPv4 Public IP\" address from the EC2 Dashboard.\n\n\n### Local Environment Setup\n\nYou should run this project with GPU acceleration for best performance.\n\n1. Clone the repository, and navigate to the downloaded folder.\n```\ngit clone https://github.com/eljandoubi/DNN-Speech-Recognizer.git\ncd DNN-Speech-Recognizer\n```\n\n2. Create (and activate) a new environment with Python 3.5 and the `numpy` package.\n\n\t- __Linux__ or __Mac__: \n\t```\n\tconda create --name aind-vui python=3.5 numpy\n\tsource activate aind-vui\n\t```\n\t- __Windows__: \n\t```\n\tconda create --name aind-vui python=3.5 numpy scipy\n\tactivate aind-vui\n\t```\n\n3. Install TensorFlow.\n\t- Option 1: __To install TensorFlow with GPU support__, follow [the guide](https://www.tensorflow.org/install/) to install the necessary NVIDIA software on your system.  If you are using an EC2 GPU instance, you can skip this step and install only the `tensorflow-gpu` package:\n\t```\n\tpip install tensorflow-gpu==1.1.0\n\t```\n\t- Option 2: __To install TensorFlow with CPU support only__, run:\n\t```\n\tpip install tensorflow==1.1.0\n\t```\n\n4. Install the required pip packages.\n```\npip install -r requirements.txt\n```\n\n5. Switch the [Keras backend](https://keras.io/backend/) to TensorFlow.\n\t- __Linux__ or __Mac__: \n\t```\n\tKERAS_BACKEND=tensorflow python -c \"from keras import backend\"\n\t```\n\t- __Windows__: \n\t```\n\tset KERAS_BACKEND=tensorflow\n\tpython -c \"from keras import backend\"\n\t```\n\t- __NOTE:__ A Keras/Windows bug may raise this error after the first epoch of training model 0: `'rawunicodeescape' codec can't decode bytes in position 54-55: truncated \\uXXXX`. 
\nTo fix it: \n\t\t- Find the file `keras/utils/generic_utils.py` in the environment you are using for this project. It should be under `Lib/site-packages` in that environment. The exact location varies; with Miniconda, for example, it might be located at `C:/Users/username/Miniconda3/envs/aind-vui/Lib/site-packages/keras/utils`.\n\t\t- Copy `generic_utils.py` to `OLDgeneric_utils.py` in case you need to restore it.\n\t\t- Open `generic_utils.py` and change this line:\u003cbr\u003e`marshal.dumps(func.__code__).decode('raw_unicode_escape')`\u003cbr\u003eto this line:\u003cbr\u003e`marshal.dumps(func.__code__).replace(b'\\\\', b'/').decode('raw_unicode_escape')`\n\n6. Obtain the `libav` package.\n\t- __Linux__: `sudo apt-get install libav-tools`\n\t- __Mac__: `brew install libav`\n\t- __Windows__: Browse to the [Libav website](https://libav.org/download/).\n\t\t- Scroll down to \"Windows Nightly and Release Builds\" and click on the appropriate link for your system (32-bit or 64-bit).\n\t\t- Click `nightly-gpl`.\n\t\t- Download the most recent archive file.\n\t\t- Extract the file.  Move the `usr` directory to your C: drive.\n\t\t- Go back to your terminal window from above and run:\n\t```\n\trename C:\\usr avconv\n\tset PATH=C:\\avconv\\bin;%PATH%\n\t```\n\n7. Obtain the appropriate subsets of the LibriSpeech dataset, and convert all flac files to wav format.\n\t- __Linux__ or __Mac__: \n\t```\n\twget http://www.openslr.org/resources/12/dev-clean.tar.gz\n\ttar -xzvf dev-clean.tar.gz\n\twget http://www.openslr.org/resources/12/test-clean.tar.gz\n\ttar -xzvf test-clean.tar.gz\n\tmv flac_to_wav.sh LibriSpeech\n\tcd LibriSpeech\n\t./flac_to_wav.sh\n\t```\n\t- __Windows__: Download two files ([file 1](http://www.openslr.org/resources/12/dev-clean.tar.gz) and [file 2](http://www.openslr.org/resources/12/test-clean.tar.gz)) via browser and save them in the `DNN-Speech-Recognizer` directory.  
Extract them with an application that supports `tar` and `gz` archives, such as [7-zip](http://www.7-zip.org/) or [WinZip](http://www.winzip.com/). Then convert the files from your terminal window:\n\t```\n\tmove flac_to_wav.sh LibriSpeech\n\tcd LibriSpeech\n\tpowershell ./flac_to_wav.sh\n\t```\n\n8. Create JSON files corresponding to the train and validation datasets.\n```\ncd ..\npython create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json\npython create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json\n```\n\n9. Create an [IPython kernel](http://ipython.readthedocs.io/en/stable/install/kernel_install.html) for the `aind-vui` environment, and open the notebook.\n```\npython -m ipykernel install --user --name aind-vui --display-name \"aind-vui\"\njupyter notebook vui_notebook.ipynb\n```\n\n10. Before running any code, use the drop-down menu to change the kernel to `aind-vui`. Then follow the instructions in the notebook.\n\n![select aind-vui kernel][image2]\n\n## Training and inference\n\nRun `vui_notebook.ipynb` to train the models and run inference. Models are saved in `/result`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feljandoubi%2Fdnn-speech-recognizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feljandoubi%2Fdnn-speech-recognizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feljandoubi%2Fdnn-speech-recognizer/lists"}