{"id":26369868,"url":"https://github.com/msparihar/deep-speech-2","last_synced_at":"2026-01-03T16:30:35.080Z","repository":{"id":206424165,"uuid":"716610864","full_name":"Msparihar/deep-speech-2","owner":"Msparihar","description":"This repository contains the code and training materials for a speech-to-text model based on the Deep Speech 2 paper. The model is trained on a dataset of audio and text recordings, and can be used to transcribe speech to text in real time.","archived":false,"fork":false,"pushed_at":"2023-11-15T17:56:08.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-11-15T18:43:01.675Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Msparihar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-11-09T13:55:50.000Z","updated_at":"2023-11-13T17:53:07.000Z","dependencies_parsed_at":"2023-11-13T18:52:01.679Z","dependency_job_id":null,"html_url":"https://github.com/Msparihar/deep-speech-2","commit_stats":null,"previous_names":["msparihar/deep-speech-2"],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Msparihar%2Fdeep-speech-2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Msparihar%2Fdeep-speech-2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Msparihar%2Fdeep-speech-2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Msparihar%2Fdee
p-speech-2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Msparihar","download_url":"https://codeload.github.com/Msparihar/deep-speech-2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243945590,"owners_count":20372897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-16T23:17:00.269Z","updated_at":"2026-01-03T16:30:35.032Z","avatar_url":"https://github.com/Msparihar.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep-Speech-2\n\nThis repository contains the code and training materials for a speech-to-text model based on the Deep Speech 2 paper. The model is trained on a dataset of audio and text recordings and can be used to transcribe speech to text in real time.\n\n## Speech-to-Text Model\n\n### Requirements\n\n- Python == 3.10\n- TensorFlow \u003c2.11 \u003c!-- Note: TensorFlow 2.11 wouldn't work with GPU. 
🤷‍♀️ --\u003e\n- NumPy\n- Pandas\n- Matplotlib\n\n### Setup\n\n#### Step-1: Create a conda environment\n\n`conda create -n stt python=3.10`\n\n#### Step-2: Activate the environment\n\n`conda activate stt`\n\n#### Step-3: Install CUDA and cuDNN\n\n`conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0`\n\n#### Step-4: Install TensorFlow\n\n`python -m pip install \"tensorflow\u003c2.11\"`\n\n#### Step-5: Check GPU availability\n\n`python -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"`\n\n### Installation\n\nTo install the required dependencies, run the following command:\n\n`pip install -r requirements.txt`\n\n### Training the Model\n\nTo train the model, run the following command:\n\n`python train.py`\n\nThis will train the model on the default dataset, which is located in the `data` directory. To train on your own dataset, pass the path to its directory with the `--train-dir` flag.\n\n### Using the Model\n\nOnce the model is trained, you can use it to transcribe speech to text by running the following command:\n\n`python transcribe.py`\n\nThis will prompt you to record an audio clip. 
Once you have recorded the audio clip, the model will transcribe it to text and print the transcription to the console.\n\n### Deployment\n\nThe trained model can be deployed to production using a variety of methods, such as TensorFlow Serving or Docker.\n\n### Licensing\n\nThis repository is licensed under the MIT License.\n\n### Checkpoints\n\nImportant details to be filled in:\n\n- The name of your speech-to-text model: audio-wizard.\n\n- The dataset you used to train the model: [LJSpeech-1.1](https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2).\n\n- The performance of the model on a held-out test set.\n\n- Instructions for using the model.\n\n- A link to the Deep Speech 2 paper: https://arxiv.org/abs/1512.02595.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsparihar%2Fdeep-speech-2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsparihar%2Fdeep-speech-2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsparihar%2Fdeep-speech-2/lists"}