{"id":19783857,"url":"https://github.com/luluw8071/deep-speech-2","last_synced_at":"2026-06-08T13:32:04.037Z","repository":{"id":261765843,"uuid":"869297526","full_name":"LuluW8071/Deep-Speech-2","owner":"LuluW8071","description":"Implementation of Deep Speech 2 paper with BiGRU and BiLSTM using LibriSpeech Dataset","archived":false,"fork":false,"pushed_at":"2025-01-27T11:57:23.000Z","size":2186,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-28T13:13:15.820Z","etag":null,"topics":["asr","ctc-decode","deep-speech","hacktoberfest","kenlm-toolkit","librispeech"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1512.02595","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LuluW8071.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-08T04:07:32.000Z","updated_at":"2025-02-03T19:29:47.000Z","dependencies_parsed_at":"2024-11-08T10:34:53.279Z","dependency_job_id":"ed6dfbad-aa0e-4e65-88ba-80533ceaac8f","html_url":"https://github.com/LuluW8071/Deep-Speech-2","commit_stats":null,"previous_names":["luluw8071/deep-speech-2"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LuluW8071/Deep-Speech-2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuluW8071%2FDeep-Speech-2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuluW8071%2FDeep-Speech-2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuluW8071%2FDee
p-Speech-2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuluW8071%2FDeep-Speech-2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LuluW8071","download_url":"https://codeload.github.com/LuluW8071/Deep-Speech-2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuluW8071%2FDeep-Speech-2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34065349,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","ctc-decode","deep-speech","hacktoberfest","kenlm-toolkit","librispeech"],"created_at":"2024-11-12T06:09:27.982Z","updated_at":"2026-06-08T13:32:04.020Z","avatar_url":"https://github.com/LuluW8071.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep Speech 2\n\n\u003cdiv align=\"center\"\u003e\n\n![Status](https://img.shields.io/badge/status-completed-green.svg) ![License](https://img.shields.io/github/license/LuluW8071/Deep-Speech-2) ![Open Issues](https://img.shields.io/github/issues/LuluW8071/Deep-Speech-2) ![Closed Issues](https://img.shields.io/github/issues-closed/LuluW8071/Deep-Speech-2) ![Open PRs](https://img.shields.io/github/issues-pr/LuluW8071/Deep-Speech-2) 
![Repo Size](https://img.shields.io/github/repo-size/LuluW8071/Deep-Speech-2) ![Last Commit](https://img.shields.io/github/last-commit/LuluW8071/Deep-Speech-2)\n\n\u003c/div\u003e\n\nThis repository contains an implementation of the paper **Deep Speech 2: End-to-End Speech Recognition in English and Mandarin**, an end-to-end automatic speech recognition (ASR) model that transcribes speech directly to text. The implementation uses **Lightning AI ⚡** for training and experimentation.\n\n---\n\n## 📜 Paper \u0026 Blog Reviews\n\n- ✅ [Gated Recurrent Neural Networks](https://arxiv.org/pdf/1412.3555)\n- ✅ [Deep Speech 2: End-to-End Speech Recognition](https://arxiv.org/abs/1512.02595)\n- ✅ [KenLM](https://kheafield.com/code/kenlm/)\n- ✅ [Boosting Sequence Generation Performance with Beam Search Language Model Decoding](https://towardsdatascience.com/boosting-your-sequence-generation-performance-with-beam-search-language-model-decoding-74ee64de435a)\n\n---\n\n## 🚀 Installation\n\n1. **Clone the repository:**\n   ```bash\n   git clone https://github.com/LuluW8071/Deep-Speech-2.git\n   cd Deep-Speech-2\n   ```\n\n2. 
**Install dependencies:**\n   ```bash\n   pip install -r requirements.txt\n   ```\n   Ensure you have `PyTorch` and `Lightning AI` installed.\n\n---\n\n## 📖 Usage\n\n### 🔥 Training\n\n\u003e **Important:** Before training, make sure to set your **Comet ML API key** and **project name** in the `.env` file.\n\nTo train the **Deep Speech 2** model with default configurations:\n```bash\npython3 train.py\n```\n\nTo customize the training parameters, modify `train.py` or pass arguments:\n\n| Argument | Description | Default |\n|----------|-------------|---------|\n| `-g`, `--gpus` | Number of GPUs per node | `1` |\n| `-w`, `--num_workers` | Number of data loading workers | `4` |\n| `-db`, `--dist_backend` | Distributed backend | `'ddp_find_unused_parameters_true'` |\n| `-m`, `--model_type` | Type of RNN (`lstm` or `gru`) | `'lstm'` |\n| `-cl`, `--resnet_layers` | Number of residual CNN layers | `2` |\n| `-nl`, `--rnn_layers` | Number of RNN layers | `3` |\n| `-rd`, `--rnn_dim` | RNN hidden size | `512` |\n| `--epochs` | Number of training epochs | `50` |\n| `--batch_size` | Batch size | `32` |\n| `-gc`, `--grad_clip` | Gradient clipping | `0.6` |\n| `-lr`, `--learning_rate` | Learning rate | `2e-4` |\n| `--precision` | Precision mode | `'16-mixed'` |\n| `--checkpoint_path` | Path to checkpoint file | `None` |\n\n---\n\n### 🧊 Export TorchScript Model\n\n```bash\npython3 freeze.py --model_checkpoint saved_checkpoint/deepspeech2.ckpt\n```\n\n### 🎙️ Inference\n\nTo perform inference using a trained model:\n```bash\npython3 demo.py --model_path optimized_model.pt --share\n```\n\n---\n\n## 📊 Experiment Results\n\nThe model was trained on **LibriSpeech train set** (100 + 360 + 500 hours) and validated on the **LibriSpeech test set** (~10.5 hours) using **16-bit mixed precision**.\n\n🔗 **Download Checkpoint**: [Google Drive Link](https://drive.google.com/file/d/14J6HhN_Op4c0y-up096eY_6_6D5JLIHb/view?usp=sharing)\n\n### Model Performance\n\n| Model Type | ResCNN Layers | RNN 
Layers | RNN Dim | Epochs | Batch Size | Grad Clip | LR |\n|------------|---------------|------------|---------|--------|------------|-----------|----|\n| BiLSTM     | 2             | 3          | 512     | 25     | 64         | 0.6       | 2e-4 |\n\n#### 📉 Loss Curves\n![Loss Curves](assets/loss_curves.png)\n\n#### 📝 WER \u0026 CER Metrics (Greedy Decoding)\n![Greedy Metrics](assets/greedy_metrics.png)\n\n#### 🔍 Beam Search Decoding\n| Word Score | LM Weight | N-gram LM | Beam Size | Beam Threshold |\n|------------|-----------|-----------|-----------|----------------|\n| -0.26       | 0.3       | 4-gram    | 25        | 10             |\n\n![Beam Search Metrics](assets/beam_search_metrics.png)\n\n#### 🔎 Alignments Visualization\n![Alignments](assets/plot_alignments.png)\n\n---\n\n## 🔗 Citations\n\n```bibtex\n@misc{amodei2015deepspeech2endtoend,\n      title={Deep Speech 2: End-to-End Speech Recognition in English and Mandarin},\n      author={Dario Amodei and Rishita Anubhai and Eric Battenberg and Carl Case and others},\n      year={2015},\n      url={https://arxiv.org/abs/1512.02595}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluluw8071%2Fdeep-speech-2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluluw8071%2Fdeep-speech-2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluluw8071%2Fdeep-speech-2/lists"}