{"id":19730499,"url":"https://github.com/tensorspeech/tensorflowasr","last_synced_at":"2025-05-14T06:07:43.942Z","repository":{"id":37579609,"uuid":"240232771","full_name":"TensorSpeech/TensorFlowASR","owner":"TensorSpeech","description":":zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords","archived":false,"fork":false,"pushed_at":"2025-04-10T09:29:48.000Z","size":94078,"stargazers_count":965,"open_issues_count":49,"forks_count":244,"subscribers_count":28,"default_branch":"main","last_synced_at":"2025-04-11T00:53:08.291Z","etag":null,"topics":["automatic-speech-recognition","conformer","contextnet","ctc","deepspeech2","end2end","jasper","rnn-transducer","speech-recognition","speech-to-text","streaming-transducer","subword-speech-recognition","tensorflow","tensorflow2","tflite","tflite-convertion","tflite-model"],"latest_commit_sha":null,"homepage":"https://huylenguyen.com/asr","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TensorSpeech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-13T10:23:41.000Z","updated_at":"2025-04-10T09:29:53.000Z","dependencies_parsed_at":"2023-10-21T14:36:52.778Z","dependency_job_id":"3b8617c9-aa21-439b-a1b9-c53d61bb1a33","html_url":"https://github.com/TensorSpeech/TensorFlowASR","commit_stats":{"total_commits":993,"total_committers":12,"mean_commits":82.75,"dds":0.07351460221550854,"last_synced_commit":"0e13f76132d2bd09cd18e26a763df11ebb000f89"},"previous_names":["usimarit/t
iramisuasr"],"tags_count":47,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowASR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowASR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowASR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowASR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TensorSpeech","download_url":"https://codeload.github.com/TensorSpeech/TensorFlowASR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248322609,"owners_count":21084336,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automatic-speech-recognition","conformer","contextnet","ctc","deepspeech2","end2end","jasper","rnn-transducer","speech-recognition","speech-to-text","streaming-transducer","subword-speech-recognition","tensorflow","tensorflow2","tflite","tflite-convertion","tflite-model"],"created_at":"2024-11-12T00:16:33.888Z","updated_at":"2025-04-11T00:53:16.254Z","avatar_url":"https://github.com/TensorSpeech.png","language":"Python","readme":"\u003ch1 align=\"center\"\u003e\nTensorFlowASR :zap:\n\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/TensorSpeech/TensorFlowASR/blob/main/LICENSE\"\u003e\n  \u003cimg alt=\"GitHub\" 
src=\"https://img.shields.io/github/license/TensorSpeech/TensorFlowASR?logo=apache\u0026logoColor=green\"\u003e\n\u003c/a\u003e\n\u003cimg alt=\"python\" src=\"https://img.shields.io/badge/python-%3E%3D3.8-blue?logo=python\"\u003e\n\u003cimg alt=\"tensorflow\" src=\"https://img.shields.io/badge/tensorflow-%3E%3D2.12.0-orange?logo=tensorflow\"\u003e\n\u003ca href=\"https://pypi.org/project/TensorFlowASR/\"\u003e\n  \u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/TensorFlowASR?color=%234285F4\u0026label=release\u0026logo=pypi\u0026logoColor=%234285F4\"\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\u003ch2 align=\"center\"\u003e\nAlmost State-of-the-art Automatic Speech Recognition in Tensorflow 2\n\u003c/h2\u003e\n\n\u003cp align=\"center\"\u003e\nTensorFlowASR implements several automatic speech recognition architectures, such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, and Conformer. These models can be converted to TFLite to reduce memory and computation for deployment :smile:\n\u003c/p\u003e\n\n## What's New?\n\n## Table of Contents\n\n\u003c!-- TOC --\u003e\n\n- [What's New?](#whats-new)\n- [Table of Contents](#table-of-contents)\n- [:yum: Supported Models](#yum-supported-models)\n  - [Baselines](#baselines)\n  - [Publications](#publications)\n- [Installation](#installation)\n  - [Installing from source (recommended)](#installing-from-source-recommended)\n  - [Installing via PyPI](#installing-via-pypi)\n  - [Installing for development](#installing-for-development)\n  - [Install for Apple Silicon](#install-for-apple-silicon)\n  - [Running in a container](#running-in-a-container)\n- [Training \\\u0026 Testing Tutorial](#training--testing-tutorial)\n- [Feature Extraction](#feature-extraction)\n- [Augmentations](#augmentations)\n- [TFLite Conversion](#tflite-conversion)\n- [Pretrained Models](#pretrained-models)\n- [Corpus Sources](#corpus-sources)\n  - [English](#english)\n  - [Vietnamese](#vietnamese)\n- [How to contribute](#how-to-contribute)\n- [References \\\u0026 Credits](#references--credits)\n- [Contact](#contact)\n\n\u003c!-- /TOC --\u003e\n\n## :yum: Supported Models\n\n### Baselines\n\n- **Transducer Models** (End-to-end models using RNNT Loss for training; currently supported: Conformer, ContextNet, Streaming Transducer)\n- **CTCModel** (End-to-end models using CTC Loss for training; currently supported: DeepSpeech2, Jasper)\n\n### Publications\n\n- **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100))\n  See [examples/models/transducer/conformer](./examples/models/transducer/conformer)\n- **ContextNet** (Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191))\n  See [examples/models/transducer/contextnet](./examples/models/transducer/contextnet)\n- **RNN Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621))\n  See [examples/models/transducer/rnnt](./examples/models/transducer/rnnt)\n- **Deep Speech 2** (Reference: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595))\n  See [examples/models/ctc/deepspeech2](./examples/models/ctc/deepspeech2)\n- **Jasper** (Reference: [https://arxiv.org/abs/1904.03288](https://arxiv.org/abs/1904.03288))\n  See [examples/models/ctc/jasper](./examples/models/ctc/jasper)\n\n## Installation\n\nFor training and testing, you should install from source with `git clone` so that the necessary packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) are also installed.\n\n### Installing from source (recommended)\n\n```bash\ngit clone https://github.com/TensorSpeech/TensorFlowASR.git\ncd TensorFlowASR\n# Tensorflow 2.x (with 2.x.x \u003e= 2.5.1)\npip3 install \".[tf2.x]\" # or \".[tf2.x-gpu]\"\n```\n\nFor anaconda3:\n\n```bash\nconda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow instead if on CPU; this makes sure conda installs all dependencies for tensorflow\nconda activate tfasr\npip install -U tensorflow-gpu # upgrade to the latest version of tensorflow\ngit clone https://github.com/TensorSpeech/TensorFlowASR.git\ncd TensorFlowASR\n# Tensorflow 2.x (with 2.x.x \u003e= 2.5.1)\npip3 install \".[tf2.x]\" # or \".[tf2.x-gpu]\"\n```\n\n### Installing via PyPI\n\n```bash\n# Tensorflow 2.x (with 2.x \u003e= 2.3)\npip3 install \"TensorFlowASR[tf2.x]\" # or pip3 install \"TensorFlowASR[tf2.x-gpu]\"\n```\n\n### Installing for development\n\n```bash\ngit clone https://github.com/TensorSpeech/TensorFlowASR.git\ncd TensorFlowASR\npip3 install -e \".[dev]\"\npip3 install -e \".[tf2.x]\" # or \".[tf2.x-gpu]\", or \".[tf2.x-apple]\" for Apple M1 machines\n```\n\n### Install for Apple Silicon\n\nBecause tensorflow-text is not built for Apple Silicon, we need to install it from the prebuilt wheel file provided by [sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon](https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon)\n\n```bash\ngit clone https://github.com/TensorSpeech/TensorFlowASR.git\ncd TensorFlowASR\npip3 install -e \".\" # or pip3 install -e \".[dev]\" for development, or pip3 install \"TensorFlowASR[dev]\" from PyPI\npip3 install tensorflow~=2.14.0 # change the minor version if you want\n```\n\nRun the following after installing TensorFlowASR and tensorflow as above:\n\n```bash\nTF_VERSION=\"$(python3 -c 'import tensorflow; print(tensorflow.__version__)')\" \u0026\u0026 \\\nTF_VERSION_MAJOR=\"$(echo $TF_VERSION | cut -d'.' -f1,2)\" \u0026\u0026 \\\nPY_VERSION=\"$(python3 -c 'import platform; major, minor, patch = platform.python_version_tuple(); print(f\"{major}{minor}\");')\" \u0026\u0026 \\\nURL=\"https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon\" \u0026\u0026 \\\npip3 install \"${URL}/releases/download/v${TF_VERSION_MAJOR}/tensorflow_text-${TF_VERSION_MAJOR}.0-cp${PY_VERSION}-cp${PY_VERSION}-macosx_11_0_arm64.whl\"\n```\n\n### Running in a container\n\n```bash\ndocker-compose up -d\n```\n\n## Training \u0026 Testing Tutorial\n\n- For training, please read [tutorial_training](./docs/tutorials/training.md)\n- For testing, please read [tutorial_testing](./docs/tutorials/testing.md)\n\n**FYI**: Keras built-in training uses an **infinite dataset**, which avoids a potential partial final batch.\n\nSee [examples](./examples/) for some predefined ASR models and their results.\n\n## Feature Extraction\n\nSee [features_extraction](./tensorflow_asr/features/README.md)\n\n## Augmentations\n\nSee [augmentations](./tensorflow_asr/augmentations/README.md)\n\n## TFLite Conversion\n\nAfter conversion, the TFLite model acts as a single function that maps an **audio signal** directly to **text and tokens**.\n\nSee [tflite_convertion](./docs/tutorials/tflite.md)\n\n## Pretrained Models\n\nGo to [drive](https://drive.google.com/drive/folders/1BD0AK30n8hc-yR28C5FW3LqzZxtLOQfl?usp=sharing)\n\n## Corpus Sources\n\n### English\n\n| **Name**     | **Source**                                                         | **Hours** |\n| :----------- | :----------------------------------------------------------------- | :-------- |\n| LibriSpeech  | [LibriSpeech](http://www.openslr.org/12)                           | 970h      |\n| Common Voice | [https://commonvoice.mozilla.org](https://commonvoice.mozilla.org) | 1932h     |\n\n### Vietnamese\n\n| **Name**                               | **Source**                                                        
                                       | **Hours** |\n| :------------------------------------- | :------------------------------------------------------------------------------------------------------------------- | :-------- |\n| Vivos                                  | [https://ailab.hcmus.edu.vn/vivos](https://www.kaggle.com/datasets/kynthesis/vivos-vietnamese-speech-corpus-for-asr) | 15h       |\n| InfoRe Technology 1                    | [InfoRe1 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/datasets/infore/25hours.zip)                  | 25h       |\n| InfoRe Technology 2 (used in VLSP2019) | [InfoRe2 (passwd: BroughtToYouByInfoRe)](https://files.huylenguyen.com/datasets/infore/audiobooks.zip)               | 415h      |\n\n## How to contribute\n\n1. Fork the project\n2. [Install for development](#installing-for-development)\n3. Create a branch\n4. Make a pull request to this repo\n\n## References \u0026 Credits\n\n1. [NVIDIA OpenSeq2Seq Toolkit](https://github.com/NVIDIA/OpenSeq2Seq)\n2. [https://github.com/noahchalifour/warp-transducer](https://github.com/noahchalifour/warp-transducer)\n3. [Sequence Transduction with Recurrent Neural Network](https://arxiv.org/abs/1211.3711)\n4. [End-to-End Speech Processing Toolkit in PyTorch](https://github.com/espnet/espnet)\n5. [https://github.com/iankur/ContextNet](https://github.com/iankur/ContextNet)\n\n## Contact\n\nHuy Le Nguyen\n\nEmail: nlhuy.cs.16@gmail.com\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorspeech%2Ftensorflowasr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftensorspeech%2Ftensorflowasr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorspeech%2Ftensorflowasr/lists"}