{"id":13856888,"url":"https://github.com/at16k/at16k","last_synced_at":"2025-07-13T19:33:05.173Z","repository":{"id":57412315,"uuid":"225605078","full_name":"at16k/at16k","owner":"at16k","description":"Trained models for automatic speech recognition (ASR). A library to quickly build applications that require speech to text conversion.","archived":false,"fork":false,"pushed_at":"2021-03-31T13:02:33.000Z","size":274,"stargazers_count":129,"open_issues_count":5,"forks_count":18,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-06-11T15:49:24.978Z","etag":null,"topics":["asr","asr-model","automatic-speech-recognition","pretrained-models","speech-analysis","speech-api","speech-recognition","speech-recognizer","speech-to-text","voice-commands","voice-recognition"],"latest_commit_sha":null,"homepage":"https://at16k.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/at16k.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-03T11:37:10.000Z","updated_at":"2024-12-28T00:45:09.000Z","dependencies_parsed_at":"2022-08-27T23:51:42.912Z","dependency_job_id":null,"html_url":"https://github.com/at16k/at16k","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/at16k/at16k","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/at16k%2Fat16k","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/at16k%2Fat16k/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/at16k%2Fat16k/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/at16k%2Fat16k/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/at16k","download_url":"https://codeload.github.com/at16k/at16k/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/at16k%2Fat16k/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265129436,"owners_count":23715654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","asr-model","automatic-speech-recognition","pretrained-models","speech-analysis","speech-api","speech-recognition","speech-recognizer","speech-to-text","voice-commands","voice-recognition"],"created_at":"2024-08-05T03:01:17.453Z","updated_at":"2025-07-13T19:33:04.775Z","avatar_url":"https://github.com/at16k.png","language":"Python","readme":"[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/GlibAI/at16k/graphs/commit-activity)\n[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)\n[![PyPI license](https://img.shields.io/pypi/l/at16k.svg)](https://pypi.python.org/pypi/at16k/)\n[![Open Source Love svg1](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/)\n\u003cimg src=\"https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat\"\u003e\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/at16k.svg)\n[![Downloads](https://pepy.tech/badge/at16k)](https://pepy.tech/project/at16k)\n\n# at16k\nPronounced as ***at sixteen k***.\n\n# What is 
at16k?\nat16k is a Python library for automatic speech recognition (ASR), i.e. speech-to-text conversion. The goal of this project is to provide the community with a production-quality speech-to-text library.\n\n# Installation\nIt is recommended that you install at16k in a virtual environment.\n\n## Prerequisites\n- Python \u003e= 3.6\n- TensorFlow == 1.14\n- SciPy (for reading wav files)\n\n## Install via pip\n```\n$ pip install at16k\n```\n\n## Install from source\nRequires: [poetry](https://github.com/sdispater/poetry)\n```\n$ git clone https://github.com/at16k/at16k.git\n$ poetry env use python3.6\n$ poetry install\n```\n\n# Download models\nCurrently, three models are available for speech-to-text conversion:\n- en_8k (trained on English audio recorded at 8 kHz, supports offline ASR)\n- en_16k (trained on English audio recorded at 16 kHz, supports offline ASR)\n- en_16k_rnnt (trained on English audio recorded at 16 kHz, supports real-time ASR)\n\nTo download all the models:\n```\n$ python -m at16k.download all\n```\nAlternatively, you can download only the model you need. For example:\n```\n$ python -m at16k.download en_8k\n$ python -m at16k.download en_16k\n$ python -m at16k.download en_16k_rnnt\n```\nBy default, the models will be downloaded and stored at \u003cHOME_DIR\u003e/.at16k. To override the default, set the environment variable AT16K_RESOURCES_DIR. For example:\n```\n$ export AT16K_RESOURCES_DIR=/path/to/my/directory\n```\nYou will need to set this environment variable again whenever you use at16k, whether via the command line, the library, or the REST API.\n\n# Preprocessing audio files\nat16k accepts wav files with the following specs:\n- Channels: 1\n- Bits per sample: 16\n- Sample rate: 8000 Hz (en_8k) or 16000 Hz (en_16k)\n\nUse ffmpeg to convert your audio/video files to an acceptable format. 
For example,\n```\n# For 8 kHz\n$ ffmpeg -i \u003cinput_file\u003e -acodec pcm_s16le -ac 1 -ar 8000 \u003coutput_file\u003e\n\n# For 16 kHz\n$ ffmpeg -i \u003cinput_file\u003e -acodec pcm_s16le -ac 1 -ar 16000 \u003coutput_file\u003e\n```\n\n# Usage\nat16k supports two modes of ASR, offline and real-time, and comes with a handy command-line utility for quickly trying out different models and use cases.\n\nHere are a few examples:\n```\n# Offline ASR, 8 kHz sampling rate\n$ at16k-convert -i \u003cpath_to_wav_file\u003e -m en_8k\n\n# Offline ASR, 16 kHz sampling rate\n$ at16k-convert -i \u003cpath_to_wav_file\u003e -m en_16k\n\n# Real-time ASR, 16 kHz sampling rate, from a file, beam decoding\n$ at16k-convert -i \u003cpath_to_wav_file\u003e -m en_16k_rnnt -d beam\n\n# Real-time ASR, 16 kHz sampling rate, from mic input, greedy decoding (requires pyaudio)\n$ at16k-convert -m en_16k_rnnt -d greedy\n```\nIf the ***at16k-convert*** binary is not available for some reason, replace it with:\n```\npython -m at16k.bin.speech_to_text ...\n```\n\n## Library API\nCheck [this file](https://github.com/at16k/at16k/blob/master/at16k/bin/speech_to_text.py) for examples of how to use at16k as a library.\n\n# Limitations\n\nThe maximum duration of your audio file should be less than **30 seconds** when using **en_8k**, and less than **15 seconds** when using **en_16k**. 
No error will be thrown if the duration exceeds these limits; however, your transcript may contain errors and missing text.\n\n# License\n\nThis software is distributed under the MIT license.\n\n# Acknowledgements\n\nWe would like to thank the [Google TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) program for providing access to cloud TPUs.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fat16k%2Fat16k","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fat16k%2Fat16k","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fat16k%2Fat16k/lists"}