{"id":50200852,"url":"https://github.com/shhossain/BanglaSpeech2Text","last_synced_at":"2026-06-11T14:00:33.815Z","repository":{"id":65215378,"uuid":"588019194","full_name":"shhossain/BanglaSpeech2Text","owner":"shhossain","description":"BanglaSpeech2Text: An open-source offline speech-to-text package for Bangla language. Fine-tuned on the latest whisper speech to text model for optimal performance.","archived":false,"fork":false,"pushed_at":"2025-03-01T18:43:01.000Z","size":1940,"stargazers_count":121,"open_issues_count":3,"forks_count":18,"subscribers_count":9,"default_branch":"main","last_synced_at":"2026-03-21T06:28:58.458Z","etag":null,"topics":["bangla","bangla-asr","bangla-automatic-speech-recognition","bangla-speech-recognition","bangla-speech-to-text","bangla-voice-recognition","deep-learning","hacktoberfest","machine-learning","pytorch","speech","speech-recognition","speech-to-text","transformer","voice-recognition","whisper","whisper-model"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shhossain.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"ko_fi":"shhossain"}},"created_at":"2023-01-12T06:02:21.000Z","updated_at":"2026-03-09T10:17:17.000Z","dependencies_parsed_at":"2023-09-25T23:22:58.461Z","dependency_job_id":"fa57553c-0c91-4c71-9d28-c40baac47c83","html_url":"https://github.com/shhossain/BanglaSpeech2Text","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sh
hossain/BanglaSpeech2Text","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shhossain%2FBanglaSpeech2Text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shhossain%2FBanglaSpeech2Text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shhossain%2FBanglaSpeech2Text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shhossain%2FBanglaSpeech2Text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shhossain","download_url":"https://codeload.github.com/shhossain/BanglaSpeech2Text/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shhossain%2FBanglaSpeech2Text/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34201842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bangla","bangla-asr","bangla-automatic-speech-recognition","bangla-speech-recognition","bangla-speech-to-text","bangla-voice-recognition","deep-learning","hacktoberfest","machine-learning","pytorch","speech","speech-recognition","speech-to-text","transformer","voice-recognition","whisper","whisper-model"],"created_at":"2026-05-25T22:00:42.189Z","updated_at":"2026-06-11T14:00:33.805Z","avat
ar_url":"https://github.com/shhossain.png","language":"Python","funding_links":["https://ko-fi.com/shhossain"],"categories":["Developer Tools \u0026 Libraries"],"sub_categories":["🚀 How to contribute"],"readme":"# BanglaSpeech2Text (Bangla Speech to Text)\n\nBanglaSpeech2Text: An open-source offline speech-to-text package for the Bangla language. Fine-tuned on the latest Whisper speech-to-text model for optimal performance. Transcribe speech to text, convert voice to text, and perform speech recognition in Python with ease, even without an internet connection.\n\n## [Models](https://github.com/shhossain/BanglaSpeech2Text/blob/main/banglaspeech2text/utils/listed_models.json)\n\n| Model   | Size       | Best(WER) |\n| ------- | ---------- | --------- |\n| `tiny`  | 100-200 MB | 74        |\n| `base`  | 200-300 MB | 46        |\n| `small` | 1 GB       | 18        |\n| `large` | 3-4 GB     | 11        |\n\n**NOTE**: Bigger models have better accuracy but slower inference. More models are available on the [HuggingFace Model Hub](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition\u0026language=bn\u0026sort=likes).\n\n## Pre-requisites\n\n- Python 3.7 or higher\n\n## Test it in Google Colab\n\n- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shhossain/BanglaSpeech2Text/blob/main/banglaspeech2text_in_colab.ipynb)\n\n## Installation\n\nYou can install the library using pip:\n\n```bash\npip install banglaspeech2text\n```\n\n## Usage\n\n### Model Initialization\n\nTo use the library, you need to initialize the Speech2Text class with the desired model. By default, it uses the \"base\" model, but you can choose from different pre-trained models: \"tiny\", \"base\", \"small\", or \"large\". 
Here's an example:\n\n```python\nfrom banglaspeech2text import Speech2Text\n\nstt = Speech2Text(\"base\")\n\n# You can also omit the model name to use the default model\nstt = Speech2Text()\n```\n\n### Transcribing Audio Files\n\nYou can transcribe an audio file by calling the `recognize` method and passing the path to the audio file. It will return the transcribed text as a string. Here's an example:\n\n```python\ntranscription = stt.recognize(\"audio.wav\")\nprint(transcription)\n```\n\n### Get Transcription Segments with Timestamps as They Are Processed\n\n```python\nsegments = stt.recognize(\"audio.wav\", return_segments=True)\nfor segment in segments:\n    print(\"[%.2fs -\u003e %.2fs] %s\" % (segment.start, segment.end, segment.text))\n```\n\n## Multiple Audio Formats\n\nBanglaSpeech2Text supports the following audio formats for input:\n\n- File Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, and more.\n- Bytes: Raw audio data in byte format.\n- Numpy Array: Numpy array representing audio data, preferably obtained using librosa.load.\n- AudioData: Audio data obtained from the speech_recognition library.\n- AudioSegment: Audio segment objects from the pydub library.\n- BytesIO: Audio data provided through BytesIO objects from the io module.\n- Path: Pathlib Path object pointing to an audio file.\n\nNo need for extra code to convert audio files to a specific format. BanglaSpeech2Text automatically handles the conversion for you:\n\n```python\ntranscription = stt.recognize(\"audio.mp3\")\nprint(transcription)\n```\n\n### Use with SpeechRecognition\n\nYou can use the [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) package to capture audio from a microphone and transcribe it. 
Here's an example:\n\n```python\nimport speech_recognition as sr\nfrom banglaspeech2text import Speech2Text\n\nstt = Speech2Text()\n\nr = sr.Recognizer()\nwith sr.Microphone() as source:\n    print(\"Say something!\")\n    r.adjust_for_ambient_noise(source)\n    audio = r.listen(source)\n    output = stt.recognize(audio)\n\nprint(output)\n```\n\n### Instantly Check with Gradio\n\nYou can instantly check the model with Gradio. Here's an example:\n\n```python\nfrom banglaspeech2text import Speech2Text\nimport gradio as gr\n\nstt = Speech2Text()\n\n# You can also open the URL and test it on a mobile device.\n# Note: `source=` is the Gradio 3.x argument; Gradio 4+ uses `sources=[\"microphone\"]`.\ngr.Interface(\n    fn=stt.recognize,\n    inputs=gr.Audio(source=\"microphone\", type=\"filepath\"),\n    outputs=\"text\").launch(share=True)\n```\n\n## Some more usage examples\n\n### Use a Hugging Face model\n\n```python\nstt = Speech2Text(\"openai/whisper-tiny\")\n```\n\n### See current model info\n\n```python\nstt = Speech2Text(\"base\")\n\nprint(stt.model_metadata)  # Model metadata (name, size, wer, license, etc.)\nprint(stt.model_metadata.wer)  # Word Error Rate (not available for all models)\n```\n\n### CLI\n\nYou can use the library from the command line. 
Here's an example:\n\n```bash\nbnstt 'file.wav'\n```\n\nYou can also use it with a microphone:\n\n```bash\nbnstt --mic\n```\n\nOther options:\n\n```bash\nusage: bnstt\n       [-h]\n       [-gpu]\n       [-c CACHE]\n       [-o OUTPUT]\n       [-m MODEL]\n       [-s]\n       [-sm MIN_SILENCE_LENGTH]\n       [-st SILENCE_THRESH]\n       [-sp PADDING]\n       [--list]\n       [--info]\n       [INPUT ...]\n\nBangla Speech to Text\n\npositional arguments:\n  INPUT\n    inputfile(s) or list of files\n\noptions:\n  -h, --help\n    show this help message and exit\n  -o OUTPUT, --output OUTPUT\n    output directory\n  -m MODEL, --model MODEL\n    model name\n  --list list of available models\n  --info show model info\n```\n\n## Custom Use Cases and Support\n\nIf your business or project has specific speech-to-text requirements that go beyond the capabilities of the provided open-source package, I'm here to help! I understand that each use case is unique, and I'm open to collaborating on custom solutions that meet your needs. Whether you have longer audio files that need accurate transcription, require model fine-tuning, or need assistance in implementing the package effectively, I'm available for support.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshhossain%2FBanglaSpeech2Text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshhossain%2FBanglaSpeech2Text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshhossain%2FBanglaSpeech2Text/lists"}