An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with automatic-speech-recognition

A curated list of projects in awesome lists tagged with automatic-speech-recognition.

https://github.com/wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

asr automatic-speech-recognition conformer e2e-models production-ready pytorch speech-recognition transformer whisper

Last synced: 13 May 2025

https://github.com/coqui-ai/stt

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 14 May 2025

https://github.com/kakaobrain/pororo

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

automatic-speech-recognition deep-learning natural-language-processing neural-models speech-synthesis

Last synced: 30 Dec 2025

https://github.com/tensorspeech/tensorflowasr

⚡ TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can use characters or subwords.

automatic-speech-recognition conformer contextnet ctc deepspeech2 end2end jasper rnn-transducer speech-recognition speech-to-text streaming-transducer subword-speech-recognition tensorflow tensorflow2 tflite tflite-convertion tflite-model

Last synced: 14 May 2025

https://github.com/FireRedTeam/FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.

asr automatic-speech-recognition conformer industrial-grade llm multimodal-llm open-source speech-recognition speechllm transformer

Last synced: 12 Apr 2025

https://github.com/shirayu/whispering

Streaming transcriber with whisper

automatic-speech-recognition whisper

Last synced: 29 Sep 2025

https://github.com/jitsi/jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

automatic-speech-recognition evaluation-metrics python3 speech-to-text wer word-error-rate

Last synced: 16 Apr 2025

https://github.com/FluidInference/FluidAudio

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

ane asr audio automatic-speech-recognition avfoundation coreml ios macos nvidia parakeet real-time speaker-diarization speaker-embedding speaker-identification speaker-recognition speech-to-text swift vad voice-activity-detection

Last synced: 31 Aug 2025

https://github.com/FireRedTeam/FireRedASR2S

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ langs. FireRedLID supports 100+ langs and 20+ zh dialects. FireRedPunc supports zh and en.

asr asr-pipeline audio-event-classification audio-event-detection automatic-speech-recognition industrial-grade language-identification lid llm multimodal-llm open-source punctuation-prediction punctuation-restoration sota speech-recognition speechllm vad voice-activity-detection

Last synced: 06 May 2026

https://github.com/noco-ai/spellbook-docker

AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models

automatic-speech-recognition bark llama2 llm-inference mixtral musicgeneration stable-diffusion text-to-speech whisper xttsv2

Last synced: 13 May 2025

https://github.com/CoEDL/elpis

🙊 Software for creating speech recognition models.

automatic-speech-recognition computational-linguistics docker kaldi linguistics python transcription

Last synced: 08 May 2025

https://github.com/ieasybooks/tafrigh

Transcribe text and create SRT and VTT files using Whisper models and wit.ai technology.

asr automatic-speech-recognition ctranslate2 facebook faster-whisper javascript python soundcloud srt stable-whisper subtitles twitter vtt whisper youtube

Last synced: 06 Feb 2026

https://github.com/altunenes/parakeet-rs

Very fast speech-to-text, diarization, and streaming (even on CPU) with NVIDIA Parakeet in Rust

asr automatic-speech-recognition onnx parakeet speaker-diarization speaker-identification speech speech-recognition speech-to-text

Last synced: 06 Feb 2026

https://github.com/at16k/at16k

Trained models for automatic speech recognition (ASR). A library to quickly build applications that require speech to text conversion.

asr asr-model automatic-speech-recognition pretrained-models speech-analysis speech-api speech-recognition speech-recognizer speech-to-text voice-commands voice-recognition

Last synced: 13 Jul 2025

https://github.com/lucasnewman/best-rq-pytorch

Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch.

automatic-speech-recognition speech-synthesis text-to-speech

Last synced: 20 Aug 2025

https://github.com/mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

asr automatic-speech-recognition ggml ros2 speech-recognition speech-to-text vad voice-activity-detection whisper whisper-cpp

Last synced: 30 Aug 2025

https://github.com/j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.

android android-ime automatic-speech-recognition chinese-speech-recognition ime keyboard kotlin openai openai-api speech speech-recognition speech-to-text virtual-keyboard voice voice-recognition whisper

Last synced: 09 Apr 2025

https://github.com/googlecreativelab/obvi

A Polymer 3+ webcomponent / button for doing speech recognition

automatic-speech-recognition button polymer polymer2 speech-recognition webcomponent

Last synced: 29 Jul 2025

https://github.com/sungnyun/armhubert

(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT

automatic-speech-recognition distillation ssl-compression

Last synced: 12 Apr 2025

https://github.com/tsmdt/whisply

💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... fast!

asr automatic-speech-recognition speech-recognition speech-to-text subtitles transcription-tool whisper-ai

Last synced: 09 Apr 2025

https://github.com/soheil-mp/speech-recognition

End-to-End Speech Recognition using Neural Networks.

asr audio automatic-speech-recognition librispeech

Last synced: 20 Aug 2025

https://github.com/sooftware/jasper

PyTorch implementation of "Jasper: An End-to-End Convolutional Neural Acoustic Model" (INTERSPEECH 2019)

asr automatic-speech-recognition cnn jasper nvidia pytorch speech-recognition

Last synced: 09 Apr 2025

https://github.com/j3soon/speech-to-windows-input

Performs speech-to-text (STT/ASR) with Azure speech service and simulates keyboard input to enter the recognized text; supports English, Chinese, Japanese, and more.

automatic-speech-recognition azure azure-speech-service chinese-speech-recognition simulate-keyboard speech speech-recognition speech-to-text voice voice-recognition

Last synced: 10 Apr 2025

https://github.com/the-data-dilemma/medibeng-whisper-tiny

MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making analysis, record-keeping, and the use of AI in healthcare easier.

audio audio-processing automatic-speech-recognition bengali code-switch english fastapi faster-whisper fine-tuning gradio healthcare openai python speech-recognition speech-to-text synthetic-data transcription transformers translation whisper

Last synced: 17 Mar 2026

https://github.com/popcornell/micrank

MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

ad-hoc-microphone-network array-processing asr automatic-speech-recognition channel-selection

Last synced: 12 Apr 2025

https://github.com/saurabhchalke/whisper-meta-quest

Running speech-to-text in a Meta Quest headset using OpenAI's Whisper tiny model

artificial-intelligence automatic-speech-recognition mixed-reality speech-to-text virtual-reality vr whisper

Last synced: 04 Sep 2025

https://github.com/egorsmkv/whisper-ukrainian

Trainer and Evaluation scripts for fine-tuning Whisper models for the Ukrainian language

asr automatic-speech-recognition openai speech-recognition ukrainian whisper

Last synced: 03 Mar 2025

https://github.com/FernandoLpz/SpeechRecognition

This repository contains the implementation of an Automatic Speech Recognition system in Python, using a client-server architecture with WebSockets.

automatic-speech-recognition python speech-recognition speech-to-text transformers wav2vec2 websockets

Last synced: 03 Apr 2025

https://github.com/analyticsinmotion/werpy

🐍📦 Rapidly calculate and analyze the Word Error Rate (WER) with this powerful yet lightweight Python package.

asr asr-evaluation automatic-speech-recognition levenshtein-distance metrics nlp python python-package speech-to-text stt stt-benchmark wer werpy word-error-rate

Last synced: 07 Apr 2025

https://github.com/zevaverbach/tatt

Transcribe All The Things™ is a CLI for creating and managing speech-to-text transcripts.

amazon-transcribe-api asr automatic-speech-recognition cli speech-to-text stt

Last synced: 13 Apr 2025

https://github.com/winstxnhdw/capgen

A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.

asr automatic-speech-recognition ctranslate2 docker granian huggingface huggingface-spaces litestar whisper

Last synced: 03 Jul 2025

https://github.com/bhattbhavesh91/whisper-youtube

This repository guides you through automatically generating YouTube transcriptions using OpenAI's Whisper.

automatic-speech-recognition ffmpeg openai openai-gym python pytube subtitles whisper youtube youtube-dl

Last synced: 17 Apr 2025

https://github.com/abus-aikorea/studio-free

youtube download, vocal remover, vocal extraction, karaoke video production, STT, automatic speech recognition, transcription, automatic subtitle, AI, yt-dlp, demucs, whisper, webui, gradio, windows

ai automatic-speech-recognition automatic-subtitle demucs gradio karaoke openai stt transcription video-download vocal-remover webui whisper windows yt-dlp

Last synced: 25 Apr 2025

https://github.com/bagustris/detect-segment-cough

A Python model to detect and segment coughs, forked from coughvid's repo

automatic-speech-recognition cough-detection cough-sound covid-19 speech-segmentation

Last synced: 28 Jun 2025

https://github.com/jarbasal/pocketsphinx-models-mirror

PocketSphinx models for languages originating from the Iberian Peninsula

asr automatic-speech-recognition pocketsphinx speech-recognition speech-to-text stt stt-models

Last synced: 12 Feb 2026

https://github.com/the-data-dilemma/parquettohuggingface

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

audio-dataset audio-processing automatic-speech-recognition data-analysis data-science dataset healthcare-application huggingface huggingface-datasets pandas parquet parquet-generator python3 speech-data speech-recognition speech-to-text speech-translation

Last synced: 21 Aug 2025

https://github.com/thc1006/breeze-asr-taigi

Taiwanese Hokkien (Taigi) speech-to-text transcriber - MediaTek Breeze-ASR-26 with faster-whisper, tuned for RTX 3050 4GB low-VRAM GPUs. Gradio UI, CLI, Docker, SRT/VTT/TXT/JSON.

asr automatic-speech-recognition breeze-asr chinese-speech-recognition ctranslate2 faster-whisper gradio hokkien low-vram mediatek pytorch rtx-3050 speech-recognition speech-to-text subtitle-generator taigi taiwanese whisper

Last synced: 14 May 2026

https://github.com/roboticslab-uc3m/speech

Text To Speech (TTS) and Automatic Speech Recognition (ASR).

automatic-speech-recognition text-to-speech

Last synced: 19 Jan 2026

https://github.com/bhattbhavesh91/table-question-answering-with-automatic-speech-recognition

Question Answering Gradio Interface on Tabular Data with HuggingFace Transformers Pipeline & TAPAS. Wav2Vec2 is a pretrained model for Automatic Speech Recognition (ASR).

automatic-speech-recognition google-assistant gradio-interface huggingface huggingface-transformers huggingface-transformers-pipeline question-answering voice-recognition

Last synced: 27 Feb 2026

https://github.com/BatuhanYilmaz26/Youtube-Transcriber

Input a YouTube video link and get a transcription as a .txt, .vtt or .srt file.

automatic-speech-recognition huggingface openai python speech-recognition streamlit whisper

Last synced: 11 Mar 2025

https://github.com/idiap/tidigitsrecipe.jl

A Julia recipe for training an ASR system using the TIDIGITS database

asr automatic-speech-recognition decoding hidden-markov-models wfst

Last synced: 12 Sep 2025

https://github.com/my-north-ai/semantic_audio_filtering

Synthetic data augmentation technique via LLM for Automatic Speech Recognition fine-tuning.

automatic-speech-recognition fine-tuning synthetic-dataset-generation text-to-speech whisper

Last synced: 11 Mar 2025

https://github.com/pleasurecruise/3d-ai-agent

This project aims to create an AI agent capable of expressing a range of emotions through facial expressions and tone of voice, using Large Language Models (LLMs) and Large Vision Models (LVMs).

asr automatic-speech-recognition large-language-models llm ocr optical-character-recognition python text-to-speech tts

Last synced: 15 Apr 2025

https://github.com/astrologos/py-speakeasy

Speakeasy GPT is a Jupyter notebook that utilizes several natural language processing utilities to provide a seamless and low-latency speech interface to ChatGPT and other large language models.

automatic-speech-recognition chat-gpt coqui-ai coqui-tts elevenlabs-api mimic mycroftai text-to-speech whisper

Last synced: 11 Mar 2025

https://github.com/openvoiceos/ovos-stt-plugin-chromium

An STT plugin for Mycroft using the Google Chrome browser API

asr automatic-speech-recognition speech-recognition speech-to-text stt

Last synced: 16 May 2025

https://github.com/egorsmkv/cv10-uk-testset-clean

The cleaned Common Voice 10 (test set) that has been checked by a human for Ukrainian 🇺🇦

asr automatic-speech-recognition speech speech-recognition speech-to-text ukrainian

Last synced: 19 Mar 2026

https://github.com/rafat-decodis/robust-asr-for-low-resource-languages

Exploring Benchmark Gaps and Real-World Speech Generalization for Low-Resource Languages

artificial-intelligence automatic-speech-recognition data-analysis dataprocessing whisper

Last synced: 23 Jun 2025

https://github.com/marquesafonso/multilang-asr-captioner

A multilingual automatic speech recognition and video captioning tool using faster-whisper. Supports real-time translation to English. Runs on consumer-grade CPUs.

automatic-speech-recognition captioning-videos faster-whisper whisper

Last synced: 11 Mar 2025

https://github.com/analyticsinmotion/werx

🐍📦 Easy-to-use Python package for lightning-fast Word Error Rate analysis

asr automatic-speech-recognition levenshtein-distance metrics speech-to-text stt wer werx word-error-rate word-error-rate-calculator

Last synced: 16 Jun 2025

https://github.com/nico-byte/whisper-web

The Whisper Web Transcription Server is a Python-based real-time speech-to-text transcription system powered by OpenAI's Whisper models. It uses models such as Distil-Whisper to transcribe audio input in real time.

ai asr automatic-speech-recognition distil-whisper distil-whisper-large-v3 huggingface huggingface-transformers server vad voice web websockets whisper

Last synced: 26 Apr 2026

https://github.com/swaylenhayes/three-amigos-offline

Triple model automatic speech recognition for Mac: offline, push-to-talk with auto-paste, and MLX-optimized Whisper model choices.

automatic-speech-recognition mlx offline speech-to-text whisper

Last synced: 13 Jan 2026

https://github.com/egorsmkv/asr-corpus-by-microphone

This is a simple solution for people who want to create their own corpus for Automatic Speech Recognition with just a microphone.

asr automatic-speech-recognition corpus corpus-tools

Last synced: 28 Mar 2025

https://github.com/rishabhmathur06/fine-tuning-whisper-small-for-asr-

This repository contains a notebook that shows how to fine-tune OpenAI's Whisper model on a custom Hindi dataset.

artificial-intelligence asr automatic-speech-recognition fine-tuning openai python whisper whisper-model

Last synced: 19 Jan 2026