Projects in Awesome Lists tagged with automatic-speech-recognition

https://github.com/wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

asr automatic-speech-recognition conformer e2e-models production-ready pytorch speech-recognition transformer whisper

Last synced: 13 May 2025

https://github.com/zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow

audio automatic-speech-recognition chinese-speech-recognition cnn data-preprocessing deep-learning end-to-end evaluation feature-vector layer-normalization lstm paper phonemes rnn rnn-encoder-decoder speech-recognition tensorflow timit-dataset

Last synced: 02 Apr 2025

https://github.com/zzw922cn/automatic_speech_recognition

End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow

audio automatic-speech-recognition chinese-speech-recognition cnn data-preprocessing deep-learning end-to-end evaluation feature-vector layer-normalization lstm paper phonemes rnn rnn-encoder-decoder speech-recognition tensorflow timit-dataset

Last synced: 15 May 2025

https://github.com/ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

asr automatic-speech-recognition docker openai-whisper speech speech-recognition speech-to-text

Last synced: 14 May 2025

https://github.com/coqui-ai/stt

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 14 May 2025

https://github.com/coqui-ai/STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 15 Mar 2025

https://github.com/kakaobrain/pororo

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

automatic-speech-recognition deep-learning natural-language-processing neural-models speech-synthesis

Last synced: 30 Dec 2025

https://github.com/tensorspeech/tensorflowasr

⚡ TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supports languages that can use characters or subwords.

automatic-speech-recognition conformer contextnet ctc deepspeech2 end2end jasper rnn-transducer speech-recognition speech-to-text streaming-transducer subword-speech-recognition tensorflow tensorflow2 tflite tflite-convertion tflite-model

Last synced: 14 May 2025

https://github.com/FireRedTeam/FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.

asr automatic-speech-recognition conformer industrial-grade llm multimodal-llm open-source speech-recognition speechllm transformer

Last synced: 12 Apr 2025

https://github.com/snakers4/open_stt

Open STT

asr automatic-speech-recognition dataset russian speech-to-text stt

Last synced: 19 Jul 2025

https://github.com/shirayu/whispering

Streaming transcriber with whisper

automatic-speech-recognition whisper

Last synced: 29 Sep 2025

https://github.com/jitsi/jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

automatic-speech-recognition evaluation-metrics python3 speech-to-text wer word-error-rate

Last synced: 16 Apr 2025

https://github.com/Picovoice/cheetah

On-device streaming speech-to-text engine powered by deep learning

asr automatic-speech-recognition online-speech-recognition speech-recognition speech-to-text streaming-speech-to-text stt transcription voice-recognition

Last synced: 04 May 2025

https://github.com/picovoice/cheetah

On-device streaming speech-to-text engine powered by deep learning

asr automatic-speech-recognition online-speech-recognition speech-recognition speech-to-text streaming-speech-to-text stt transcription voice-recognition

Last synced: 13 Apr 2025

https://github.com/hirofumi0810/neural_sp

End-to-end ASR/LM implementation with PyTorch

asr attention attention-mechanism automatic-speech-recognition ctc language-model language-modeling pytorch rnn-transducer seq2seq sequence-to-sequence speech speech-recognition streaming transformer transformer-xl

Last synced: 02 May 2025

https://github.com/FluidInference/FluidAudio

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

ane asr audio automatic-speech-recognition avfoundation coreml ios macos nvidia parakeet real-time speaker-diarization speaker-embedding speaker-identification speaker-recognition speech-to-text swift vad voice-activity-detection

Last synced: 31 Aug 2025

https://github.com/FireRedTeam/FireRedASR2S

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ langs. FireRedLID supports 100+ langs and 20+ zh dialects. FireRedPunc supports zh and en.

asr asr-pipeline audio-event-classification audio-event-detection automatic-speech-recognition industrial-grade language-identification lid llm multimodal-llm open-source punctuation-prediction punctuation-restoration sota speech-recognition speechllm vad voice-activity-detection

Last synced: 06 May 2026

https://github.com/picovoice/leopard

On-device speech-to-text engine powered by deep learning

asr automatic-speech-recognition on-device speech-recognition speech-to-text stt transcription voice-recognition voice-to-text

Last synced: 14 May 2025

https://github.com/arthurfdlr/whisper-youtube

🔉 Youtube Videos Transcription with OpenAI's Whisper

automatic-speech-recognition colab-notebook speech-recognition speech-to-text transformer whisper youtube

Last synced: 05 Apr 2025

https://github.com/double22a/speech_dataset

The dataset of Speech Recognition

asr audio automatic-speech-recognition dataset deep-learning deep-neural-networks speech speech-diarization speech-enhancement speech-recognition speech-segmentation speech-separation speech-synthesis speech-to-text speech-translation text-to-speech tts voice-conversion wav

Last synced: 05 May 2025

https://github.com/ArthurFDLR/whisper-youtube

🔉 Youtube Videos Transcription with OpenAI's Whisper

automatic-speech-recognition colab-notebook speech-recognition speech-to-text transformer whisper youtube

Last synced: 01 Apr 2025

https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition

End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

asr attention-mechanism automatic-speech-recognition beam-search csj ctc end-to-end end-to-end-learning joint-ctc-attention librispeech speech-recognition speech-to-text tensorflow timit timit-dataset

Last synced: 19 Jul 2025

https://github.com/rolczynski/automatic-speech-recognition

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)

automatic-speech-recognition deep-learning deepspeech distill keras language-model machine-learning neural-networks speech-recognition speech-to-text tensorflow tensorflow-models

Last synced: 30 Sep 2025

https://github.com/sovaai/sova-asr

SOVA ASR (Automatic Speech Recognition)

asr asr-model automatic-speech-recognition speech speech-recognition speech-to-text stt wav2letter

Last synced: 19 Jul 2025

https://github.com/vilassn/whisper_android

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

android asr automatic-speech-recognition embedded mobile offline openai speech-recognition tensorflow tensorflowlite text-to-speech texttospeech tflite transcribe transcription tts whisper

Last synced: 22 Oct 2025

https://github.com/noco-ai/spellbook-docker

AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models

automatic-speech-recognition bark llama2 llm-inference mixtral musicgeneration stable-diffusion text-to-speech whisper xttsv2

Last synced: 13 May 2025

https://github.com/CoEDL/elpis

🙊 software for creating speech recognition models.

automatic-speech-recognition computational-linguistics docker kaldi linguistics python transcription

Last synced: 08 May 2025

https://github.com/ieasybooks/tafrigh

Transcription and creation of SRT and VTT files using Whisper models and wit.ai technology.

asr automatic-speech-recognition ctranslate2 facebook faster-whisper javascript python soundcloud srt stable-whisper subtitles twitter vtt whisper youtube

Last synced: 06 Feb 2026

https://github.com/altunenes/parakeet-rs

Very fast speech-to-text, diarization, and streaming (even on CPU) with NVIDIA Parakeet in Rust

asr automatic-speech-recognition onnx parakeet speaker-diarization speaker-identification speech speech-recognition speech-to-text

Last synced: 06 Feb 2026

https://github.com/tugstugi/mongolian-speech-recognition

Mongolian speech recognition with PyTorch

asr automatic-speech-recognition convolutional-neural-networks deep-learning mongolian python pytorch speech-recognition speech-to-text

Last synced: 14 Apr 2025

https://github.com/at16k/at16k

Trained models for automatic speech recognition (ASR). A library to quickly build applications that require speech to text conversion.

asr asr-model automatic-speech-recognition pretrained-models speech-analysis speech-api speech-recognition speech-recognizer speech-to-text voice-commands voice-recognition

Last synced: 13 Jul 2025

https://github.com/kmario23/kenlm-training

Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2

automatic-speech-recognition deep-neural-networks deep-speech kenlm kenlm-toolkit language-model language-modeling natural-language-processing probabilistic-models python speech-recognition

Last synced: 07 Apr 2025

https://github.com/kmario23/KenLM-training

Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2

automatic-speech-recognition deep-neural-networks deep-speech kenlm kenlm-toolkit language-model language-modeling natural-language-processing probabilistic-models python speech-recognition

Last synced: 19 Jul 2025

https://github.com/andi611/zerospeech-tts-without-t

A Pytorch implementation for the ZeroSpeech 2019 challenge.

adversarial-learning asr autoencoder automatic-speech-recognition gan text-to-speech tts tts-without-t zerospeech

Last synced: 01 Mar 2026

https://github.com/lucasnewman/best-rq-pytorch

Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.

automatic-speech-recognition speech-synthesis text-to-speech

Last synced: 20 Aug 2025

https://github.com/primaprashant/awesome-voice-typing

Curated list of open-source speech-to-text and voice typing tools for Linux, macOS, Windows, Android, and iOS. Offline, local, and cloud.

ai automatic-speech-recognition awesome-list dictation dictation-tool faster-whisper linux local-transcription macos offline-speech-recognition open-source parakeet privacy-focused push-to-talk speech-to-text transcription voice-typing whisper whisper-cpp wisprflow-alternative

Last synced: 15 Apr 2026

https://github.com/mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

asr automatic-speech-recognition ggml ros2 speech-recognition speech-to-text vad voice-activity-detection whisper whisper-cpp

Last synced: 30 Aug 2025

https://github.com/j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, etc., and even mixed languages.

android android-ime automatic-speech-recognition chinese-speech-recognition ime keyboard kotlin openai openai-api speech speech-recognition speech-to-text virtual-keyboard voice voice-recognition whisper

Last synced: 09 Apr 2025

https://github.com/undertheseanlp/automatic_speech_recognition

Vietnamese Automatic Speech Recognition

automatic-speech-recognition nlp vietnamese vietnamese-nlp

Last synced: 17 Feb 2026

https://github.com/pythainlp/pythaiasr

Python Thai Automatic Speech Recognition

asr automatic-speech-recognition hacktoberfest hacktoberfest2022 thai-language thai-nlp

Last synced: 13 Apr 2025

https://github.com/googlecreativelab/obvi

A Polymer 3+ webcomponent / button for doing speech recognition

automatic-speech-recognition button polymer polymer2 speech-recognition webcomponent

Last synced: 29 Jul 2025

https://github.com/sungnyun/armhubert

(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT

automatic-speech-recognition distillation ssl-compression

Last synced: 12 Apr 2025

https://github.com/tsmdt/whisply

💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... fast!

asr automatic-speech-recognition speech-recognition speech-to-text subtitles transcription-tool whisper-ai

Last synced: 09 Apr 2025

https://github.com/ttop32/wav2vec2-live-japanese-translator

Real-time Japanese speech recognition translator using wav2vec2

asr audio automatic-speech-recognition fine-tuning huggingface japanese live pyaudio pyqt5 pytorch real-time speaker-recognition speech-to-text spoken-language-understanding stt translation translator voice voice-recognition wav2vec2

Last synced: 03 Sep 2025

https://github.com/soheil-mp/speech-recognition

End-to-End Speech Recognition using Neural Networks.

asr audio automatic-speech-recognition librispeech

Last synced: 20 Aug 2025

https://github.com/sooftware/jasper

PyTorch implementation of "Jasper: An End-to-End Convolutional Neural Acoustic Model" (INTERSPEECH 2019)

asr automatic-speech-recognition cnn jasper nvidia pytorch speech-recognition

Last synced: 09 Apr 2025

https://github.com/lucasgris/wav2vec4bp

Wav2vec resources and models for Brazilian Portuguese

automatic-speech-recognition brazilian-portuguese dataset portuguese speech-to-text wav2vec wav2vec2

Last synced: 07 May 2025

https://github.com/j3soon/speech-to-windows-input

Perform speech-to-text (STT/ASR) with Azure speech service and simulate keyboard to input the recognized text; Supports English, Chinese, Japanese, and more.

automatic-speech-recognition azure azure-speech-service chinese-speech-recognition simulate-keyboard speech speech-recognition speech-to-text voice voice-recognition

Last synced: 10 Apr 2025

https://github.com/the-data-dilemma/medibeng-whisper-tiny

MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping, and using AI in healthcare.

audio audio-processing automatic-speech-recognition bengali code-switch english fastapi faster-whisper fine-tuning gradio healthcare openai python speech-recognition speech-to-text synthetic-data transcription transformers translation whisper

Last synced: 17 Mar 2026

https://github.com/kssteven418/q-asr

[ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition

automatic-speech-recognition deep-learning efficient-model efficient-neural-networks jasper model-compression quantization quartznet speech speech-recognition

Last synced: 31 Jul 2025

https://github.com/popcornell/micrank

MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

ad-hoc-microphone-network array-processing asr automatic-speech-recognition channel-selection

Last synced: 12 Apr 2025

https://github.com/saurabhchalke/whisper-meta-quest

Running speech-to-text in a Meta Quest headset using OpenAI's Whisper tiny model

artificial-intelligence automatic-speech-recognition mixed-reality speech-to-text virtual-reality vr whisper

Last synced: 04 Sep 2025

https://github.com/0xPD33/sonori

Sonori is a fully local STT app for Linux (Wayland).

asr automatic-speech-recognition ctranslate2 linux onnxruntime speech-recognition speech-to-text stt voice-activity-detection voice-recognition vulkan wayland wgpu whisper whisper-cpp

Last synced: 18 Jun 2026

https://github.com/egorsmkv/whisper-ukrainian

Trainer and Evaluation scripts for fine-tuning Whisper models for the Ukrainian language

asr automatic-speech-recognition openai speech-recognition ukrainian whisper

Last synced: 03 Mar 2025

https://github.com/linto-ai/linto-agent

LinTO platform services stack deployment tool for Docker Swarm cluster

asr automatic-speech-processing automatic-speech-recognition cluster docker-compose docker-swarm microservices smart-assistant virtual-agent vocal-assistant

Last synced: 26 Oct 2025

https://github.com/ivankunyankin/quartznet-asr

asr automatic-speech-recognition jasper pytorch quartznet

Last synced: 08 Apr 2026

https://github.com/openvoiceos/ovos-stt-plugin-vosk

vosk STT plugin for mycroft

asr automatic-speech-recognition hacktoberfest kaldi speech-recognition speech-to-text stt vosk

Last synced: 16 May 2025

https://github.com/FernandoLpz/SpeechRecognition

This repository contains the implementation of an Automatic Speech Recognition system in python, using a client-server architecture with Web Sockets.

automatic-speech-recognition python speech-recognition speech-to-text transformers wav2vec2 websockets

Last synced: 03 Apr 2025

https://github.com/OpenVoiceOS/ovos-stt-plugin-vosk

vosk STT plugin for mycroft

asr automatic-speech-recognition hacktoberfest kaldi speech-recognition speech-to-text stt vosk

Last synced: 10 May 2025

https://github.com/megengine/end-to-end-asr-transformer

An end to end ASR Transformer model training repo

asr-model attention-mechanism automatic-speech-recognition megengine transfomer

Last synced: 12 Apr 2025

https://github.com/analyticsinmotion/werpy

🐍📦 Rapidly calculate and analyze the Word Error Rate (WER) with this powerful yet lightweight Python package.

asr asr-evaluation automatic-speech-recognition levenshtein-distance metrics nlp python python-package speech-to-text stt stt-benchmark wer werpy word-error-rate

Last synced: 07 Apr 2025

https://github.com/jmaczan/asr-dysarthria

Research on Automatic Speech Recognition for dysarthric speech

asr automatic-speech-recognition deep-learning dysarthria dysarthric-speech self-supervised-learning wav2vec2

Last synced: 12 Apr 2025

https://github.com/zevaverbach/tatt

Transcribe All The Things™ is a CLI for creating and managing speech-to-text transcripts.

amazon-transcribe-api asr automatic-speech-recognition cli speech-to-text stt

Last synced: 13 Apr 2025

https://github.com/estuary-ai/mangrove

Mangrove is the backend module of Estuary, a framework for building multimodal real-time Socially Intelligent Agents (SIAs).

affective-computing agents artificial-intelligence automatic-speech-recognition digital-assistant framework human-computer-interaction large-language-models socially-aware-agents socially-intelligent-agents speech-recognition speech-synthesis

Last synced: 11 Feb 2026

https://github.com/abus-aikorea/studio-free

youtube download, vocal remover, vocal extraction, karaoke video production, STT, automatic speech recognition, transcription, automatic subtitle, AI, yt-dlp, demucs, whisper, webui, gradio, windows

ai automatic-speech-recognition automatic-subtitle demucs gradio karaoke openai stt transcription video-download vocal-remover webui whisper windows yt-dlp

Last synced: 25 Apr 2025

https://github.com/scalable-ml-deep-learning/fine_tune_whisper

Fine-Tune Whisper for Italian ASR with transformers

automatic-speech-recognition common-voice-dataset huggingface openai transformers whisper

Last synced: 11 Mar 2025

https://github.com/bhattbhavesh91/whisper-youtube

This repository guides you through automatically generating YouTube transcriptions using OpenAI's Whisper

automatic-speech-recognition ffmpeg openai openai-gym python pytube subtitles whisper youtube youtube-dl

Last synced: 17 Apr 2025

https://github.com/winstxnhdw/capgen

A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.

asr automatic-speech-recognition ctranslate2 docker granian huggingface huggingface-spaces litestar whisper

Last synced: 03 Jul 2025

https://github.com/bagustris/detect-segment-cough

A python model to detect and segment coughs, forked from coughvid's repo

automatic-speech-recognition cough-detection cough-sound covid-19 speech-segmentation

Last synced: 28 Jun 2025

https://github.com/sinaahmadi/CORDI

Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)

automatic-speech-recognition dialect-identification erbil kurdish kurdish-language-processing language-identification machine-translation mahabad sanandaj sorani sulaymaniyah

Last synced: 07 May 2025

https://github.com/jarbasal/pocketsphinx-models-mirror

pocketsphinx models for languages originating from the Iberian Peninsula

asr automatic-speech-recognition pocketsphinx speech-recognition speech-to-text stt stt-models

Last synced: 12 Feb 2026

https://github.com/thc1006/breeze-asr-taigi

Taiwanese Hokkien (Taigi) speech-to-text transcriber - MediaTek Breeze-ASR-26 with faster-whisper, tuned for RTX 3050 4GB low-VRAM GPUs. Gradio UI, CLI, Docker, SRT/VTT/TXT/JSON.

asr automatic-speech-recognition breeze-asr chinese-speech-recognition ctranslate2 faster-whisper gradio hokkien low-vram mediatek pytorch rtx-3050 speech-recognition speech-to-text subtitle-generator taigi taiwanese whisper

Last synced: 14 May 2026

https://github.com/the-data-dilemma/parquettohuggingface

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

audio-dataset audio-processing automatic-speech-recognition data-analysis data-science dataset healthcare-application huggingface huggingface-datasets pandas parquet parquet-generator python3 speech-data speech-recognition speech-to-text speech-translation

Last synced: 21 Aug 2025

https://github.com/bhattbhavesh91/table-question-answering-with-automatic-speech-recognition

Question Answering Gradio Interface on Tabular Data with HuggingFace Transformers Pipeline & TAPAS. Wav2Vec2 is a pretrained model for Automatic Speech Recognition (ASR).

automatic-speech-recognition google-assistant gradio-interface huggingface huggingface-transformers huggingface-transformers-pipeline question-answering voice-recognition

Last synced: 27 Feb 2026

https://github.com/roboticslab-uc3m/speech

Text To Speech (TTS) and Automatic Speech Recognition (ASR).

automatic-speech-recognition text-to-speech

Last synced: 19 Jan 2026

https://github.com/BatuhanYilmaz26/Youtube-Transcriber

Input a YouTube video link and get a transcription as a .txt, .vtt or .srt file.

automatic-speech-recognition huggingface openai python speech-recognition streamlit whisper

Last synced: 11 Mar 2025

https://github.com/idiap/tidigitsrecipe.jl

A Julia recipe for training an ASR system using the TIDIGITS database

asr automatic-speech-recognition decoding hidden-markov-models wfst

Last synced: 12 Sep 2025

https://github.com/my-north-ai/semantic_audio_filtering

Synthetic data augmentation technique via LLM for Automatic Speech Recognition fine tuning.

automatic-speech-recognition fine-tuning synthetic-dataset-generation text-to-speech whisper

Last synced: 11 Mar 2025

https://github.com/khaykingleb/automatic-speech-recognition

QuartzNet and Deepspeech Implementation for ASR

automatic-speech-recognition deep-learning deepspeech pytorch quartz-net speech-recognition

Last synced: 27 Apr 2026

https://github.com/pleasurecruise/3d-ai-agent

This project aims to create an AI agent capable of expressing a range of emotions through facial expressions and tone of voice, using Large Language Models (LLMs) and Large Vision Models (LVMs).

asr automatic-speech-recognition large-language-models llm ocr optical-character-recognition python text-to-speech tts

Last synced: 15 Apr 2025

https://github.com/openvoiceos/ovos-stt-plugin-chromium

An STT plugin for Mycroft using the Google Chrome browser API

asr automatic-speech-recognition speech-recognition speech-to-text stt

Last synced: 16 May 2025

https://github.com/pprattis/automatic-speech-recognision-system-ASR

A Python script that implements an automatic speech recognition system.

asr automatic-speech-recognition computer-science dtw dynamic-time-warping fir-filter librosa mel-frequency-cepstral-coefficients mfcc nyquist program python short-time-fourier-transform short-time-signal-analysis signal signal-processing student

Last synced: 28 Sep 2025

https://github.com/astrologos/py-speakeasy

Speakeasy GPT is a Jupyter notebook that utilizes several natural language processing utilities to provide a seamless and low-latency speech interface to ChatGPT and other large language models.

automatic-speech-recognition chat-gpt coqui-ai coqui-tts elevenlabs-api mimic mycroftai text-to-speech whisper

Last synced: 11 Mar 2025

https://github.com/pprattis/automatic-speech-recognision-system-asr

A Python script that implements an automatic speech recognition system.

asr automatic-speech-recognition computer-science dtw dynamic-time-warping fir-filter librosa mel-frequency-cepstral-coefficients mfcc nyquist program python short-time-fourier-transform short-time-signal-analysis signal signal-processing student

Last synced: 07 Sep 2025

https://github.com/nico-byte/whisper-web

The Whisper Web Transcription Server is a Python-based real-time speech-to-text transcription system powered by OpenAI's Whisper models. It leverages state-of-the-art models like Distil-Whisper to transcribe audio input in real-time.

ai asr automatic-speech-recognition distil-whisper distil-whisper-large-v3 huggingface huggingface-transformers server vad voice web websockets whisper

Last synced: 26 Apr 2026

https://github.com/egorsmkv/cv10-uk-testset-clean

The cleaned Common Voice 10 (test set) that has been checked by a human for Ukrainian 🇺🇦

asr automatic-speech-recognition speech speech-recognition speech-to-text ukrainian

Last synced: 19 Mar 2026

https://github.com/analyticsinmotion/werx

🐍📦 Easy-to-use Python package for lightning-fast Word Error Rate analysis

asr automatic-speech-recognition levenshtein-distance metrics speech-to-text stt wer werx word-error-rate word-error-rate-calculator

Last synced: 16 Jun 2025

https://github.com/OpenVoiceOS/ovos-stt-plugin-chromium

An STT plugin for Mycroft using the Google Chrome browser API

asr automatic-speech-recognition speech-recognition speech-to-text stt

Last synced: 10 May 2025

https://github.com/rafat-decodis/robust-asr-for-low-resource-languages

Exploring Benchmark Gaps and Real-World Speech Generalization for Low-Resource Languages

artificial-intelligence automatic-speech-recognition data-analysis dataprocessing whisper

Last synced: 23 Jun 2025

https://github.com/jeronymous/deep_learning_notebooks

Self-contained notebooks for experimenting with particular concepts in Deep Learning

artificial-intelligence artificial-neural-networks automatic-speech-recognition deep-learning deep-neural-networks machine-learning natural-language-processing speech-recognition speech-to-text tokenization tokenizer-nlp tokenizers

Last synced: 16 Feb 2026

https://github.com/marquesafonso/multilang-asr-captioner

A multilingual automatic speech recognition and video captioning tool using faster-whisper. Supports real-time translation to English. Runs on a consumer-grade CPU.

automatic-speech-recognition captioning-videos faster-whisper whisper

Last synced: 11 Mar 2025