Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with speech-recognition

A curated list of projects in awesome lists tagged with speech-recognition.

https://github.com/ggerganov/whisper.cpp

Port of OpenAI's Whisper model in C/C++

inference openai speech-recognition speech-to-text transformer whisper

Last synced: 16 Dec 2024

https://github.com/mozilla/deepspeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow

Last synced: 16 Dec 2024

https://github.com/mozilla/DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow

Last synced: 25 Oct 2024

https://github.com/mozilla/stt

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow

Last synced: 13 Oct 2024

https://github.com/kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

c-plus-plus cuda kaldi shell speaker-id speaker-verification speech speech-recognition speech-to-text

Last synced: 16 Dec 2024

https://github.com/nvidia/deeplearningexamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

computer-vision deep-learning drug-discovery forecasting large-language-models mxnet nlp paddlepaddle pytorch recommender-systems speech-recognition speech-synthesis tensorflow tensorflow2 translation

Last synced: 16 Dec 2024

https://github.com/NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

computer-vision deep-learning drug-discovery forecasting large-language-models mxnet nlp paddlepaddle pytorch recommender-systems speech-recognition speech-synthesis tensorflow tensorflow2 translation

Last synced: 27 Oct 2024

https://github.com/m-bain/whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text whisper

Last synced: 16 Dec 2024

https://github.com/paddlepaddle/paddlespeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper

Last synced: 16 Dec 2024

https://github.com/PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper

Last synced: 29 Oct 2024

https://github.com/m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text whisper

Last synced: 25 Oct 2024

https://github.com/uberi/speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

audio python speech-recognition speech-to-text

Last synced: 16 Dec 2024

https://github.com/Uberi/speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

audio python speech-recognition speech-to-text

Last synced: 28 Oct 2024

https://github.com/nl8590687/asrt_speechrecognition

A Deep-Learning-Based Chinese Speech Recognition System

asrt chinese-speech-recognition cnn ctc keras python python3 speech-recognition speech-to-text tensorflow

Last synced: 16 Dec 2024

https://github.com/nl8590687/ASRT_SpeechRecognition

A Deep-Learning-Based Chinese Speech Recognition System

asrt chinese-speech-recognition cnn ctc keras python python3 speech-recognition speech-to-text tensorflow

Last synced: 31 Oct 2024

https://github.com/modelscope/funasr

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

audio-visual-speech-recognition conformer dfsmn paraformer pretrained-model punctuation pytorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection whisper

Last synced: 17 Dec 2024

https://github.com/talater/annyang

💬 Speech recognition for your site

speech speech-recognition speech-to-text voice

Last synced: 16 Dec 2024

https://github.com/TalAter/annyang

💬 Speech recognition for your site

hacktoberfest speech speech-recognition speech-to-text voice

Last synced: 25 Oct 2024

https://github.com/flashlight/wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit

cpp deep-learning end-to-end speech-recognition wav2letter

Last synced: 17 Dec 2024

https://github.com/modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

audio-visual-speech-recognition conformer dfsmn paraformer pretrained-model punctuation pytorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection whisper

Last synced: 29 Oct 2024

https://github.com/snakers4/silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

asr capitalization colab english german onnx pretrained-models pytorch repunctuation spanish speech speech-recognition speech-synthesis speech-to-text stt stt-benchmark text-to-speech torch-hub tts tts-models

Last synced: 18 Dec 2024

https://github.com/sanchit-gandhi/whisper-jax

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

deep-learning jax speech-recognition speech-to-text whisper

Last synced: 19 Dec 2024

https://github.com/wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

asr automatic-speech-recognition conformer e2e-models production-ready pytorch speech-recognition transformer whisper

Last synced: 16 Dec 2024

https://github.com/argmaxinc/whisperkit

On-device Speech Recognition for Apple Silicon

inference ios macos speech-recognition swift transformers visionos watchos whisper

Last synced: 17 Dec 2024

https://github.com/cmusphinx/pocketsphinx

A small speech recognizer

c python speech-recognition

Last synced: 16 Dec 2024

https://github.com/argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

inference ios macos speech-recognition swift transformers visionos watchos whisper

Last synced: 31 Oct 2024

https://github.com/modelscope/funclip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

gradio gradio-python-llm llm speech-recognition speech-to-text subtitles-generator video-clip video-subtitles

Last synced: 18 Dec 2024

https://github.com/huggingface/distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

audio speech-recognition whisper

Last synced: 17 Dec 2024

https://github.com/mahmoudashraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text whisper

Last synced: 17 Dec 2024

https://github.com/MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text whisper

Last synced: 31 Oct 2024

https://github.com/yanshengjia/ml-road

Machine Learning Resources, Practice and Research

computer-vision deep-learning machine-learning nlp pytorch speech-recognition tensorflow

Last synced: 27 Nov 2024

https://github.com/toverainc/willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative

alexa deep-learning echo esp-adf esp-idf esp32 google-home home-assistant home-automation privacy speech-recognition speech-to-text whisper

Last synced: 18 Dec 2024

https://github.com/jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool for converting audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

speech speech-recognition speech-to-text stt

Last synced: 19 Dec 2024

https://github.com/rhasspy/rhasspy

Offline private voice assistant for many human languages

home-assistant node-red privacy speech-recognition voice-assistants voice-commands

Last synced: 20 Dec 2024

https://github.com/mravanelli/pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

asr deep-learning deep-neural-networks dnn dnn-hmm gru kaldi lstm lstm-neural-networks multilayer-perceptron-network pytorch recurrent-neural-networks rnn rnn-model speech speech-recognition timit

Last synced: 21 Dec 2024

https://github.com/abus-aikorea/voice-pro

Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation(UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.

faster-whisper gradio podcasts speech-recognition speech-synthesis speech-to-text stt subtitles text-to-speech transcription translation translator tts voice-cloning voice-conversion webui whisper yt-dlp

Last synced: 21 Dec 2024

https://github.com/coqui-ai/stt

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 18 Dec 2024

https://github.com/coqui-ai/STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 26 Oct 2024

https://github.com/pannous/tensorflow-speech-recognition

🎙 Speech recognition using the TensorFlow deep learning framework, sequence-to-sequence neural networks

deep-learning neural-network speech-recognition speech-to-text stt tensorflow

Last synced: 20 Dec 2024

https://github.com/alan-ai/alan-sdk-ios

Conversational AI SDK for iOS to enable text and voice conversations with actions (Swift, Objective-C)

alan-ios-sdk alan-studio alan-voice chatbot conversational-ai ios machine-learning sdk speech-recognition voice voice-ai voice-assistant voice-commands

Last synced: 19 Dec 2024

https://github.com/chenyme/chenyme-aavt

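A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce a translated video.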

faster-whisper gpt-4 gpt-4o speech-recognition video-translation whisper

Last synced: 18 Dec 2024

https://github.com/nobody132/masr

Chinese speech recognition; Mandarin Automatic Speech Recognition

chinese-speech-recognition mandarin-chinese pytorch speech-recognition

Last synced: 19 Dec 2024

https://github.com/syhw/wer_are_we

Attempt at tracking the state of the art and recent results (bibliography) on speech recognition.

deep-neural-network speech-recognition wer

Last synced: 02 Dec 2024

https://github.com/julius-speech/julius

Open-Source Large Vocabulary Continuous Speech Recognition Engine

audio-processing recognition speech speech-recognition

Last synced: 18 Dec 2024

https://github.com/astorfi/lip-reading-deeplearning

🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

3d-convolutional-network computer-vision deep-learning speech-recognition tensorflow

Last synced: 21 Dec 2024

https://github.com/react-native-voice/voice

🎤 React Native Voice Recognition library for iOS and Android (Online and Offline Support)

android ios react-native speech-recognition voice-recognition

Last synced: 17 Dec 2024

https://github.com/fl33tw00d/whisper-turbo

Cross-Platform, GPU Accelerated Whisper 🏎️

audio machine-learning rust speech-recognition webgpu whisper windows

Last synced: 20 Dec 2024

https://github.com/FL33TW00D/whisper-turbo

Cross-Platform, GPU Accelerated Whisper 🏎️

audio machine-learning rust speech-recognition webgpu whisper windows

Last synced: 05 Nov 2024

https://github.com/kalliope-project/kalliope

Kalliope is a framework that will help you to create your own personal assistant.

bot bot-creation home-automation jarvis linux personal-assistant raspberry speech-recognition speech-synthesis speech-to-text

Last synced: 19 Dec 2024

https://github.com/bjoernkarmann/project_alias

Alias is a teachable “parasite” that is designed to give users more control over their smart assistants, both when it comes to customisation and privacy. Through a simple app the user can train Alias to react on a custom wake-word/sound, and once trained, Alias can take control over your home assistant by activating it for you.

alias classification hack machine-learning microphone raspberry-pi smarthome sound-synthesis speech-recognition wakeword

Last synced: 11 Oct 2024

https://github.com/Chenyme/Chenyme-AAVT

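A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce a translated video.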

faster-whisper gpt-4 gpt-4o speech-recognition video-translation whisper

Last synced: 07 Nov 2024

https://github.com/pluja/whishper

Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!

ai audio-to-text golang speech-recognition speech-to-text stt subtitles sveltekit transcription ui web web-whisper webapp whisper

Last synced: 19 Dec 2024

https://github.com/sdkcarlos/artyom.js

A voice control, voice commands, speech recognition and speech synthesis JavaScript library. Create your own Siri, Google Now or Cortana with Google Chrome within your website.

recognition speech-recognition speech-synthesis speech-to-text voice-commands

Last synced: 20 Dec 2024

https://github.com/alan-ai/alan-sdk-cordova

Conversational AI SDK for Apache Cordova to enable text and voice conversations with actions (iOS and Android)

chatbot conversational-ai machine-learning multimodal speech-recognition text-to-speech voice-assistant voice-commands voice-interface vui

Last synced: 15 Dec 2024

https://github.com/k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.

asr c cpp csharp go kotlin python speech-recognition vad voice-activity-detection

Last synced: 18 Dec 2024

https://github.com/alumae/kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

speech-recognition

Last synced: 20 Dec 2024

https://github.com/modal-labs/quillman

A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

ai language-model python serverless speech-recognition speech-to-text

Last synced: 27 Oct 2024

https://github.com/sooftware/conformer

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

asr augmented cnn conformer conv convolution pytorch recognition speech speech-recognition transformer transformer-xl

Last synced: 16 Dec 2024

https://github.com/lhotse-speech/lhotse

Tools for handling speech data in machine learning projects.

ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition

Last synced: 28 Nov 2024

https://github.com/athena-team/athena

An open-source implementation of a sequence-to-sequence based speech processing engine

asr ctc deployment sequence-to-sequence speaker-recognition speech-recognition speech-synthesis tensorflow transformer tts unsupervised-learning wfst

Last synced: 28 Nov 2024