Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with speech-recognition
A curated list of projects in awesome lists tagged with speech-recognition.
https://github.com/huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
bert deep-learning flax hacktoberfest jax language-model language-models machine-learning model-hub natural-language-processing nlp nlp-library pretrained-models python pytorch pytorch-transformers seq2seq speech-recognition tensorflow transformer
Last synced: 16 Dec 2024
https://github.com/ggerganov/whisper.cpp
Port of OpenAI's Whisper model in C/C++
inference openai speech-recognition speech-to-text transformer whisper
Last synced: 16 Dec 2024
https://github.com/mozilla/deepspeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow
Last synced: 16 Dec 2024
https://github.com/mozilla/DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow
Last synced: 25 Oct 2024
https://github.com/mozilla/stt
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow
Last synced: 13 Oct 2024
https://github.com/leon-ai/leon
🧠 Leon is your open-source personal assistant.
ai ai-assistant artificial-intelligence assistant automation bot chatbot flite leon nodejs offline personal-assistant privacy python speech-recognition speech-synthesis speech-to-text text-to-speech virtual-assistant voice-assistant
Last synced: 16 Dec 2024
https://github.com/kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
c-plus-plus cuda kaldi shell speaker-id speaker-verification speech speech-recognition speech-to-text
Last synced: 16 Dec 2024
https://github.com/nvidia/deeplearningexamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
computer-vision deep-learning drug-discovery forecasting large-language-models mxnet nlp paddlepaddle pytorch recommender-systems speech-recognition speech-synthesis tensorflow tensorflow2 translation
Last synced: 16 Dec 2024
https://github.com/NVIDIA/DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
computer-vision deep-learning drug-discovery forecasting large-language-models mxnet nlp paddlepaddle pytorch recommender-systems speech-recognition speech-synthesis tensorflow tensorflow2 translation
Last synced: 27 Oct 2024
https://github.com/systran/faster-whisper
Faster Whisper transcription with CTranslate2
deep-learning inference openai quantization speech-recognition speech-to-text transformer whisper
Last synced: 16 Dec 2024
https://github.com/guillaumekln/faster-whisper
Faster Whisper transcription with CTranslate2
deep-learning inference openai quantization speech-recognition speech-to-text transformer whisper
Last synced: 14 Dec 2024
https://github.com/kmario23/deep-learning-drizzle
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
artificial-intelligence-algorithms artificial-neural-networks bayesian-statistics computer-vision deep-learning deep-neural-networks deep-reinforcement-learning explainable-ai geometric-deep-learning graph-neural-networks machine-learning medical-imaging natural-language-processing optimization pattern-recognition probabilistic-graphical-models probability reinforcement-learning speech-recognition visual-recognition
Last synced: 03 Dec 2024
https://github.com/m-bain/whisperx
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
asr speech speech-recognition speech-to-text whisper
Last synced: 16 Dec 2024
https://github.com/paddlepaddle/paddlespeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper
Last synced: 16 Dec 2024
https://github.com/PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper
Last synced: 29 Oct 2024
https://github.com/SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
deep-learning inference openai quantization speech-recognition speech-to-text transformer whisper
Last synced: 29 Oct 2024
https://github.com/speechbrain/speechbrain
A PyTorch-based Speech Toolkit
asr audio audio-processing deep-learning huggingface language-model pytorch speaker-diarization speaker-recognition speaker-verification speech-enhancement speech-processing speech-recognition speech-separation speech-to-text speech-toolkit speechrecognition spoken-language-understanding transformers voice-recognition
Last synced: 16 Dec 2024
https://github.com/m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
asr speech speech-recognition speech-to-text whisper
Last synced: 25 Oct 2024
https://github.com/espnet/espnet
End-to-End Speech Processing Toolkit
chainer deep-learning end-to-end kaldi machine-translation pytorch singing-voice-synthesis speaker-diarization speech-enhancement speech-recognition speech-separation speech-synthesis speech-translation spoken-language-understanding text-to-speech voice-conversion
Last synced: 16 Dec 2024
https://github.com/uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
audio python speech-recognition speech-to-text
Last synced: 16 Dec 2024
https://github.com/alphacep/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
android asr deep-learning deep-neural-networks deepspeech google-speech-to-text ios kaldi offline privacy python raspberry-pi speaker-identification speaker-verification speech-recognition speech-to-text speech-to-text-android stt voice-recognition vosk
Last synced: 16 Dec 2024
https://github.com/Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
audio python speech-recognition speech-to-text
Last synced: 28 Oct 2024
https://github.com/nl8590687/asrt_speechrecognition
A Deep-Learning-Based Chinese Speech Recognition System
asrt chinese-speech-recognition cnn ctc keras python python3 speech-recognition speech-to-text tensorflow
Last synced: 16 Dec 2024
https://github.com/nl8590687/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
asrt chinese-speech-recognition cnn ctc keras python python3 speech-recognition speech-to-text tensorflow
Last synced: 31 Oct 2024
https://github.com/openvinotoolkit/openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
ai computer-vision deep-learning deploy-ai diffusion-models generative-ai good-first-issue inference llm-inference natural-language-processing nlp openvino optimize-ai performance-boost recommendation-system speech-recognition stable-diffusion transformers yolo
Last synced: 16 Dec 2024
https://github.com/modelscope/funasr
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
audio-visual-speech-recognition conformer dfsmn paraformer pretrained-model punctuation pytorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection whisper
Last synced: 17 Dec 2024
https://github.com/talater/annyang
💬 Speech recognition for your site
speech speech-recognition speech-to-text voice
Last synced: 16 Dec 2024
https://github.com/TalAter/annyang
💬 Speech recognition for your site
hacktoberfest speech speech-recognition speech-to-text voice
Last synced: 25 Oct 2024
https://github.com/flashlight/wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
cpp deep-learning end-to-end speech-recognition wav2letter
Last synced: 17 Dec 2024
https://github.com/modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
audio-visual-speech-recognition conformer dfsmn paraformer pretrained-model punctuation pytorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection whisper
Last synced: 29 Oct 2024
https://github.com/snakers4/silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
asr capitalization colab english german onnx pretrained-models pytorch repunctuation spanish speech speech-recognition speech-synthesis speech-to-text stt stt-benchmark text-to-speech torch-hub tts tts-models
Last synced: 18 Dec 2024
https://github.com/sanchit-gandhi/whisper-jax
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
deep-learning jax speech-recognition speech-to-text whisper
Last synced: 19 Dec 2024
https://github.com/wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
asr automatic-speech-recognition conformer e2e-models production-ready pytorch speech-recognition transformer whisper
Last synced: 16 Dec 2024
https://github.com/argmaxinc/whisperkit
On-device Speech Recognition for Apple Silicon
inference ios macos speech-recognition swift transformers visionos watchos whisper
Last synced: 17 Dec 2024
https://github.com/argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
inference ios macos speech-recognition swift transformers visionos watchos whisper
Last synced: 31 Oct 2024
https://github.com/Picovoice/Porcupine
On-device wake word detection powered by deep learning
handsfree hotword hotword-detection hotword-detector keyword-spotter keyword-spotting on-device speech-recognition trigger-word-detection voice-activation wake-word wake-word-detection wake-word-engine
Last synced: 06 Dec 2024
https://github.com/picovoice/porcupine
On-device wake word detection powered by deep learning
handsfree hotword hotword-detection hotword-detector keyword-spotter keyword-spotting on-device speech-recognition trigger-word-detection voice-activation wake-word wake-word-detection wake-word-engine
Last synced: 21 Dec 2024
https://github.com/modelscope/funclip
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
gradio gradio-python-llm llm speech-recognition speech-to-text subtitles-generator video-clip video-subtitles
Last synced: 18 Dec 2024
https://github.com/Picovoice/porcupine
On-device wake word detection powered by deep learning
handsfree hotword hotword-detection hotword-detector keyword-spotter keyword-spotting on-device speech-recognition trigger-word-detection voice-activation wake-word wake-word-detection wake-word-engine
Last synced: 27 Oct 2024
https://github.com/huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
audio speech-recognition whisper
Last synced: 17 Dec 2024
https://github.com/mahmoudashraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
asr speaker-diarization speech speech-recognition speech-to-text whisper
Last synced: 17 Dec 2024
https://github.com/MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
asr speaker-diarization speech speech-recognition speech-to-text whisper
Last synced: 31 Oct 2024
https://github.com/FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
ai aigc asr audio-event-classification cross-lingual gpt-4o llm multilingual python pytorch speech-emotion-recognition speech-recognition speech-to-text
Last synced: 14 Nov 2024
https://github.com/funaudiollm/sensevoice
Multilingual Voice Understanding Model
ai aigc asr audio-event-classification cross-lingual gpt-4o llm multilingual python pytorch speech-emotion-recognition speech-recognition speech-to-text
Last synced: 18 Dec 2024
https://github.com/yanshengjia/ml-road
Machine Learning Resources, Practice and Research
computer-vision deep-learning machine-learning nlp pytorch speech-recognition tensorflow
Last synced: 27 Nov 2024
https://github.com/zzw922cn/automatic_speech_recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
audio automatic-speech-recognition chinese-speech-recognition cnn data-preprocessing deep-learning end-to-end evaluation feature-vector layer-normalization lstm paper phonemes rnn rnn-encoder-decoder speech-recognition tensorflow timit-dataset
Last synced: 20 Dec 2024
https://github.com/zzw922cn/Automatic_Speech_Recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
audio automatic-speech-recognition chinese-speech-recognition cnn data-preprocessing deep-learning end-to-end evaluation feature-vector layer-normalization lstm paper phonemes rnn rnn-encoder-decoder speech-recognition tensorflow timit-dataset
Last synced: 03 Nov 2024
https://github.com/toverainc/willow
Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
alexa deep-learning echo esp-adf esp-idf esp32 google-home home-assistant home-automation privacy speech-recognition speech-to-text whisper
Last synced: 18 Dec 2024
https://github.com/jianchang512/stt
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with JSON, SRT subtitle, and plain-text output
speech speech-recognition speech-to-text stt
Last synced: 19 Dec 2024
https://github.com/rhasspy/rhasspy
Offline private voice assistant for many human languages
home-assistant node-red privacy speech-recognition voice-assistants voice-commands
Last synced: 20 Dec 2024
https://github.com/mravanelli/pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
asr deep-learning deep-neural-networks dnn dnn-hmm gru kaldi lstm lstm-neural-networks multilayer-perceptron-network pytorch recurrent-neural-networks rnn rnn-model speech speech-recognition timit
Last synced: 21 Dec 2024
https://github.com/abus-aikorea/voice-pro
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation(UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.
faster-whisper gradio podcasts speech-recognition speech-synthesis speech-to-text stt subtitles text-to-speech transcription translation translator tts voice-cloning voice-conversion webui whisper yt-dlp
Last synced: 21 Dec 2024
https://github.com/coqui-ai/stt
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition
Last synced: 18 Dec 2024
https://github.com/coqui-ai/STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition
Last synced: 26 Oct 2024
https://github.com/ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
asr automatic-speech-recognition docker openai-whisper speech speech-recognition speech-to-text
Last synced: 18 Dec 2024
https://github.com/pannous/tensorflow-speech-recognition
🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
deep-learning neural-network speech-recognition speech-to-text stt tensorflow
Last synced: 20 Dec 2024
https://github.com/linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
asr attention-is-all-you-need attention-mechanism attention-model attention-network attention-seq2seq attention-visualization deep-learning machine-learning multilingual-models python python3 pytorch speaker-diarization speech speech-processing speech-recognition speech-to-text transformers whisper
Last synced: 17 Dec 2024
https://github.com/alan-ai/alan-sdk-ios
Conversational AI SDK for iOS to enable text and voice conversations with actions (Swift, Objective-C)
alan-ios-sdk alan-studio alan-voice chatbot conversational-ai ios machine-learning sdk speech-recognition voice voice-ai voice-assistant voice-commands
Last synced: 19 Dec 2024
https://github.com/chenyme/chenyme-aavt
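A fully automatic (audio) video translation project. It uses Whisper to recognize speech and a large AI model to translate the subtitles, then merges the subtitles with the video to produce the translated video.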
faster-whisper gpt-4 gpt-4o speech-recognition video-translation whisper
Last synced: 18 Dec 2024
https://github.com/nobody132/masr
Chinese speech recognition; Mandarin Automatic Speech Recognition
chinese-speech-recognition mandarin-chinese pytorch speech-recognition
Last synced: 19 Dec 2024
https://github.com/syhw/wer_are_we
Attempt at tracking the state of the art and recent results (bibliography) on speech recognition.
deep-neural-network speech-recognition wer
Last synced: 02 Dec 2024
https://github.com/julius-speech/julius
Open-Source Large Vocabulary Continuous Speech Recognition Engine
audio-processing recognition speech speech-recognition
Last synced: 18 Dec 2024
https://github.com/astorfi/lip-reading-deeplearning
🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
3d-convolutional-network computer-vision deep-learning speech-recognition tensorflow
Last synced: 21 Dec 2024
https://github.com/alan-ai/alan-sdk-android
Conversational AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)
alan-ai alan-sdk alan-studio alan-voice android conversational-ai machine-learning multimodal sdk speech-recognition text-to-speech voice voice-assistant voice-commands voice-control voice-interface vui
Last synced: 19 Dec 2024
https://github.com/react-native-voice/voice
🎤 React Native Voice Recognition library for iOS and Android (Online and Offline Support)
android ios react-native speech-recognition voice-recognition
Last synced: 17 Dec 2024
https://github.com/alan-ai/alan-sdk-flutter
Conversational AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)
alan-sdk alan-studio alan-voice chatbot conversational-ai flutter machine-learning multimodal sdk speech-recognition text-to-speech voice voice-ai voice-assistant voice-commands voice-control voice-interface vui
Last synced: 14 Dec 2024
https://github.com/fl33tw00d/whisper-turbo
Cross-Platform, GPU Accelerated Whisper 🏎️
audio machine-learning rust speech-recognition webgpu whisper windows
Last synced: 20 Dec 2024
https://github.com/FL33TW00D/whisper-turbo
Cross-Platform, GPU Accelerated Whisper 🏎️
audio machine-learning rust speech-recognition webgpu whisper windows
Last synced: 05 Nov 2024
https://github.com/kalliope-project/kalliope
Kalliope is a framework that will help you to create your own personal assistant.
bot bot-creation home-automation jarvis linux personal-assistant raspberry speech-recognition speech-synthesis speech-to-text
Last synced: 19 Dec 2024
https://github.com/alan-ai/alan-sdk-ionic
Conversational AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)
alan-ionic-sdk alan-studio chatbot conversational-ai ionic machine-learning multimodal sdk speech-recognition text-to-speech voice voice-ai voice-assistant voice-commands voice-control voice-interface vui
Last synced: 21 Dec 2024
https://github.com/bjoernkarmann/project_alias
Alias is a teachable “parasite” that is designed to give users more control over their smart assistants, both when it comes to customisation and privacy. Through a simple app the user can train Alias to react to a custom wake-word/sound, and once trained, Alias can take control of your home assistant by activating it for you.
alias classification hack machine-learning microphone raspberry-pi smarthome sound-synthesis speech-recognition wakeword
Last synced: 11 Oct 2024
https://github.com/Chenyme/Chenyme-AAVT
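A fully automatic (audio) video translation project. It uses Whisper to recognize speech and a large AI model to translate the subtitles, then merges the subtitles with the video to produce the translated video.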
faster-whisper gpt-4 gpt-4o speech-recognition video-translation whisper
Last synced: 07 Nov 2024
https://github.com/pluja/whishper
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
ai audio-to-text golang speech-recognition speech-to-text stt subtitles sveltekit transcription ui web web-whisper webapp whisper
Last synced: 19 Dec 2024
https://github.com/Delta-ML/delta
DELTA is a deep learning based natural language and speech processing platform.
asr custom-ops deep-learning emotion-recognition front-end inference nlp nlu ops seq2seq sequence-to-sequence serving speaker-verification speech speech-recognition tensorflow tensorflow-lite tensorflow-serving text-classification text-generation
Last synced: 06 Nov 2024
https://github.com/NVIDIA/OpenSeq2Seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
deep-learning float16 language-model mixed-precision multi-gpu multi-node neural-machine-translation seq2seq sequence-to-sequence speech-recognition speech-synthesis speech-to-text tensorflow text-to-speech
Last synced: 27 Nov 2024
https://github.com/purfview/whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
asr ctranslate2 diarization faster-whisper openai speaker-diarization speech-recognition speech-to-text subtitles transcriber uvr vocal-extractor whisper whisper-faster whisperx
Last synced: 20 Dec 2024
https://github.com/dragoncomputer/dragonfire
The open-source virtual assistant for Ubuntu-based Linux distributions
artificial-intelligence chatbot kaldi linux machine-learning nlp personal-assistant spacy speech-recognition speech-to-text text-to-speech ubuntu virtual-assistant
Last synced: 21 Dec 2024
https://github.com/DragonComputer/Dragonfire
The open-source virtual assistant for Ubuntu-based Linux distributions
artificial-intelligence chatbot kaldi linux machine-learning nlp personal-assistant spacy speech-recognition speech-to-text text-to-speech ubuntu virtual-assistant
Last synced: 07 Nov 2024
https://github.com/miteshputhran/speech-emotion-analyzer
The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)
audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice
Last synced: 15 Dec 2024
https://github.com/MITESHPUTHRANNEU/Speech-Emotion-Analyzer
The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)
audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice
Last synced: 14 Dec 2024
https://github.com/MiteshPuthran/Speech-Emotion-Analyzer
The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)
audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice
Last synced: 30 Oct 2024
https://github.com/sc0ty/subsync
Subtitle Speech Synchronizer
speech-recognition subtitle-speech-synchronizer subtitles synchronization
Last synced: 14 Oct 2024
https://github.com/coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
speech-emotion-recognition speech-processing speech-recognition speech-separation speech-synthesis speech-to-text stt text-to-speech tts voice-activity-detection voice-cloning voice-recognition
Last synced: 03 Dec 2024
https://github.com/microsoft/speecht5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
speech-pretraining speech-recognition speech-synthesis speech-text-pretraining speech-translation speech2c speechlm speecht5 speechut vallex vatlm
Last synced: 15 Dec 2024
https://github.com/sdkcarlos/artyom.js
A voice control, voice commands, speech recognition and speech synthesis JavaScript library. Create your own Siri, Google Now or Cortana with Google Chrome within your website.
recognition speech-recognition speech-synthesis speech-to-text voice-commands
Last synced: 20 Dec 2024
https://github.com/alan-ai/alan-sdk-cordova
Conversational AI SDK for Apache Cordova to enable text and voice conversations with actions (iOS and Android)
chatbot conversational-ai machine-learning multimodal speech-recognition text-to-speech voice-assistant voice-commands voice-interface vui
Last synced: 15 Dec 2024
https://github.com/mravanelli/sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering neural-networks python pytorch signal-processing speaker-identification speaker-recognition speaker-verification speech-processing speech-recognition timit waveform
Last synced: 15 Dec 2024
https://github.com/mravanelli/SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.
artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering neural-networks python pytorch signal-processing speaker-identification speaker-recognition speaker-verification speech-processing speech-recognition timit waveform
Last synced: 11 Nov 2024
https://github.com/k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
asr c cpp csharp go kotlin python speech-recognition vad voice-activity-detection
Last synced: 18 Dec 2024
https://github.com/bytedance/salmonn
SALMONN: Speech Audio Language Music Open Neural Network
audio audio-processing bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university
Last synced: 20 Dec 2024
https://github.com/alumae/kaldi-gstreamer-server
Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
Last synced: 20 Dec 2024
https://github.com/bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
audio audio-processing bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university
Last synced: 08 Nov 2024
https://github.com/modal-labs/quillman
A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
ai language-model python serverless speech-recognition speech-to-text
Last synced: 27 Oct 2024
https://github.com/pykaldi/pykaldi
A Python wrapper for Kaldi
asr clif feature-extraction kaldi language-model numpy openfst python speech speech-recognition wrapper
Last synced: 20 Dec 2024
https://github.com/ictnlp/streamspeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
all-in-one asr audio-processing machine-translation non-autoregressive seamless simultaneous-translation speech speech-enhancement speech-processing speech-recognition speech-synthesis speech-to-text speech-translation streaming-audio text-to-audio text-to-speech translation tts voice
Last synced: 20 Dec 2024
https://github.com/sooftware/conformer
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
asr augmented cnn conformer conv convolution pytorch recognition speech speech-recognition transformer transformer-xl
Last synced: 16 Dec 2024
https://github.com/lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition
Last synced: 28 Nov 2024
https://github.com/athena-team/athena
An open-source implementation of a sequence-to-sequence-based speech processing engine
asr ctc deployment sequence-to-sequence speaker-recognition speech-recognition speech-synthesis tensorflow transformer tts unsupervised-learning wfst
Last synced: 28 Nov 2024