Projects in Awesome Lists tagged with speech-recognition

https://github.com/huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

bert deep-learning flax hacktoberfest jax language-model language-models machine-learning model-hub natural-language-processing nlp nlp-library pretrained-models python pytorch pytorch-transformers seq2seq speech-recognition tensorflow transformer

Last synced: 16 Dec 2024

https://github.com/ggerganov/whisper.cpp

Port of OpenAI's Whisper model in C/C++

inference openai speech-recognition speech-to-text transformer whisper

Last synced: 16 Dec 2024

https://github.com/mozilla/deepspeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow

Last synced: 16 Dec 2024

https://github.com/mozilla/stt

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow

Last synced: 13 Oct 2024

https://github.com/leon-ai/leon

🧠 Leon is your open-source personal assistant.

ai ai-assistant artificial-intelligence assistant automation bot chatbot flite leon nodejs offline personal-assistant privacy python speech-recognition speech-synthesis speech-to-text text-to-speech virtual-assistant voice-assistant

Last synced: 16 Dec 2024

https://github.com/kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

c-plus-plus cuda kaldi shell speaker-id speaker-verification speech speech-recognition speech-to-text

Last synced: 16 Dec 2024

https://github.com/nvidia/deeplearningexamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

computer-vision deep-learning drug-discovery forecasting large-language-models mxnet nlp paddlepaddle pytorch recommender-systems speech-recognition speech-synthesis tensorflow tensorflow2 translation

Last synced: 16 Dec 2024

https://github.com/systran/faster-whisper

Faster Whisper transcription with CTranslate2

deep-learning inference openai quantization speech-recognition speech-to-text transformer whisper

Last synced: 16 Dec 2024

https://github.com/guillaumekln/faster-whisper

Faster Whisper transcription with CTranslate2

deep-learning inference openai quantization speech-recognition speech-to-text transformer whisper

Last synced: 14 Dec 2024

https://github.com/kmario23/deep-learning-drizzle

Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!

artificial-intelligence-algorithms artificial-neural-networks bayesian-statistics computer-vision deep-learning deep-neural-networks deep-reinforcement-learning explainable-ai geometric-deep-learning graph-neural-networks machine-learning medical-imaging natural-language-processing optimization pattern-recognition probabilistic-graphical-models probability reinforcement-learning speech-recognition visual-recognition

Last synced: 03 Dec 2024

https://github.com/m-bain/whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text whisper

Last synced: 16 Dec 2024

https://github.com/paddlepaddle/paddlespeech

Easy-to-use Speech Toolkit including Self-Supervised Learning models, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Winner of the NAACL 2022 Best Demo Award.

asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper

Last synced: 16 Dec 2024

https://github.com/speechbrain/speechbrain

A PyTorch-based Speech Toolkit

asr audio audio-processing deep-learning huggingface language-model pytorch speaker-diarization speaker-recognition speaker-verification speech-enhancement speech-processing speech-recognition speech-separation speech-to-text speech-toolkit speechrecognition spoken-language-understanding transformers voice-recognition

Last synced: 16 Dec 2024

https://github.com/espnet/espnet

End-to-End Speech Processing Toolkit

chainer deep-learning end-to-end kaldi machine-translation pytorch singing-voice-synthesis speaker-diarization speech-enhancement speech-recognition speech-separation speech-synthesis speech-translation spoken-language-understanding text-to-speech voice-conversion

Last synced: 16 Dec 2024

https://github.com/uberi/speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

audio python speech-recognition speech-to-text

Last synced: 16 Dec 2024

https://github.com/alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

android asr deep-learning deep-neural-networks deepspeech google-speech-to-text ios kaldi offline privacy python raspberry-pi speaker-identification speaker-verification speech-recognition speech-to-text speech-to-text-android stt voice-recognition vosk

Last synced: 16 Dec 2024

https://github.com/nl8590687/asrt_speechrecognition

A Deep-Learning-Based Chinese Speech Recognition System

asrt chinese-speech-recognition cnn ctc keras python python3 speech-recognition speech-to-text tensorflow

Last synced: 16 Dec 2024
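
ASRT is tagged `ctc` (Connectionist Temporal Classification), a training objective under which the model emits one label distribution per audio frame, including a special "blank" label. As a rough illustration of how such output becomes text, here is a minimal greedy (best-path) CTC decoder. It assumes index 0 is the blank and uses a hypothetical toy vocabulary; it is not ASRT's own decoder.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 of the output layer is the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], vocab: Sequence[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    # Most likely label index for each frame.
    best_path = [max(range(len(probs)), key=probs.__getitem__) for probs in frame_probs]
    tokens: List[str] = []
    previous: Optional[int] = None
    for index in best_path:
        # Emit a label only when it differs from the previous frame's label and
        # is not the blank; a blank between two equal labels keeps both copies.
        if index != previous and index != BLANK:
            tokens.append(vocab[index])
        previous = index
    # Tokens are joined without separators, which suits character-level output.
    return "".join(tokens)
```

Greedy decoding ignores any language model; decoders that run beam search with a lexicon or language model generally trade speed for accuracy.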

https://github.com/openvinotoolkit/openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

ai computer-vision deep-learning deploy-ai diffusion-models generative-ai good-first-issue inference llm-inference natural-language-processing nlp openvino optimize-ai performance-boost recommendation-system speech-recognition stable-diffusion transformers yolo

Last synced: 16 Dec 2024

https://github.com/modelscope/funasr

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

audio-visual-speech-recognition conformer dfsmn paraformer pretrained-model punctuation pytorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection whisper

Last synced: 17 Dec 2024

https://github.com/talater/annyang

💬 Speech recognition for your site

speech speech-recognition speech-to-text voice

Last synced: 16 Dec 2024

https://github.com/flashlight/wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit

cpp deep-learning end-to-end speech-recognition wav2letter

Last synced: 17 Dec 2024

https://github.com/snakers4/silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

asr capitalization colab english german onnx pretrained-models pytorch repunctuation spanish speech speech-recognition speech-synthesis speech-to-text stt stt-benchmark text-to-speech torch-hub tts tts-models

Last synced: 18 Dec 2024

https://github.com/sanchit-gandhi/whisper-jax

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

deep-learning jax speech-recognition speech-to-text whisper

Last synced: 19 Dec 2024

https://github.com/wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

asr automatic-speech-recognition conformer e2e-models production-ready pytorch speech-recognition transformer whisper

Last synced: 16 Dec 2024

https://github.com/argmaxinc/whisperkit

On-device Speech Recognition for Apple Silicon

inference ios macos speech-recognition swift transformers visionos watchos whisper

Last synced: 17 Dec 2024

https://github.com/cmusphinx/pocketsphinx

A small speech recognizer

c python speech-recognition

Last synced: 16 Dec 2024

https://github.com/Picovoice/Porcupine

On-device wake word detection powered by deep learning

handsfree hotword hotword-detection hotword-detector keyword-spotter keyword-spotting on-device speech-recognition trigger-word-detection voice-activation wake-word wake-word-detection wake-word-engine

Last synced: 06 Dec 2024
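
Porcupine consumes audio as fixed-size frames of 16-bit PCM. Its Python binding (`pvporcupine`) exposes `frame_length`, `sample_rate`, and a `process()` method that returns the index of a detected keyword, or -1 when none is heard. The sketch below shows only the framing loop around such an engine; the `WakeWordEngine` protocol and `detect_wake_words` helper are hypothetical names, and in practice the engine would be a handle returned by `pvporcupine.create(...)` with an access key.

```python
from typing import List, Protocol, Sequence, Tuple


class WakeWordEngine(Protocol):
    """Hypothetical interface with the same shape as a Porcupine handle."""

    frame_length: int

    def process(self, pcm: Sequence[int]) -> int:
        ...


def detect_wake_words(
    engine: WakeWordEngine, pcm: Sequence[int], sample_rate: int
) -> List[Tuple[float, int]]:
    """Feed consecutive full frames to the engine; return (time_s, keyword_index) hits."""
    hits: List[Tuple[float, int]] = []
    n = engine.frame_length
    # A trailing partial frame is dropped: the engine expects exactly n samples.
    for offset in range(0, len(pcm) - n + 1, n):
        keyword_index = engine.process(pcm[offset:offset + n])
        if keyword_index >= 0:  # -1 means "no keyword in this frame"
            hits.append((offset / sample_rate, keyword_index))
    return hits
```

Keeping the engine behind a small interface like this also makes the loop testable with a fake engine, without a microphone or an access key.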

https://github.com/modelscope/funclip

Open-source, accurate, and easy-to-use video speech recognition and clipping tool, with integrated LLM-based AI clipping.

gradio gradio-python-llm llm speech-recognition speech-to-text subtitles-generator video-clip video-subtitles

Last synced: 18 Dec 2024

https://github.com/huggingface/distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

audio speech-recognition whisper

Last synced: 17 Dec 2024

https://github.com/mahmoudashraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text whisper

Last synced: 17 Dec 2024
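
Combining ASR with speaker diarization means merging two outputs: word-level timestamps from Whisper and speaker turns from a diarization model. One simple merging rule, sketched below under the assumption that both outputs are already available as time intervals in seconds, assigns each word to the speaker whose turn overlaps it most. `Word` and `assign_speakers` are hypothetical names, and this is not necessarily how whisper-diarization itself aligns words.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Word:
    """A recognized word with start/end times in seconds (hypothetical container)."""

    text: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    words: Sequence[Word], turns: Sequence[Tuple[float, float, str]]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose (start, end, speaker) turn overlaps it most.

    Words that overlap no turn at all are labelled None.
    """
    labelled: List[Tuple[str, Optional[str]]] = []
    for word in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for start, end, speaker in turns:
            ov = overlap(word.start, word.end, start, end)
            if ov > best_overlap:
                best_overlap, best_speaker = ov, speaker
        labelled.append((word.text, best_speaker))
    return labelled
```

Possible refinements include falling back to the nearest turn for unlabelled words, or smoothing labels so a sentence is not split between speakers.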

https://github.com/FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

ai aigc asr audio-event-classification cross-lingual gpt-4o llm multilingual python pytorch speech-emotion-recognition speech-recognition speech-to-text

Last synced: 14 Nov 2024

https://github.com/yanshengjia/ml-road

Machine Learning Resources, Practice and Research

computer-vision deep-learning machine-learning nlp pytorch speech-recognition tensorflow

Last synced: 27 Nov 2024

https://github.com/zzw922cn/automatic_speech_recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

audio automatic-speech-recognition chinese-speech-recognition cnn data-preprocessing deep-learning end-to-end evaluation feature-vector layer-normalization lstm paper phonemes rnn rnn-encoder-decoder speech-recognition tensorflow timit-dataset

Last synced: 20 Dec 2024

https://github.com/tensorflow/lingvo

Lingvo

asr distributed gpu-computing language-model lm machine-translation mnist nlp research seq2seq speech speech-recognition speech-synthesis speech-to-text tensorflow translation tts

Last synced: 17 Dec 2024

https://github.com/toverainc/willow

Open-source, local, and self-hosted voice assistant; a competitive alternative to Amazon Echo and Google Home

alexa deep-learning echo esp-adf esp-idf esp32 google-home home-assistant home-automation privacy speech-recognition speech-to-text whisper

Last synced: 18 Dec 2024

https://github.com/jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run tool that converts audio and video into subtitles, with output as JSON, SRT subtitles, or plain text

speech speech-recognition speech-to-text stt

Last synced: 19 Dec 2024
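
This tool lists SRT among its output formats. A minimal sketch of writing SRT from ASR output follows, assuming each segment is a `(start_seconds, end_seconds, text)` tuple; faster-whisper's segments, for example, carry `start`, `end` and `text` attributes that could be unpacked into this shape. `srt_timestamp` and `to_srt` are illustrative names, not this project's code.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a non-negative time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render numbered SRT cues (index, time range, text), separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)
```

Whisper-style segment text often starts with a space, which is why each cue's text is stripped.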

https://github.com/rhasspy/rhasspy

Offline private voice assistant for many human languages

home-assistant node-red privacy speech-recognition voice-assistants voice-commands

Last synced: 20 Dec 2024

https://github.com/mravanelli/pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

asr deep-learning deep-neural-networks dnn dnn-hmm gru kaldi lstm lstm-neural-networks multilayer-perceptron-network pytorch recurrent-neural-networks rnn rnn-model speech speech-recognition timit

Last synced: 21 Dec 2024

https://github.com/abus-aikorea/voice-pro

Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation (UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.

faster-whisper gradio podcasts speech-recognition speech-synthesis speech-to-text stt subtitles text-to-speech transcription translation translator tts voice-cloning voice-conversion webui whisper yt-dlp

Last synced: 21 Dec 2024

https://github.com/coqui-ai/stt

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 18 Dec 2024

https://github.com/ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

asr automatic-speech-recognition docker openai-whisper speech speech-recognition speech-to-text

Last synced: 18 Dec 2024

https://github.com/pannous/tensorflow-speech-recognition

🎙 Speech recognition using the TensorFlow deep learning framework and sequence-to-sequence neural networks

deep-learning neural-network speech-recognition speech-to-text stt tensorflow

Last synced: 20 Dec 2024

https://github.com/linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

asr attention-is-all-you-need attention-mechanism attention-model attention-network attention-seq2seq attention-visualization deep-learning machine-learning multilingual-models python python3 pytorch speaker-diarization speech speech-processing speech-recognition speech-to-text transformers whisper

Last synced: 17 Dec 2024

https://github.com/alan-ai/alan-sdk-ios

Conversational AI SDK for iOS to enable text and voice conversations with actions (Swift, Objective-C)

alan-ios-sdk alan-studio alan-voice chatbot conversational-ai ios machine-learning sdk speech-recognition voice voice-ai voice-assistant voice-commands

Last synced: 19 Dec 2024

https://github.com/chenyme/chenyme-aavt

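A fully automatic video (and audio) translation project: it uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles into the video to produce a translated video.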

faster-whisper gpt-4 gpt-4o speech-recognition video-translation whisper

Last synced: 18 Dec 2024

https://github.com/nobody132/masr

Chinese speech recognition; Mandarin Automatic Speech Recognition

chinese-speech-recognition mandarin-chinese pytorch speech-recognition

Last synced: 19 Dec 2024

https://github.com/syhw/wer_are_we

An attempt at tracking the state of the art and recent results (bibliography) in speech recognition.

deep-neural-network speech-recognition wer

Last synced: 02 Dec 2024
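
Results tracked here are reported as word error rate (WER): the minimum number of word substitutions S, deletions D and insertions I needed to turn the reference transcript into the hypothesis, divided by the number of reference words N, i.e. WER = (S + D + I) / N. A minimal sketch using word-level Levenshtein distance follows; it splits on whitespace and applies no text normalization (case, punctuation, numerals), which published benchmarks usually handle with their own rules.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
                d[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

Because insertions count against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.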

https://github.com/julius-speech/julius

Open-Source Large Vocabulary Continuous Speech Recognition Engine

audio-processing recognition speech speech-recognition

Last synced: 18 Dec 2024

https://github.com/astorfi/lip-reading-deeplearning

🔓 Lip Reading: Cross Audio-Visual Recognition using 3D Architectures

3d-convolutional-network computer-vision deep-learning speech-recognition tensorflow

Last synced: 21 Dec 2024

https://github.com/alan-ai/alan-sdk-android

Conversational AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)

alan-ai alan-sdk alan-studio alan-voice android conversational-ai machine-learning multimodal sdk speech-recognition text-to-speech voice voice-assistant voice-commands voice-control voice-interface vui

Last synced: 19 Dec 2024

https://github.com/react-native-voice/voice

🎤 React Native Voice Recognition library for iOS and Android (Online and Offline Support)

android ios react-native speech-recognition voice-recognition

Last synced: 17 Dec 2024

https://github.com/alan-ai/alan-sdk-flutter

Conversational AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)

alan-sdk alan-studio alan-voice chatbot conversational-ai flutter machine-learning multimodal sdk speech-recognition text-to-speech voice voice-ai voice-assistant voice-commands voice-control voice-interface vui

Last synced: 14 Dec 2024

https://github.com/fl33tw00d/whisper-turbo

Cross-Platform, GPU Accelerated Whisper 🏎️

audio machine-learning rust speech-recognition webgpu whisper windows

Last synced: 20 Dec 2024

https://github.com/kalliope-project/kalliope

Kalliope is a framework that will help you to create your own personal assistant.

bot bot-creation home-automation jarvis linux personal-assistant raspberry speech-recognition speech-synthesis speech-to-text

Last synced: 19 Dec 2024

https://github.com/alan-ai/alan-sdk-ionic

Conversational AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)

alan-ionic-sdk alan-studio chatbot conversational-ai ionic machine-learning multimodal sdk speech-recognition text-to-speech voice voice-ai voice-assistant voice-commands voice-control voice-interface vui

Last synced: 21 Dec 2024

https://github.com/bjoernkarmann/project_alias

Alias is a teachable “parasite” designed to give users more control over their smart assistants, in terms of both customisation and privacy. Through a simple app, the user can train Alias to react to a custom wake word or sound; once trained, Alias can take control of the home assistant by activating it for you.

alias classification hack machine-learning microphone raspberry-pi smarthome sound-synthesis speech-recognition wakeword

Last synced: 11 Oct 2024

https://github.com/pluja/whishper

Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!

ai audio-to-text golang speech-recognition speech-to-text stt subtitles sveltekit transcription ui web web-whisper webapp whisper

Last synced: 19 Dec 2024

https://github.com/Delta-ML/delta

DELTA is a deep learning based natural language and speech processing platform.

asr custom-ops deep-learning emotion-recognition front-end inference nlp nlu ops seq2seq sequence-to-sequence serving speaker-verification speech speech-recognition tensorflow tensorflow-lite tensorflow-serving text-classification text-generation

Last synced: 06 Nov 2024

https://github.com/NVIDIA/OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

deep-learning float16 language-model mixed-precision multi-gpu multi-node neural-machine-translation seq2seq sequence-to-sequence speech-recognition speech-synthesis speech-to-text tensorflow text-to-speech

Last synced: 27 Nov 2024

https://github.com/purfview/whisper-standalone-win

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

asr ctranslate2 diarization faster-whisper openai speaker-diarization speech-recognition speech-to-text subtitles transcriber uvr vocal-extractor whisper whisper-faster whisperx

Last synced: 20 Dec 2024

https://github.com/dragoncomputer/dragonfire

The open-source virtual assistant for Ubuntu-based Linux distributions

artificial-intelligence chatbot kaldi linux machine-learning nlp personal-assistant spacy speech-recognition speech-to-text text-to-speech ubuntu virtual-assistant

Last synced: 21 Dec 2024

https://github.com/miteshputhran/speech-emotion-analyzer

The neural network model detects five different emotions from speech audio, distinguishing male and female speakers. (Deep Learning, NLP, Python)

audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice

Last synced: 15 Dec 2024

https://github.com/MITESHPUTHRANNEU/Speech-Emotion-Analyzer

The neural network model detects five different emotions from speech audio, distinguishing male and female speakers. (Deep Learning, NLP, Python)

audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice

Last synced: 14 Dec 2024

https://github.com/sc0ty/subsync

Subtitle Speech Synchronizer

speech-recognition subtitle-speech-synchronizer subtitles synchronization

Last synced: 14 Oct 2024

https://github.com/coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

speech-emotion-recognition speech-processing speech-recognition speech-separation speech-synthesis speech-to-text stt text-to-speech tts voice-activity-detection voice-cloning voice-recognition

Last synced: 03 Dec 2024

https://github.com/microsoft/speecht5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

speech-pretraining speech-recognition speech-synthesis speech-text-pretraining speech-translation speech2c speechlm speecht5 speechut vallex vatlm

Last synced: 15 Dec 2024

https://github.com/sdkcarlos/artyom.js

A voice control, voice commands, speech recognition and speech synthesis JavaScript library. Create your own Siri, Google Now, or Cortana with Google Chrome within your website.

recognition speech-recognition speech-synthesis speech-to-text voice-commands

Last synced: 20 Dec 2024

https://github.com/alan-ai/alan-sdk-cordova

Conversational AI SDK for Apache Cordova to enable text and voice conversations with actions (iOS and Android)

chatbot conversational-ai machine-learning multimodal speech-recognition text-to-speech voice-assistant voice-commands voice-interface vui

Last synced: 15 Dec 2024

https://github.com/mravanelli/sincnet

SincNet is a neural architecture for efficiently processing raw audio samples.

artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering neural-networks python pytorch signal-processing speaker-identification speaker-recognition speaker-verification speech-processing speech-recognition timit waveform

Last synced: 15 Dec 2024
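
SincNet's first convolutional layer does not learn arbitrary filter taps: each filter is a band-pass filter defined only by a low cutoff f1 and a high cutoff f2, built as the difference of two sinc low-pass filters, g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n) with sinc(x) = sin(x)/x, and smoothed with a window (the SincNet paper uses a Hamming window). The sketch below computes one such kernel with the standard library only. Cutoffs are normalized by the sample rate, and in SincNet itself f1 and f2 are trainable parameters rather than fixed arguments; `sinc_bandpass` is an illustrative name.

```python
import math
from typing import List


def sinc_bandpass(f1: float, f2: float, half_width: int) -> List[float]:
    """Hamming-windowed band-pass kernel g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n).

    f1 < f2 are cutoff frequencies normalized by the sample rate (at most 0.5).
    Returns 2 * half_width + 1 taps, centred on n = 0.
    """
    if not 0.0 <= f1 < f2 <= 0.5:
        raise ValueError("need 0 <= f1 < f2 <= 0.5")
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    taps: List[float] = []
    for n in range(-half_width, half_width + 1):
        if n == 0:
            # Limit of 2*f*sin(2*pi*f*n) / (2*pi*f*n) as n -> 0 is 2*f.
            g = 2.0 * f2 - 2.0 * f1
        else:
            # 2*f*sinc(2*pi*f*n) simplifies to sin(2*pi*f*n) / (pi*n).
            g = (math.sin(2 * math.pi * f2 * n) - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        # Symmetric Hamming window: 1.0 at the centre, 0.08 at both edges.
        w = 0.54 + 0.46 * math.cos(math.pi * n / half_width)
        taps.append(g * w)
    return taps
```

Since only two numbers define each filter, the layer has far fewer parameters than a standard convolution of the same length, which is the design choice SincNet is built around.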

https://github.com/k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.

asr c cpp csharp go kotlin python speech-recognition vad voice-activity-detection

Last synced: 18 Dec 2024
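
sherpa-ncnn pairs recognition with voice activity detection (VAD), which marks the stretches of audio that contain speech so the recognizer can skip silence. Toolkits like this use trained VAD models that are far more robust to noise; the sketch below only illustrates the idea with a simple frame-energy threshold. The 30 ms frame size and -40 dBFS threshold are arbitrary illustrative defaults, samples are assumed to be floats in [-1.0, 1.0], and `energy_vad` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence, Tuple


def energy_vad(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold_db: float = -40.0,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS level exceeds threshold_db (dBFS)."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    spans: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        t = i / sample_rate
        if level_db > threshold_db and start is None:
            start = t  # speech begins at this frame
        elif level_db <= threshold_db and start is not None:
            spans.append((start, t))  # speech ended at the previous frame
            start = None
    if start is not None:
        # Audio ended while speech was still active.
        spans.append((start, len(samples) / sample_rate))
    return spans
```

A fixed threshold breaks down with background noise or varying microphone gain, which is one reason learned VAD models are used in practice.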

https://github.com/bytedance/salmonn

SALMONN: Speech Audio Language Music Open Neural Network

audio audio-processing bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university

Last synced: 20 Dec 2024

https://github.com/alumae/kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

speech-recognition

Last synced: 20 Dec 2024

https://github.com/modal-labs/quillman

A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

ai language-model python serverless speech-recognition speech-to-text