Projects in Awesome Lists tagged with speech

https://github.com/babysor/mockingbird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time

ai deep-learning pytorch speech text-to-speech tts

Last synced: 16 Dec 2024

https://github.com/babysor/MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time

ai deep-learning pytorch speech text-to-speech tts

Last synced: 27 Oct 2024

https://github.com/coqui-ai/tts

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

deep-learning glow-tts hifigan melgan multi-speaker-tts python pytorch speaker-encoder speaker-encodings speech speech-synthesis tacotron text-to-speech tts tts-model vocoder voice-cloning voice-conversion voice-synthesis

Last synced: 16 Dec 2024

https://github.com/coqui-ai/TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

deep-learning glow-tts hifigan melgan multi-speaker-tts python pytorch speaker-encoder speaker-encodings speech speech-synthesis tacotron text-to-speech tts tts-model vocoder voice-cloning voice-conversion voice-synthesis

Last synced: 25 Oct 2024

https://github.com/svc-develop-team/so-vits-svc

SoftVC VITS Singing Voice Conversion

ai audio-analysis deep-learning flow generative-adversarial-network pytorch singing-voice-conversion so-vits-svc sovits speech variational-inference vc vits voice voice-changer voice-conversion voiceconversion

Last synced: 29 Sep 2024

https://github.com/huggingface/datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

computer-vision datasets deep-learning hacktoberfest machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

Last synced: 16 Dec 2024

https://github.com/idea-research/grounded-segment-anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

3d-whole-body-pose-estimation automatic-labeling-system caption data-generation image-editing open-vocabulary-detection open-vocabulary-segmentation speech

Last synced: 16 Dec 2024

https://github.com/IDEA-Research/Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

3d-whole-body-pose-estimation automatic-labeling-system caption data-generation image-editing open-vocabulary-detection open-vocabulary-segmentation speech

Last synced: 27 Oct 2024

https://github.com/kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

c-plus-plus cuda kaldi shell speaker-id speaker-verification speech speech-recognition speech-to-text

Last synced: 16 Dec 2024

https://github.com/m-bain/whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text whisper

Last synced: 16 Dec 2024

https://github.com/aigc-audio/audiogpt

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

audio gpt music sound speech talking-head

Last synced: 17 Dec 2024

https://github.com/AIGC-Audio/AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

audio gpt music sound speech talking-head

Last synced: 29 Oct 2024

https://github.com/mozilla/tts

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

dataset-analysis deep-learning gantts glow-tts melgan multiband-melgan python pytorch speaker-encoder speech tacotron tacotron2 tensorflow2 text-to-speech tts vocoder

Last synced: 17 Dec 2024

https://github.com/mozilla/TTS

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

dataset-analysis deep-learning gantts glow-tts melgan multiband-melgan python pytorch speaker-encoder speech tacotron tacotron2 tensorflow2 text-to-speech tts vocoder

Last synced: 25 Oct 2024

https://github.com/m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text whisper

Last synced: 25 Oct 2024

https://github.com/netease-youdao/emotivoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

ai deep-learning emotion emotivoice multi-speaker prompt python pytorch speech speech-synthesis style text-to-speech tts

Last synced: 16 Dec 2024

https://github.com/netease-youdao/EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

ai deep-learning emotion emotivoice multi-speaker prompt python pytorch speech speech-synthesis style text-to-speech tts

Last synced: 29 Oct 2024

https://github.com/modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

cv deep-learning machine-learning multi-modal nlp python science speech

Last synced: 16 Dec 2024

https://github.com/paddlepaddle/models

Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.

computer-vision cv deep-learning models natural-language-processing neural-network nlp paddlepaddle recommendation speech

Last synced: 17 Dec 2024

https://github.com/PaddlePaddle/models

Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.

computer-vision cv deep-learning models natural-language-processing neural-network nlp paddlepaddle recommendation speech

Last synced: 28 Oct 2024

https://github.com/talater/annyang

💬 Speech recognition for your site

speech speech-recognition speech-to-text voice

Last synced: 16 Dec 2024

https://github.com/TalAter/annyang

💬 Speech recognition for your site

hacktoberfest speech speech-recognition speech-to-text voice

Last synced: 25 Oct 2024

https://github.com/snakers4/silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

asr capitalization colab english german onnx pretrained-models pytorch repunctuation spanish speech speech-recognition speech-synthesis speech-to-text stt stt-benchmark text-to-speech torch-hub tts tts-models

Last synced: 18 Dec 2024

https://github.com/snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

onnx onnx-runtime onnxruntime pytorch speech speech-processing vad voice-activity-detection voice-commands voice-control voice-detection voice-recognition

Last synced: 18 Dec 2024

https://github.com/metavoiceio/metavoice-src

Foundational model for human-like, expressive TTS

ai deep-learning pytorch speech speech-synthesis text-to-speech tts voice-clone zero-shot-tts

Last synced: 17 Dec 2024

https://github.com/MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text whisper

Last synced: 31 Oct 2024

https://github.com/mahmoudashraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text whisper

Last synced: 17 Dec 2024

https://github.com/shu223/iOS-10-Sampler

Code examples for new APIs of iOS 10.

cnn convolutional-neural-networks demo image-recognition ios ios10 metal metal-cnn metal-performance-shaders speech swift-3 swift-4 uiviewpropertyanimator

Last synced: 24 Nov 2024

https://github.com/shu223/ios-10-sampler

Code examples for new APIs of iOS 10.

cnn convolutional-neural-networks demo image-recognition ios ios10 metal metal-cnn metal-performance-shaders speech swift-3 swift-4 uiviewpropertyanimator

Last synced: 20 Dec 2024

https://github.com/huggingface/speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT-4o

ai assistant language-model machine-learning python speech speech-synthesis speech-to-text speech-translation

Last synced: 07 Sep 2024

https://github.com/tensorflow/lingvo

Lingvo

asr distributed gpu-computing language-model lm machine-translation mnist nlp research seq2seq speech speech-recognition speech-synthesis speech-to-text tensorflow translation tts

Last synced: 17 Dec 2024

https://github.com/avinashkranjan/amazing-python-scripts

🚀 Curated collection of Amazing Python scripts from Basic to Advanced, with automation task scripts.

artificial-intelligence hacktoberfest machine-learning projects python python-projects python-scripts speech webcam

Last synced: 18 Dec 2024

https://github.com/hahahumble/speechgpt

💬 SpeechGPT is a web application that enables you to converse with ChatGPT.

chat chatbot chatgpt conversation language-learning speech

Last synced: 20 Dec 2024

https://github.com/avinashkranjan/Amazing-Python-Scripts

🚀 Curated collection of Amazing Python scripts from Basic to Advanced, with automation task scripts.

artificial-intelligence hacktoberfest machine-learning projects python python-projects python-scripts speech webcam

Last synced: 27 Oct 2024

https://github.com/jianchang512/stt

Voice Recognition to Text Tool: an offline, locally running tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

speech speech-recognition speech-to-text stt

Last synced: 19 Dec 2024

https://github.com/rikorose/deepfilternet

Noise suppression using deep filtering

audio deep-learning noise-suppression pytorch rust speech speech-enhancement

Last synced: 17 Dec 2024

https://github.com/pytorch/audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

audio audio-processing io machine-learning python pytorch speech

Last synced: 21 Dec 2024

https://github.com/camb-ai/mars5-tts

MARS5 speech model (TTS) from CAMB.AI

prosody speech speech-synthesis text-to-speech voice-cloneai voice-cloning

Last synced: 19 Dec 2024

https://github.com/readbeyond/aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

alignment audio cli dtw espeak espeak-ng festival ffmpeg forced-alignment linux macos nlp python smil speech srt text text-to-speech tts windows

Last synced: 17 Dec 2024

https://github.com/Rikorose/DeepFilterNet

Noise suppression using deep filtering

audio deep-learning noise-suppression pytorch rust speech speech-enhancement

Last synced: 06 Nov 2024

https://github.com/mravanelli/pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.

asr deep-learning deep-neural-networks dnn dnn-hmm gru kaldi lstm lstm-neural-networks multilayer-perceptron-network pytorch recurrent-neural-networks rnn rnn-model speech speech-recognition timit

Last synced: 21 Dec 2024

https://github.com/r9y9/wavenet_vocoder

WaveNet vocoder

neural-vocoder python pytorch speech speech-processing speech-synthesis wavenet wavenet-vocoder

Last synced: 20 Dec 2024

https://github.com/pndurette/gtts

Python library and CLI tool to interface with Google Translate's text-to-speech API

cli gtts pypi python python-library speech speech-api text-to-speech tts

Last synced: 16 Dec 2024

https://github.com/ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

asr automatic-speech-recognition docker openai-whisper speech speech-recognition speech-to-text

Last synced: 18 Dec 2024

https://github.com/pndurette/gTTS

Python library and CLI tool to interface with Google Translate's text-to-speech API

cli gtts pypi python python-library speech speech-api text-to-speech tts

Last synced: 25 Oct 2024

https://github.com/linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

asr attention-is-all-you-need attention-mechanism attention-model attention-network attention-seq2seq attention-visualization deep-learning machine-learning multilingual-models python python3 pytorch speaker-diarization speech speech-processing speech-recognition speech-to-text transformers whisper

Last synced: 17 Dec 2024

https://github.com/julius-speech/julius

Open-Source Large Vocabulary Continuous Speech Recognition Engine

audio-processing recognition speech speech-recognition

Last synced: 18 Dec 2024

https://github.com/Kyubyong/tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

speech speech-synthesis-model tensorflow tts

Last synced: 27 Nov 2024

https://github.com/kyubyong/tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

speech speech-synthesis-model tensorflow tts

Last synced: 21 Dec 2024

https://github.com/jarikomppa/soloud

Free, easy, portable audio engine for games

audio blitzmax c cpp engine flac game game-development gamemaker mp3 ogg opensl-es portable python ruby sound sound-effects speech speech-to-text synthesizer

Last synced: 19 Dec 2024

https://github.com/iahispano/applio

A simple, high-quality voice conversion tool focused on ease of use and performance

ai applio pytorch rvc speech speech-to-speech text-to-speech tts vc vits voice voice-clone voice-cloning voice-conversion

Last synced: 19 Dec 2024

https://github.com/IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance

ai applio pytorch rvc speech speech-to-speech text-to-speech tts vc vits voice voice-clone voice-cloning voice-conversion

Last synced: 14 Nov 2024

https://github.com/ovidijusparsiunas/deep-chat

Fully customizable AI chatbot component for your website

ai ai-chatbot angular chat chatbot chatgpt cohere component files huggingface image nextjs openai react react-chatbot solid speech svelte vue

Last synced: 17 Dec 2024

https://github.com/Delta-ML/delta

DELTA is a deep learning based natural language and speech processing platform.

asr custom-ops deep-learning emotion-recognition front-end inference nlp nlu ops seq2seq sequence-to-sequence serving speaker-verification speech speech-recognition tensorflow tensorflow-lite tensorflow-serving text-classification text-generation

Last synced: 06 Nov 2024

https://github.com/praat/praat

Praat: Doing Phonetics By Computer

acoustics phonetics speech speech-analysis

Last synced: 19 Dec 2024

https://github.com/csteinmetz1/ai-audio-startups

Community list of startups working with AI in audio and music technology

audio list music speech startups

Last synced: 03 Dec 2024

https://github.com/OvidijusParsiunas/deep-chat

Fully customizable AI chatbot component for your website

ai ai-chatbot angular chat chatbot chatgpt cohere component files huggingface image nextjs openai react react-chatbot solid speech svelte vue

Last synced: 06 Nov 2024

https://github.com/miteshputhran/speech-emotion-analyzer

The neural network model is capable of detecting five different male/female emotions from speech audio. (Deep Learning, NLP, Python)

audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice

Last synced: 15 Dec 2024

https://github.com/MITESHPUTHRANNEU/Speech-Emotion-Analyzer

The neural network model is capable of detecting five different male/female emotions from speech audio. (Deep Learning, NLP, Python)

audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice

Last synced: 14 Dec 2024

https://github.com/MiteshPuthran/Speech-Emotion-Analyzer

The neural network model is capable of detecting five different male/female emotions from speech audio. (Deep Learning, NLP, Python)

audio-files data-science deep-learning deep-neural-networks emotion emotion-recognition keras natural-language-processing natural-language-understanding neural-network python3 speech speech-emotion-recognition speech-recognition voice

Last synced: 30 Oct 2024

https://github.com/dengbocong/nlp-paper

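Papers in the field of natural language processing (with reading notes), along with model reproductions, data processing, and more (code provided in both TensorFlow and PyTorch versions)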

bert dialogue nlp nlp-machine-learning paper pytorch speech tensorflow2

Last synced: 21 Dec 2024

https://github.com/DengBoCong/nlp-paper

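Papers in the field of natural language processing (with reading notes), along with model reproductions, data processing, and more (code provided in both TensorFlow and PyTorch versions)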

bert dialogue nlp nlp-machine-learning paper pytorch speech tensorflow2

Last synced: 14 Nov 2024

https://github.com/kyubyong/dc_tts

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

speech speech-to-text tts

Last synced: 15 Dec 2024

https://github.com/Kyubyong/dc_tts

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

speech speech-to-text tts

Last synced: 07 Nov 2024

https://github.com/roatienza/deep-learning-experiments

Videos, notes and experiments to understand deep learning

artificial-intelligence deep-learning deep-learning-tutorial nlp pytorch speech vision

Last synced: 19 Dec 2024

https://github.com/roatienza/Deep-Learning-Experiments

Videos, notes and experiments to understand deep learning

artificial-intelligence deep-learning deep-learning-tutorial nlp pytorch speech vision

Last synced: 30 Oct 2024

https://github.com/bytedance/salmonn

SALMONN: Speech Audio Language Music Open Neural Network

audio audio-processing bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university

Last synced: 20 Dec 2024

https://github.com/haoheliu/voicefixer

General Speech Restoration

declipping denoise dereverberation mel speech speech-analysis speech-enhancement speech-processing speech-synthesis super-resolution tts vocoder

Last synced: 17 Dec 2024

https://github.com/bytedance/SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

audio audio-processing bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university

Last synced: 08 Nov 2024

https://github.com/pykaldi/pykaldi

A Python wrapper for Kaldi

asr clif feature-extraction kaldi language-model numpy openfst python speech speech-recognition wrapper

Last synced: 20 Dec 2024

https://github.com/ictnlp/streamspeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

all-in-one asr audio-processing machine-translation non-autoregressive seamless simultaneous-translation speech speech-enhancement speech-processing speech-recognition speech-synthesis speech-to-text speech-translation streaming-audio text-to-audio text-to-speech translation tts voice

Last synced: 20 Dec 2024

https://github.com/sooftware/conformer

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

asr augmented cnn conformer conv convolution pytorch recognition speech speech-recognition transformer transformer-xl

Last synced: 16 Dec 2024

https://github.com/NATSpeech/NATSpeech

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

diffsinger diffspeech huggingface portaspeech pytorch speech speech-synthesis tts

Last synced: 27 Nov 2024

https://github.com/lhotse-speech/lhotse

Tools for handling speech data in machine learning projects.

ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition

Last synced: 28 Nov 2024

https://github.com/jtkim-kaist/VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

acam attention bdnn data dnn lstm speech speech-activity-detection speech-recognition vad voice-activity-detection voice-detection

Last synced: 14 Nov 2024

https://github.com/yeyupiaoling/ppasr

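End-to-end Chinese speech recognition implemented with PaddlePaddle, from getting started to hands-on practice, with very simple introductory examples and highly practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer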

asr chinese conformer deep-learning deepspeech2 paddlepaddle speech speech-recognition speech-to-text squeezeformer streaming-asr

Last synced: 19 Dec 2024

https://github.com/yeyupiaoling/PPASR

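End-to-end Chinese speech recognition implemented with PaddlePaddle, from getting started to hands-on practice, with very simple introductory examples and highly practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer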

asr chinese conformer deep-learning deepspeech2 paddlepaddle speech speech-recognition speech-to-text squeezeformer streaming-asr

Last synced: 14 Nov 2024

https://github.com/santi-pdp/segan

Speech Enhancement Generative Adversarial Network in TensorFlow

deep-learning deep-neural-networks gan generative-adversarial-networks generative-model speech tensorflow

Last synced: 22 Nov 2024

https://github.com/EvelynFan/FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers

3d-face 3d-models computer-graphics computer-vision deep-learning facial-animation facial-expressions lip-animation pytorch-implementation speech

Last synced: 07 Nov 2024

https://github.com/goxr3plus/xr3player

🎧 🎼 The MOST ADVANCED JavaFX Media Player

audio-formats audio-player audio-processing audio-recorder audio-visualizer dropbox-client java-speech java-stream-player javafx mp3 spectrum-analyzer speech stream-player web-browser

Last synced: 20 Dec 2024

https://github.com/googleapis/nodejs-speech

This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.

machine-learning nodejs speech speech-to-text

Last synced: 25 Oct 2024

https://github.com/drethage/speech-denoising-wavenet

A neural network for end-to-end speech denoising

deep-learning end-to-end machine-learning neural-networks speech speech-denoising speech-processing wavenet

Last synced: 22 Nov 2024

https://github.com/demiseom/specaugment

An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain

data-augmentation python pytorch specaugment speech speech-recognition tensorflow

Last synced: 20 Dec 2024

https://github.com/DemisEom/SpecAugment

An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain

data-augmentation python pytorch specaugment speech speech-recognition tensorflow

Last synced: 27 Nov 2024

https://github.com/cboard-org/cboard

Augmentative and Alternative Communication (AAC) system with text-to-speech for the browser

aac accessibility assistive-technology autism cerebral-palsy communication communication-board disabilities javascript progressive-web-app react speech symbols text-to-speech tts

Last synced: 25 Oct 2024

https://github.com/coqui-ai/TTS-papers

🐸 collection of TTS papers

coqui-ai deep-learning papers research-paper speech tts

Last synced: 16 Nov 2024

https://github.com/evancohen/sonus

💬 /so.nus/ STT (speech to text) for Node with offline hotword detection

alexa hotword-detection keyword-spotting node speech speech-recognition speech-to-text stt voice-control voice-recognition

Last synced: 19 Dec 2024

https://github.com/yeyupiaoling/masr

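A streaming and non-streaming automatic speech recognition framework implemented in PyTorch, compatible with both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, as well as multiple data augmentation methods.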

asr conformer deep-learning deepspeech pytorch speech speech-recognition speech-to-text squeezeformer

Last synced: 19 Dec 2024

https://github.com/vbelz/Speech-enhancement

Deep learning for audio denoising

cnn deep-learning speech unet

Last synced: 06 Nov 2024

https://github.com/hirofumi0810/neural_sp

End-to-end ASR/LM implementation with PyTorch

asr attention attention-mechanism automatic-speech-recognition ctc language-model language-modeling pytorch rnn-transducer seq2seq sequence-to-sequence speech speech-recognition streaming transformer transformer-xl

Last synced: 12 Nov 2024

https://github.com/OlaWod/FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

pytorch speech voice-conversion

Last synced: 18 Nov 2024

https://github.com/azkadev/whisper

Whisper Dart is a cross-platform library for Dart and Flutter that allows converting audio to text / speech to text / inference from OpenAI models

ai android dart flutter ggml indonesia ios linux macos openai speech speech-recognition speech-synthesis speech-to-text transcribe transformer whisper whisper-dart whisper-flutter windows

Last synced: 21 Dec 2024

https://github.com/coqui-ai/tts-papers

🐸 collection of TTS papers

coqui-ai deep-learning papers research-paper speech tts

Last synced: 10 Nov 2024

https://github.com/xinjli/allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

phonetics pytorch speech speech-recognition

Last synced: 12 Oct 2024

https://github.com/google/tacotron

Audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model.

audio machine-learning prosody speech tacotron tts

Last synced: 08 Nov 2024

https://github.com/Audio-WestlakeU/FullSubNet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

audio band denoising full-band narrow-band noise-reduction paper pretrained-model pytorch reproducible-research single-channel speech speech-enhancement speech-processing speech-separation sub-band

Last synced: 22 Nov 2024

https://github.com/ddlbojack/speech-resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

speech speech-processing

Last synced: 21 Nov 2024

https://github.com/ddlBoJack/Speech-Resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

speech speech-processing

Last synced: 02 Nov 2024

https://github.com/gotev/android-speech

Android speech recognition and text to speech made easy

android recognition speech tts

Last synced: 20 Dec 2024

https://github.com/modelscope/kan-tts

KAN-TTS is a speech-synthesis training framework. Please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

modelscope speech speech-synthesis tts

Last synced: 21 Dec 2024