Projects in Awesome Lists tagged with speech-processing
A curated list of projects in awesome lists tagged with speech-processing.
https://github.com/speechbrain/speechbrain
A PyTorch-based Speech Toolkit
asr audio audio-processing deep-learning huggingface language-model pytorch speaker-diarization speaker-recognition speaker-verification speech-enhancement speech-processing speech-recognition speech-separation speech-to-text speech-toolkit speechrecognition spoken-language-understanding transformers voice-recognition
Last synced: 13 May 2025
https://github.com/pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
overlapped-speech-detection pretrained-models pytorch speaker-change-detection speaker-diarization speaker-embedding speaker-recognition speaker-verification speech-activity-detection speech-processing voice-activity-detection
Last synced: 13 May 2025
https://github.com/snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
onnx onnx-runtime onnxruntime pytorch speech speech-processing vad voice-activity-detection voice-commands voice-control voice-detection voice-recognition
Last synced: 13 May 2025
https://github.com/microsoft/torchscale
Foundation Architecture for (M)LLMs
computer-vision machine-learning multimodal natural-language-processing pretrained-language-model speech-processing transformer translation
Last synced: 14 May 2025
https://github.com/linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
asr attention-is-all-you-need attention-mechanism attention-model attention-network attention-seq2seq attention-visualization deep-learning machine-learning multilingual-models python python3 pytorch speaker-diarization speech speech-processing speech-recognition speech-to-text transformers whisper
Last synced: 13 May 2025
https://github.com/r9y9/wavenet_vocoder
WaveNet vocoder
neural-vocoder python pytorch speech speech-processing speech-synthesis wavenet wavenet-vocoder
Last synced: 14 Apr 2025
https://github.com/r9y9/deepvoice3_pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
end-to-end machine-learning multi-speaker python pytorch speech-processing speech-synthesis tts
Last synced: 14 May 2025
https://github.com/resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
denoise speech-denoising speech-enhancement speech-processing
Last synced: 29 Apr 2025
https://github.com/digitalphonetics/ims-toucan
Controllable and fast Text-to-Speech for over 7000 languages!
deep-learning pytorch speech speech-processing speech-synthesis text-to-speech toolkit tts
Last synced: 26 Jun 2025
https://github.com/coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
speech-emotion-recognition speech-processing speech-recognition speech-separation speech-synthesis speech-to-text stt text-to-speech tts voice-activity-detection voice-cloning voice-recognition
Last synced: 27 Jan 2026
https://github.com/mravanelli/sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering neural-networks python pytorch signal-processing speaker-identification speaker-recognition speaker-verification speech-processing speech-recognition timit waveform
Last synced: 16 May 2025
https://github.com/mravanelli/SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.
artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering neural-networks python pytorch signal-processing speaker-identification speaker-recognition speaker-verification speech-processing speech-recognition timit waveform
Last synced: 26 Apr 2025
https://github.com/haoheliu/voicefixer
General Speech Restoration
declipping denoise dereverberation mel speech speech-analysis speech-enhancement speech-processing speech-synthesis super-resolution tts vocoder
Last synced: 14 May 2025
https://github.com/midas-research/audino
Open source audio annotation tool for humans
annotation-tool audio-annotation audio-processing datasets machine-learning python speech-processing
Last synced: 13 Apr 2025
https://github.com/ictnlp/streamspeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
all-in-one asr audio-processing machine-translation non-autoregressive seamless simultaneous-translation speech speech-enhancement speech-processing speech-recognition speech-synthesis speech-to-text speech-translation streaming-audio text-to-audio text-to-speech translation tts voice
Last synced: 16 May 2025
https://github.com/x-lance/slam-llm
Speech, Language, Audio, Music Processing with Large Language Model
audio-processing large-language-model multimodal-large-language-models music-processing peft speech-processing
Last synced: 15 May 2025
https://github.com/Ryuk17/SpeechAlgorithms
You can find the speech algorithms you want here
Last synced: 29 Mar 2025
https://github.com/nyrahealth/crisperwhisper
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
asr audio detection filler recognition speech speech-processing speech-recognition timestamps transcription verbatim whisper
Last synced: 15 May 2025
https://github.com/drethage/speech-denoising-wavenet
A neural network for end-to-end speech denoising
deep-learning end-to-end machine-learning neural-networks speech speech-denoising speech-processing wavenet
Last synced: 14 Jul 2025
https://github.com/X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
audio-processing large-language-model multimodal-large-language-models music-processing peft speech-processing
Last synced: 11 Sep 2025
https://github.com/huawei-noah/speech-backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
speech-processing speech-recognition speech-synthesis
Last synced: 04 Apr 2025
https://github.com/Audio-WestlakeU/FullSubNet
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
audio band denoising full-band narrow-band noise-reduction paper pretrained-model pytorch reproducible-research single-channel speech speech-enhancement speech-processing speech-separation sub-band
Last synced: 14 Jul 2025
https://github.com/pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
computer-vision deep-learning healthcare machine-learning multimodal-learning natural-language-processing representation-learning robotics speech-processing
Last synced: 08 May 2025
https://github.com/pliang279/multibench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
computer-vision deep-learning healthcare machine-learning multimodal-learning natural-language-processing representation-learning robotics speech-processing
Last synced: 02 Mar 2025
https://github.com/arjo129/uSpeech
Speech recognition toolkit for the Arduino
arduino signal speech-processing speech-recognition
Last synced: 11 May 2025
https://github.com/arjo129/uspeech
Speech recognition toolkit for the Arduino
arduino signal speech-processing speech-recognition
Last synced: 16 Jul 2025
https://github.com/superkogito/spafe
spafe: Simplified Python Audio Features Extraction
audio audio-analysis beat dsp features-extraction filterbank frequencies frequency frequency-analysis gammatone-filterbanks mfcc music music-information-retrieval pitch python signal-processing sound speech-processing time-frequency-analysis voice
Last synced: 14 May 2025
https://github.com/microsoft/unispeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
diarization pytorch speaker-verification speech speech-diarization speech-processing speech-recognition speech-separation
Last synced: 04 Apr 2025
https://github.com/gemengtju/Tutorial_Separation
This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.
deep-learning deep-neural-networks signal-processing speech-analysis speech-processing speech-separation
Last synced: 01 Apr 2025
https://github.com/r9y9/pysptk
A Python wrapper for Speech Signal Processing Toolkit (SPTK).
digital-signal-processing dsp python python-wrapper speech speech-processing speech-synthesis sptk
Last synced: 16 May 2025
https://github.com/santi-pdp/pase
Problem Agnostic Speech Encoder
deep-learning multi-task-learning pytorch self-supervised-learning speech-processing unsupervised-learning waveform-analysis
Last synced: 24 Feb 2026
https://github.com/SuperKogito/spafe
spafe: Simplified Python Audio Features Extraction
audio audio-analysis beat dsp features-extraction filterbank frequencies frequency frequency-analysis gammatone-filterbanks mfcc music music-information-retrieval pitch python signal-processing sound speech-processing time-frequency-analysis voice
Last synced: 14 Jul 2025
https://github.com/novoic/surfboard
Novoic's audio feature extraction library
alzheimers-disease audio audio-processing feature-extraction healthcare machine-learning parkinsons-disease python signal-processing speech-processing
Last synced: 03 Apr 2025
https://github.com/r9y9/nnmnkwii
Library to build speech synthesis systems designed for easy and fast prototyping.
machine-learning python speech-processing speech-synthesis text-to-speech voice-conversion
Last synced: 16 May 2025
https://github.com/speechbrain/speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
beamforming deep-learning deeplearning librispeech neural-network neural-networks speaker-identification speaker-recognition speaker-verification speech speech-analysis speech-api speech-emotion-recognition speech-processing speech-recognition speech-recognizer speech-separation speech-to-text speechrecognition timit
Last synced: 29 Jan 2026
https://github.com/nvidia/cleanunet
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
noise-reduction speech-denoising speech-enchacement speech-processing
Last synced: 02 Sep 2025
https://github.com/seanwood/gcc-nmf
Real-time GCC-NMF Blind Speech Separation and Enhancement
cross-correlation dictionary-learning gcc gcc-nmf generalized-cross-correlation ipython-notebook low-latency machine-learning nmf real-time real-time-processing speaker speech speech-enhancement speech-processing speech-separation tdoa unsupervised-machine-learning
Last synced: 30 Aug 2025
https://github.com/haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement
A minimal unofficial PyTorch implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN)
cnn cnn-rnn pytorch real-time rnn speech-enhancement speech-processing
Last synced: 14 Jul 2025
https://github.com/Yuan-ManX/audio-development-tools
This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.
artificial-intelligence audio audio-generation audio-processing deep-learning dsp machine-learning music music-generation signal-processing speech speech-processing speech-synthesis
Last synced: 17 Mar 2025
https://github.com/gtreshchev/RuntimeSpeechRecognizer
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology (whisper.cpp).
audio-processing openai speech-detection speech-processing speech-recognition speech-to-text ue4 ue4-plugin ue5 ue5-plugin unreal-engine unreal-engine-4 unreal-engine-5 voice-recognition whis whisper whisper-ai whisper-cpp
Last synced: 08 Apr 2025
https://github.com/gtreshchev/runtimespeechrecognizer
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology (whisper.cpp).
audio-processing openai speech-detection speech-processing speech-recognition speech-to-text ue4 ue4-plugin ue5 ue5-plugin unreal-engine unreal-engine-4 unreal-engine-5 voice-recognition whis whisper whisper-ai whisper-cpp
Last synced: 23 Nov 2025
https://github.com/haoheliu/voicefixer_main
General Speech Restoration
machine-learning speech speech-analysis speech-enhancement speech-processing speech-synthesis speech-to-text tts
Last synced: 06 Apr 2025
https://github.com/r9y9/ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
attention-mechanism book deep-learning digital-signal-processing dnn neural-networks python python-tts seq2seq speech speech-processing speech-synthesis text-to-speech tts wavenet wavenet-vocoder
Last synced: 05 Apr 2025
https://github.com/AkojimaSLP/Beamforming-for-speech-enhancement
Simple delay-and-sum, MVDR, and CGMM-MVDR beamformers
beamforming cgmm-mvdr delay-sum mvdr python signal-processing speech-enhancement speech-processing speech-recognition
Last synced: 01 Apr 2025
https://github.com/sp-nitech/sptk
A suite of speech signal processing tools
audio-processing cepstrum cpp dsp lpc lsp mfcc signal-processing speech speech-processing sptk unix-command
Last synced: 24 Dec 2025
https://github.com/jtkim-kaist/Speech-enhancement
Deep neural network based speech enhancement toolkit
speech-enhancement speech-processing
Last synced: 01 Apr 2025
https://github.com/tomchang25/whisper-auto-transcribe
Auto transcribe tool based on whisper
asr deep-learning gradio gradio-interface language-model pytorch speech-processing speech-recognition speech-to-text text-to-speech video-captioning voice-activity-detection
Last synced: 08 Jul 2025
https://github.com/innFactory/react-native-dialogflow
A React-Native Bridge for the Google Dialogflow (API.AI) SDK
api-ai apiai dialogflow google react-native speak speech speech-processing speech-to-function text-recognition voice
Last synced: 04 Aug 2025
https://github.com/innfactory/react-native-dialogflow
A React-Native Bridge for the Google Dialogflow (API.AI) SDK
api-ai apiai dialogflow google react-native speak speech speech-processing speech-to-function text-recognition voice
Last synced: 05 Apr 2025
https://github.com/suyashmore/mevonai-speech-emotion-recognition
Identify the emotion of multiple speakers in an Audio Segment
artificial-intelligence colab-notebook convolutional-neural-networks deep-learning diarization emotion-analysis emotion-recognition keras-tensorflow machine-learning mfcc mfcc-analysis speech-processing uis-rnn
Last synced: 18 Oct 2025
https://github.com/ahkarami/great-deep-learning-books
A Great Collection of Deep Learning (e)Books
books convolutional-neural-networks deep-learning deep-neural-networks ebooks keras machine-learning mxnet natural-language-processing pytorch recurrent-neural-networks reinforcement-learning speech-processing tensorflow
Last synced: 04 Oct 2025
https://github.com/jefflai108/pytorch-kaldi-neural-speaker-embeddings
Lightweight neural speaker embedding extraction based on Kaldi and PyTorch.
kaldi learnable-dictionary-encoding pytorch speaker-identification speaker-recognition speaker-verification speech-processing
Last synced: 19 Jul 2025
https://github.com/NICEElevateAI/ElevateAIJavaSDK
Java SDK for ElevateAI
asr automated-speech-recognition free-for-dev free-for-developers java sdk sdk-java speech-processing speech-recognition speech-to-text
Last synced: 17 Jan 2026
https://github.com/albertaparicio/tfg-voice-conversion
Deep Learning-based Voice Conversion system
deep-learning deep-neural-networks gplv3 keras numpy python speaker speech speech-processing tensorflow voice-conversion
Last synced: 24 Oct 2025
https://github.com/tabahi/bournemouth-forced-aligner
Extract phoneme-level timestamps from speech audio.
alignment forced-alignment phonemes speech speech-processing speech-recognition text-to-speech timestamps tts tts-dataset word
Last synced: 24 Feb 2026
https://github.com/haoheliu/torchsubband
PyTorch implementation of subband decomposition
deep-learning music-source-separation signal-processing speech-enhancement speech-processing speech-recognition
Last synced: 28 Dec 2025
https://github.com/r9y9/sptk
A modified version of Speech Signal Processing Toolkit (SPTK)
Last synced: 29 Jul 2025
https://github.com/huckiyang/quantumspeech-qcnn
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
colab-notebook ctc-model pennylane quantum-machine-learning speech-processing speech-recognition tensorflow2
Last synced: 15 Jun 2025
https://github.com/vocalpy/vak
A neural network framework for researchers studying acoustic communication
animal-communication animal-vocalizations bioacoustic-analysis bioacoustics birdsong python python3 pytorch spectrograms speech-processing torch torchvision vocalizations
Last synced: 04 Apr 2025
https://github.com/ga642381/SpeechGen
《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》
deep-learning large-language-models prompt speech-generation speech-llm speech-processing
Last synced: 21 Jul 2025
https://github.com/huckiyang/voice2series-reprogramming
ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification
deep-learning machine-learning speech-processing time-series transfer-learning
Last synced: 08 Oct 2025
https://github.com/grausof/keras-sincnet
Keras (TensorFlow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering keras machine-learning neural-network speaker-recognition speaker-verification speech-processing speech-recognition tensorflow timit waveform
Last synced: 22 Jul 2025
https://github.com/inevolin/discordearsbot
A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people.
discord discord-bot discord-js hearing-aids hearing-impaired speech speech-processing speech-recognition speech-synthesis speech-to-text stt
Last synced: 05 Apr 2025
https://github.com/SIP-Lab/CNN-VAD
A Convolutional Neural Network based Voice Activity Detector for Smartphones
deep-learning deep-neural-networks digital-signal-processing smartphone speech-processing
Last synced: 07 May 2025
https://github.com/bunyaminergen/callytics
Callytics is an advanced call analytics solution that uses speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers.
denoising diarization forced-alignment llama3 llm openai opensource sentiment-analysis speech-emotion-recognition speech-processing speech-recognition speech-to-text summary topic-modeling transcription voice-activity-detection voice-recognition
Last synced: 03 Apr 2025
https://github.com/markparker5/stark
S.T.A.R.K. - Speech And Text Algorithmic Recognition Kit
cross-platform framework natural-language natural-language-processing natural-language-understanding python python3 speech-processing speech-recognition voice voice-assistant voice-commands voice-control voice-interface voice-recognition
Last synced: 28 Apr 2025
https://github.com/clement-pages/gryannote
Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.
annotation-processing annotation-tool audio gradio gradio-custom-component interspeech2024 pyannote speaker-diarization speech-processing
Last synced: 05 Apr 2025
https://github.com/wq2012/simpleder
A lightweight library to compute Diarization Error Rate (DER).
diarization machine-learning metrics speaker-diarization speech-processing speech-recognition
Last synced: 30 Aug 2025
https://github.com/fulldecent/formant-analyzer
iOS application for finding formants in spoken sounds
app application ios language language-learning mature speech-processing speech-recognition speech-therapy swift
Last synced: 09 Apr 2025
https://github.com/declare-lab/speech-adapters
Code and datasets for our ICASSP 2023 paper, "Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech understanding"
adapter asr speech-processing speech-recognition speech-synthesis speech-to-text tts
Last synced: 14 Apr 2025
https://github.com/jcvasquezc/phonet
Keras-based Python framework to compute phonological posterior probabilities from audio files
deep-learning deep-neural-networks linguistic-analysis linguistics phonetics speech-processing
Last synced: 14 Jan 2026
https://github.com/spokestack/spokestack-ios
Spokestack: give your iOS app a voice interface!
asr hacktoberfest ios natural-language-understanding speech-api speech-processing speech-recognition speech-synthesis speech-to-text swift tensorflow text-to-speech vad voice-activity-detection voice-assistant voice-recognition voice-synthesis wakeword wakeword-activation
Last synced: 04 Oct 2025
https://github.com/vectominist/spin
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
clustering disentanglement self-supervised-learning speech-processing speech-recognition
Last synced: 26 Jul 2025
https://github.com/montrealcorpustools/polyglotdb
Language data store and linguistic query API
acoustics database influxdb neo4j rest-api speech-analysis speech-processing
Last synced: 06 Apr 2025
https://github.com/ardauzunoglu/rte-speech-generator
Natural Language Processing to generate new speeches for the President of Turkey.
natural-language-processing nlp politics python speech-processing tensorflow turkce turkish turkish-nlp
Last synced: 03 May 2025
https://github.com/k2kobayashi/shifter
Pitch shifter using WSOLA and resampling, implemented in Python 3
signal-processing speech speech-processing voice-control voice-conversion
Last synced: 28 Jul 2025
https://github.com/aydinnyunus/linuxvoiceassistant
Linux Voice Assistant to Make Your Work Easier
assistant assistant-chat-bots google google-assistant google-assistant-apps google-assistant-desktop python python3 speech-processing speech-recognition speech-to-text tkinter tkinter-graphic-interface tkinter-gui tkinter-python voice voice-assistant voice-commands voice-control voice-conversion
Last synced: 29 Apr 2025
https://github.com/aydinnyunus/LinuxVoiceAssistant
Linux Voice Assistant to Make Your Work Easier
assistant assistant-chat-bots google google-assistant google-assistant-apps google-assistant-desktop python python3 speech-processing speech-recognition speech-to-text tkinter tkinter-graphic-interface tkinter-gui tkinter-python voice voice-assistant voice-commands voice-control voice-conversion
Last synced: 12 Apr 2025
https://github.com/navalnica/be_nlp_speech_resources
Links to Belarusian NLP and Speech resources
asr belarus belarusian belarusian-language natural-language-processing nlp speech speech-processing speech-recognition speech-synthesis speech-to-text stt text-to-speech tts
Last synced: 05 Mar 2026
https://github.com/tabahi/webspeechanalyzer
JS speech analyzer for fast speech analysis and labeling
audio-analysis audio-processing feature feature-engineering feature-extraction formant-detection music music-information-retrieval music-visualizer phonemes signal-processing spectrum spectrum-analyzer speech speech-analysis speech-processing speech-recognition
Last synced: 11 Mar 2026
https://github.com/mycrazycracy/tf-kaldi-speaker
Neural speaker recognition/verification system based on Kaldi and TensorFlow
kaldi kaldi-asr machine-learning neural-network speaker-identification speaker-recognition speaker-verification speech-processing tensorflow
Last synced: 03 May 2025
https://github.com/ryota-komatsu/speaker_disentangled_hubert
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
self-supervised-learning speech speech-processing
Last synced: 22 Sep 2025
https://github.com/liamdugan/speech-to-speech
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
simultaneous-translation speech speech-processing speech-to-speech speech-translation
Last synced: 20 Mar 2025
https://github.com/bhattbhavesh91/wav2vec2-huggingface-demo
Speech to Text with self-supervised learning based on the wav2vec 2.0 framework, using Hugging Face Transformers
facebook-wav2vec self-supervised-learning speech speech-processing speech-recognition speech-to-text unsupervised-learning wav2vec
Last synced: 20 Jun 2025
https://github.com/r9y9/world.jl
A lightweight Julia wrapper for WORLD - a high-quality speech analysis, modification and synthesis system
julia julia-wrapper speech-processing
Last synced: 01 Mar 2026
https://github.com/tabahi/formantanalyzer.js
Extract formant features such as frequency, power, energy, and bandwidth of formants at syllable or word level from audio sources in a web browser using WebAudio API.
audio-analysis audio-processing feature feature-engineering feature-extraction formant formant-detection music music-visualizer signal-processing spectrum-analyzer speech-processing
Last synced: 17 Oct 2025
https://github.com/emergenceai/kotlin_speech_features
This library provides common speech features for ASR including MFCCs and filterbank energies for Android and iOS.
android feature-extraction ios kotlin speech-feature-extraction speech-features speech-processing
Last synced: 11 Dec 2025
https://github.com/farzadforuozanfar/speech-recognition
I recorded 10 utterances of the same words in my own voice and compared them with 10 utterances from another person. From this I was able to find a threshold that recognizes my own voice.
distance dtw dtw-algorithm jupyter-notebook python3 speech-processing speech-recognition speech-to-text
Last synced: 12 Apr 2025
https://github.com/gogyzzz/beamformit_matlab
A MATLAB implementation of CHiME4 baseline Beamformit
beamforming beamformit beamformit-step matlab speech-enhancement speech-processing speech-recognition
Last synced: 01 Apr 2025
https://github.com/ringabout/scim
[WIP] Speech recognition toolbox written in Nim. Based on Arraymancer.
arraymancer audio digital-signal-processing mfcc nim scientific-computing speech-analysis speech-processing speech-recognition wav
Last synced: 18 Mar 2025
https://github.com/mahtafetrat/manatts-persian-speech-dataset
ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 08 Apr 2025
https://github.com/trungnt13/odin-ai
Organized Digital Intelligent Network (O.D.I.N)
bayesian-methods deep-learning deep-neural-networks disentangled-representations disentanglement-learning factor-vae generative-model graph-algorithms image-processing machine-learning natural-language-processing probabilistic-graphical-models probabilistic-programming semi-supervised-learning speech-processing text-processing variational-autoencoder variational-autoencoders
Last synced: 14 Mar 2026
https://github.com/r9y9/melgeneralizedcepstrums.jl
Mel-Generalized Cepstrum analysis
Last synced: 07 Oct 2025
https://github.com/9jaswag/speechrec
A simple speech recognition app using the Web Speech API interfaces
speech-api speech-processing speech-recognition speech-synthesis speech-to-text
Last synced: 21 Jul 2025
https://github.com/shunsukeaihara/pyssp
Python speech signal processing library
python2 python3 signal-processing speech-processing
Last synced: 12 Apr 2025
https://github.com/jxlarrea/wyoming-voice-match
A Wyoming protocol ASR proxy that verifies speaker identity and isolates voice commands from background noise before forwarding audio to a downstream speech-to-text service. Designed for Home Assistant voice pipelines to prevent false activations from TVs, radios, and other people - and to deliver clean transcripts even in noisy environments.
asr asr-services docker embeddings home-assistant proxy-server speech-processing speech-to-text voice-activity-detection voice-assistant wyoming-protocol
Last synced: 03 Mar 2026
https://github.com/MahtaFetrat/ManaTTS-Persian-Speech-Dataset
ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 01 Mar 2025