Projects in Awesome Lists tagged with speech-processing

https://github.com/speechbrain/speechbrain

A PyTorch-based Speech Toolkit

asr audio audio-processing deep-learning huggingface language-model pytorch speaker-diarization speaker-recognition speaker-verification speech-enhancement speech-processing speech-recognition speech-separation speech-to-text speech-toolkit speechrecognition spoken-language-understanding transformers voice-recognition

Last synced: 13 May 2025

https://github.com/pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

overlapped-speech-detection pretrained-models pytorch speaker-change-detection speaker-diarization speaker-embedding speaker-recognition speaker-verification speech-activity-detection speech-processing voice-activity-detection

Last synced: 13 May 2025

https://github.com/snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

onnx onnx-runtime onnxruntime pytorch speech speech-processing vad voice-activity-detection voice-commands voice-control voice-detection voice-recognition

Last synced: 13 May 2025

https://github.com/microsoft/torchscale

Foundation Architecture for (M)LLMs

computer-vision machine-learning multimodal natural-language-processing pretrained-language-model speech-processing transformer translation

Last synced: 14 May 2025

https://github.com/linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

asr attention-is-all-you-need attention-mechanism attention-model attention-network attention-seq2seq attention-visualization deep-learning machine-learning multilingual-models python python3 pytorch speaker-diarization speech speech-processing speech-recognition speech-to-text transformers whisper

Last synced: 13 May 2025

https://github.com/r9y9/wavenet_vocoder

WaveNet vocoder

neural-vocoder python pytorch speech speech-processing speech-synthesis wavenet wavenet-vocoder

Last synced: 14 Apr 2025

https://github.com/r9y9/deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

end-to-end machine-learning multi-speaker python pytorch speech-processing speech-synthesis tts

Last synced: 14 May 2025

https://github.com/resemble-ai/resemble-enhance

AI powered speech denoising and enhancement

denoise speech-denoising speech-enhancement speech-processing

Last synced: 29 Apr 2025

https://github.com/digitalphonetics/ims-toucan

Controllable and fast Text-to-Speech for over 7000 languages!

deep-learning pytorch speech speech-processing speech-synthesis text-to-speech toolkit tts

Last synced: 26 Jun 2025

https://github.com/coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

speech-emotion-recognition speech-processing speech-recognition speech-separation speech-synthesis speech-to-text stt text-to-speech tts voice-activity-detection voice-cloning voice-recognition

Last synced: 27 Jan 2026

https://github.com/mravanelli/sincnet

SincNet is a neural architecture for efficiently processing raw audio samples.

artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering neural-networks python pytorch signal-processing speaker-identification speaker-recognition speaker-verification speech-processing speech-recognition timit waveform

Last synced: 16 May 2025

https://github.com/haoheliu/voicefixer

General Speech Restoration

declipping denoise dereverberation mel speech speech-analysis speech-enhancement speech-processing speech-synthesis super-resolution tts vocoder

Last synced: 14 May 2025

https://github.com/midas-research/audino

Open source audio annotation tool for humans

annotation-tool audio-annotation audio-processing datasets machine-learning python speech-processing

Last synced: 13 Apr 2025

https://github.com/ictnlp/streamspeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

all-in-one asr audio-processing machine-translation non-autoregressive seamless simultaneous-translation speech speech-enhancement speech-processing speech-recognition speech-synthesis speech-to-text speech-translation streaming-audio text-to-audio text-to-speech translation tts voice

Last synced: 16 May 2025

https://github.com/x-lance/slam-llm

Speech, Language, Audio, Music Processing with Large Language Model

audio-processing large-language-model multimodal-large-language-models music-processing peft speech-processing

Last synced: 15 May 2025

https://github.com/Ryuk17/SpeechAlgorithms

You can find the speech algorithms you want here

speech-processing

Last synced: 29 Mar 2025

https://github.com/nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

asr audio detection filler recognition speech speech-processing speech-recognition timestamps transcription verbatim whisper

Last synced: 18 Jun 2026

https://github.com/drethage/speech-denoising-wavenet

A neural network for end-to-end speech denoising

deep-learning end-to-end machine-learning neural-networks speech speech-denoising speech-processing wavenet

Last synced: 14 Jul 2025

https://github.com/huawei-noah/speech-backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

speech-processing speech-recognition speech-synthesis

Last synced: 04 Apr 2025

https://github.com/ddlbojack/speech-resources

Speech research labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

speech speech-processing

Last synced: 28 Jan 2026

https://github.com/Audio-WestlakeU/FullSubNet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

audio band denoising full-band narrow-band noise-reduction paper pretrained-model pytorch reproducible-research single-channel speech speech-enhancement speech-processing speech-separation sub-band

Last synced: 14 Jul 2025

https://github.com/pliang279/MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

computer-vision deep-learning healthcare machine-learning multimodal-learning natural-language-processing representation-learning robotics speech-processing

Last synced: 08 May 2025

https://superkogito.github.io/spafe/

spafe: Simplified Python Audio Features Extraction

audio audio-analysis beat dsp features-extraction filterbank frequencies frequency frequency-analysis gammatone-filterbanks mfcc music music-information-retrieval pitch python signal-processing sound speech-processing time-frequency-analysis voice

Last synced: 24 May 2026

https://github.com/arjo129/uSpeech

Speech recognition toolkit for the Arduino

arduino signal speech-processing speech-recognition

Last synced: 11 May 2025

https://github.com/superkogito/spafe

spafe: Simplified Python Audio Features Extraction

audio audio-analysis beat dsp features-extraction filterbank frequencies frequency frequency-analysis gammatone-filterbanks mfcc music music-information-retrieval pitch python signal-processing sound speech-processing time-frequency-analysis voice

Last synced: 14 May 2025

https://github.com/microsoft/unispeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

diarization pytorch speaker-verification speech speech-diarization speech-processing speech-recognition speech-separation

Last synced: 04 Apr 2025

https://github.com/gemengtju/Tutorial_Separation

This repo summarizes tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.

deep-learning deep-neural-networks signal-processing speech-analysis speech-processing speech-separation

Last synced: 01 Apr 2025

https://github.com/r9y9/pysptk

A python wrapper for Speech Signal Processing Toolkit (SPTK).

digital-signal-processing dsp python python-wrapper speech speech-processing speech-synthesis sptk

Last synced: 16 May 2025

https://github.com/santi-pdp/pase

Problem Agnostic Speech Encoder

deep-learning multi-task-learning pytorch self-supervised-learning speech-processing unsupervised-learning waveform-analysis

Last synced: 24 Feb 2026

https://github.com/novoic/surfboard

Novoic's audio feature extraction library

alzheimers-disease audio audio-processing feature-extraction healthcare machine-learning parkinsons-disease python signal-processing speech-processing

Last synced: 03 Apr 2025

https://github.com/r9y9/nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

machine-learning python speech-processing speech-synthesis text-to-speech voice-conversion

Last synced: 16 May 2025

https://github.com/speechbrain/speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

beamforming deep-learning deeplearning librispeech neural-network neural-networks speaker-identification speaker-recognition speaker-verification speech speech-analysis speech-api speech-emotion-recognition speech-processing speech-recognition speech-recognizer speech-separation speech-to-text speechrecognition timit

Last synced: 29 Jan 2026

https://github.com/nvidia/cleanunet

Official PyTorch Implementation of CleanUNet (ICASSP 2022)

noise-reduction speech-denoising speech-enchacement speech-processing

Last synced: 02 Sep 2025

https://github.com/seanwood/gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

cross-correlation dictionary-learning gcc gcc-nmf generalized-cross-correlation ipython-notebook low-latency machine-learning nmf real-time real-time-processing speaker speech speech-enhancement speech-processing speech-separation tdoa unsupervised-machine-learning

Last synced: 30 Aug 2025

https://github.com/haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement

A minimal unofficial PyTorch implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN)

cnn cnn-rnn pytorch real-time rnn speech-enhancement speech-processing

Last synced: 14 Jul 2025

https://github.com/Yuan-ManX/audio-development-tools

This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.

artificial-intelligence audio audio-generation audio-processing deep-learning dsp machine-learning music music-generation signal-processing speech speech-processing speech-synthesis

Last synced: 17 Mar 2025

https://github.com/gtreshchev/RuntimeSpeechRecognizer

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper, via whisper.cpp.

audio-processing openai speech-detection speech-processing speech-recognition speech-to-text ue4 ue4-plugin ue5 ue5-plugin unreal-engine unreal-engine-4 unreal-engine-5 voice-recognition whis whisper whisper-ai whisper-cpp

Last synced: 08 Apr 2025

https://github.com/haoheliu/voicefixer_main

General Speech Restoration

machine-learning speech speech-analysis speech-enhancement speech-processing speech-synthesis speech-to-text tts

Last synced: 06 Apr 2025

https://github.com/r9y9/ttslearn

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

attention-mechanism book deep-learning digital-signal-processing dnn neural-networks python python-tts seq2seq speech speech-processing speech-synthesis text-to-speech tts wavenet wavenet-vocoder

Last synced: 05 Apr 2025

https://github.com/AkojimaSLP/Beamforming-for-speech-enhancement

Simple delay-and-sum, MVDR, and CGMM-MVDR beamformers

beamforming cgmm-mvdr delay-sum mvdr python signal-processing speech-enhancement speech-processing speech-recognition

Last synced: 01 Apr 2025

https://github.com/sp-nitech/sptk

A suite of speech signal processing tools

audio-processing cepstrum cpp dsp lpc lsp mfcc signal-processing speech speech-processing sptk unix-command

Last synced: 24 Dec 2025

https://github.com/jtkim-kaist/Speech-enhancement

Deep neural network based speech enhancement toolkit

speech-enhancement speech-processing

Last synced: 01 Apr 2025

https://github.com/tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

asr deep-learning gradio gradio-interface language-model pytorch speech-processing speech-recognition speech-to-text text-to-speech video-captioning voice-activity-detection

Last synced: 08 Jul 2025

https://github.com/innFactory/react-native-dialogflow

A React-Native Bridge for the Google Dialogflow (API.AI) SDK

api-ai apiai dialogflow google react-native speak speech speech-processing speech-to-function text-recognition voice

Last synced: 04 Aug 2025

https://github.com/attenlabs/saa-sdk

Addressee detection for voice agents: device-directed speech detection that runs before STT, so background speech, side conversations, and the agent's own TTS echo never trigger it. No wake word, model-agnostic, drop-in for LiveKit, Pipecat, ElevenLabs, Twilio, and OpenAI. The layer your VAD and turn detection are missing.

addressee-detection ai-agents barge-in conversational-ai device-directed-speech elevenlabs livekit openai pipecat real-time realtime speech-processing turn-detection twilio voice-activity-detection voice-agents voice-ai voice-assistant voice-sdk webrtc

Last synced: 05 Jul 2026

https://github.com/suyashmore/mevonai-speech-emotion-recognition

Identify the emotion of multiple speakers in an Audio Segment

artificial-intelligence colab-notebook convolutional-neural-networks deep-learning diarization emotion-analysis emotion-recognition keras-tensorflow machine-learning mfcc mfcc-analysis speech-processing uis-rnn

Last synced: 18 Oct 2025

https://github.com/ahkarami/great-deep-learning-books

A Great Collection of Deep Learning (e)Books

books convolutional-neural-networks deep-learning deep-neural-networks ebooks keras machine-learning mxnet natural-language-processing pytorch recurrent-neural-networks reinforcement-learning speech-processing tensorflow

Last synced: 04 Oct 2025

https://github.com/jefflai108/pytorch-kaldi-neural-speaker-embeddings

A lightweight neural speaker embedding extractor based on Kaldi and PyTorch.

kaldi learnable-dictionary-encoding pytorch speaker-identification speaker-recognition speaker-verification speech-processing

Last synced: 19 Jul 2025

https://github.com/NICEElevateAI/ElevateAIJavaSDK

Java SDK for ElevateAI

asr automated-speech-recognition free-for-dev free-for-developers java sdk sdk-java speech-processing speech-recognition speech-to-text

Last synced: 17 Jan 2026

https://github.com/albertaparicio/tfg-voice-conversion

Deep Learning-based Voice Conversion system

deep-learning deep-neural-networks gplv3 keras numpy python speaker speech speech-processing tensorflow voice-conversion

Last synced: 24 Oct 2025

https://github.com/tabahi/bournemouth-forced-aligner

Extract phoneme-level timestamps from speech audio.

alignment forced-alignment phonemes speech speech-processing speech-recognition text-to-speech timestamps tts tts-dataset word

Last synced: 24 Feb 2026

https://github.com/haoheliu/torchsubband

PyTorch implementation of subband decomposition

deep-learning music-source-separation signal-processing speech-enhancement speech-processing speech-recognition

Last synced: 28 Dec 2025

https://github.com/r9y9/sptk

A modified version of Speech Signal Processing Toolkit (SPTK)

speech-processing

Last synced: 29 Jul 2025

https://github.com/huckiyang/quantumspeech-qcnn

IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition

colab-notebook ctc-model pennylane quantum-machine-learning speech-processing speech-recognition tensorflow2

Last synced: 15 Jun 2025

https://github.com/vocalpy/vak

A neural network framework for researchers studying acoustic communication

animal-communication animal-vocalizations bioacoustic-analysis bioacoustics birdsong python python3 pytorch spectrograms speech-processing torch torchvision vocalizations

Last synced: 04 Apr 2025

https://github.com/ga642381/SpeechGen

"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"

deep-learning large-language-models prompt speech-generation speech-llm speech-processing

Last synced: 21 Jul 2025

https://github.com/mwv/vad

Voice Activity Detector

python speech-processing

Last synced: 07 May 2025

https://github.com/huckiyang/voice2series-reprogramming

ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification

deep-learning machine-learning speech-processing time-series transfer-learning

Last synced: 08 Oct 2025

https://github.com/grausof/keras-sincnet

Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)

artificial-intelligence asr audio audio-processing cnn convolutional-neural-networks deep-learning digital-signal-processing filtering keras machine-learning neural-network speaker-recognition speaker-verification speech-processing speech-recognition tensorflow timit waveform

Last synced: 22 Jul 2025

https://github.com/inevolin/discordearsbot

A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people.

discord discord-bot discord-js hearing-aids hearing-impaired speech speech-processing speech-recognition speech-synthesis speech-to-text stt

Last synced: 05 Apr 2025

https://github.com/SIP-Lab/CNN-VAD

A Convolutional Neural Network based Voice Activity Detector for Smartphones

deep-learning deep-neural-networks digital-signal-processing smartphone speech-processing

Last synced: 07 May 2025

https://github.com/bunyaminergen/callytics

Callytics is an advanced call analytics solution that uses speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers.

denoising diarization forced-alignment llama3 llm openai opensource sentiment-analysis speech-emotion-recognition speech-processing speech-recognition speech-to-text summary topic-modeling transcription voice-activity-detection voice-recognition

Last synced: 03 Apr 2025

https://github.com/markparker5/stark

S.T.A.R.K. - Speech And Text Algorithmic Recognition Kit

cross-platform framework natural-language natural-language-processing natural-language-understanding python python3 speech-processing speech-recognition voice voice-assistant voice-commands voice-control voice-interface voice-recognition

Last synced: 28 Apr 2025

https://github.com/clement-pages/gryannote

Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.

annotation-processing annotation-tool audio gradio gradio-custom-component interspeech2024 pyannote speaker-diarization speech-processing

Last synced: 05 Apr 2025

https://github.com/fulldecent/formant-analyzer

iOS application for finding formants in spoken sounds

app application ios language language-learning mature speech-processing speech-recognition speech-therapy swift

Last synced: 09 Apr 2025

https://github.com/wq2012/simpleder

A lightweight library to compute Diarization Error Rate (DER).

diarization machine-learning metrics speaker-diarization speech-processing speech-recognition

Last synced: 30 Aug 2025

https://github.com/declare-lab/speech-adapters

Code and datasets for our ICASSP 2023 paper, "Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech understanding"

adapter asr speech-processing speech-recognition speech-synthesis speech-to-text tts

Last synced: 14 Apr 2025

https://github.com/jcvasquezc/phonet

Keras-based python framework to compute phonological posterior probabilities from audio files

deep-learning deep-neural-networks linguistic-analysis linguistics phonetics speech-processing

Last synced: 14 Jan 2026

https://github.com/spokestack/spokestack-ios

Spokestack: give your iOS app a voice interface!

asr hacktoberfest ios natural-language-understanding speech-api speech-processing speech-recognition speech-synthesis speech-to-text swift tensorflow text-to-speech vad voice-activity-detection voice-assistant voice-recognition voice-synthesis wakeword wakeword-activation

Last synced: 04 Oct 2025

https://github.com/vectominist/spin

Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"

clustering disentanglement self-supervised-learning speech-processing speech-recognition

Last synced: 26 Jul 2025

https://github.com/montrealcorpustools/polyglotdb

Language data store and linguistic query API

acoustics database influxdb neo4j rest-api speech-analysis speech-processing

Last synced: 06 Apr 2025

https://github.com/ardauzunoglu/rte-speech-generator

Natural Language Processing to generate new speeches for the President of Turkey.

natural-language-processing nlp politics python speech-processing tensorflow turkce turkish turkish-nlp

Last synced: 03 May 2025

https://github.com/k2kobayashi/shifter

Pitch shifter using WSOLA and resampling, implemented in Python 3

signal-processing speech speech-processing voice-control voice-conversion

Last synced: 28 Jul 2025

https://github.com/aydinnyunus/LinuxVoiceAssistant

Linux Voice Assistant to Make Your Work Easier

assistant assistant-chat-bots google google-assistant google-assistant-apps google-assistant-desktop python python3 speech-processing speech-recognition speech-to-text tkinter tkinter-graphic-interface tkinter-gui tkinter-python voice voice-assistant voice-commands voice-control voice-conversion

Last synced: 12 Apr 2025

https://github.com/navalnica/be_nlp_speech_resources

Links to Belarusian NLP and Speech resources

asr belarus belarusian belarusian-language natural-language-processing nlp speech speech-processing speech-recognition speech-synthesis speech-to-text stt text-to-speech tts

Last synced: 05 Mar 2026

https://github.com/tabahi/webspeechanalyzer

JS speech analyzer for fast speech analysis and labeling

audio-analysis audio-processing feature feature-engineering feature-extraction formant-detection music music-information-retrieval music-visualizer phonemes signal-processing spectrum spectrum-analyzer speech speech-analysis speech-processing speech-recognition

Last synced: 11 Mar 2026

https://github.com/mycrazycracy/tf-kaldi-speaker

Neural speaker recognition/verification system based on Kaldi and Tensorflow

kaldi kaldi-asr machine-learning neural-network speaker-identification speaker-recognition speaker-verification speech-processing tensorflow

Last synced: 03 May 2025

https://github.com/ryota-komatsu/speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

self-supervised-learning speech speech-processing

Last synced: 22 Sep 2025

https://github.com/bhattbhavesh91/wav2vec2-huggingface-demo

Speech-to-text with self-supervised learning, based on the wav2vec 2.0 framework and Hugging Face's Transformers

facebook-wav2vec self-supervised-learning speech speech-processing speech-recognition speech-to-text unsupervised-learning wav2vec

Last synced: 20 Jun 2025

https://github.com/r9y9/world.jl

A lightweight julia wrapper for WORLD - a high-quality speech analysis, modification and synthesis system

julia julia-wrapper speech-processing

Last synced: 01 Mar 2026

https://github.com/liamdugan/speech-to-speech

Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"

simultaneous-translation speech speech-processing speech-to-speech speech-translation

Last synced: 20 Mar 2025

https://github.com/tabahi/formantanalyzer.js

Extract formant features such as frequency, power, energy, and bandwidth of formants at syllable or word level from audio sources in a web browser using WebAudio API.

audio-analysis audio-processing feature feature-engineering feature-extraction formant formant-detection music music-visualizer signal-processing spectrum-analyzer speech-processing

Last synced: 17 Oct 2025

https://github.com/emergenceai/kotlin_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies for Android and iOS.

android feature-extraction ios kotlin speech-feature-extraction speech-features speech-processing

Last synced: 11 Dec 2025

https://github.com/farzadforuozanfar/speech-recognition

I recorded 10 voices with the same words from myself and compared them with another 10 words from another person. I was able to find a threshold level that acknowledges and recognizes my own voice.

distance dtw dtw-algorithm jupyter-notebook python3 speech-processing speech-recognition speech-to-text

Last synced: 12 Apr 2025

https://github.com/gogyzzz/beamformit_matlab

A MATLAB implementation of CHiME4 baseline Beamformit

beamforming beamformit beamformit-step matlab speech-enhancement speech-processing speech-recognition

Last synced: 01 Apr 2025

https://github.com/ringabout/scim

[WIP] Speech recognition toolbox written in Nim. Based on Arraymancer.

arraymancer audio digital-signal-processing mfcc nim scientific-computing speech-analysis speech-processing speech-recognition wav

Last synced: 18 Mar 2025

https://github.com/mahtafetrat/manatts-persian-speech-dataset

ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset