Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with speech-processing

A curated list of projects in awesome lists tagged with speech-processing.

https://github.com/pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

overlapped-speech-detection pretrained-models pytorch speaker-change-detection speaker-diarization speaker-embedding speaker-recognition speaker-verification speech-activity-detection speech-processing voice-activity-detection

Last synced: 14 Jan 2025

https://github.com/r9y9/deepvoice3_pytorch

PyTorch implementation of convolutional neural network-based text-to-speech synthesis models

end-to-end machine-learning multi-speaker python pytorch speech-processing speech-synthesis tts

Last synced: 17 Jan 2025

https://github.com/resemble-ai/resemble-enhance

AI-powered speech denoising and enhancement

denoise speech-denoising speech-enhancement speech-processing

Last synced: 15 Jan 2025

https://github.com/Ryuk17/SpeechAlgorithms

You can find the speech algorithms you want here

speech-processing

Last synced: 01 Nov 2024

https://github.com/x-lance/slam-llm

Speech, Language, Audio, Music Processing with Large Language Model

audio-processing large-language-model multimodal-large-language-models music-processing peft speech-processing

Last synced: 18 Jan 2025

https://github.com/X-LANCE/SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

audio-processing large-language-model multimodal-large-language-models music-processing peft speech-processing

Last synced: 06 Jan 2025

https://github.com/huawei-noah/speech-backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

speech-processing speech-recognition speech-synthesis

Last synced: 18 Jan 2025

https://github.com/Audio-WestlakeU/FullSubNet

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

audio band denoising full-band narrow-band noise-reduction paper pretrained-model pytorch reproducible-research single-channel speech speech-enhancement speech-processing speech-separation sub-band

Last synced: 22 Nov 2024

https://github.com/nyrahealth/crisperwhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

asr audio detection filler recognition speech speech-processing speech-recognition timestamps transcription verbatim whisper

Last synced: 17 Jan 2025

https://github.com/ddlbojack/speech-resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

speech speech-processing

Last synced: 21 Nov 2024

https://github.com/ddlBoJack/Speech-Resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

speech speech-processing

Last synced: 02 Nov 2024

https://github.com/arjo129/uspeech

Speech recognition toolkit for the Arduino

arduino signal speech-processing speech-recognition

Last synced: 24 Nov 2024

https://github.com/arjo129/uSpeech

Speech recognition toolkit for the Arduino

arduino signal speech-processing speech-recognition

Last synced: 17 Nov 2024

https://github.com/r9y9/pysptk

A Python wrapper for Speech Signal Processing Toolkit (SPTK).

digital-signal-processing dsp python python-wrapper speech speech-processing speech-synthesis sptk

Last synced: 20 Jan 2025

https://github.com/gemengtju/Tutorial_Separation

This repo summarizes tutorials, datasets, papers, code, and tools for speech separation and speaker extraction tasks. Pull requests are welcome.

deep-learning deep-neural-networks signal-processing speech-analysis speech-processing speech-separation

Last synced: 02 Nov 2024

https://github.com/r9y9/nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

machine-learning python speech-processing speech-synthesis text-to-speech voice-conversion

Last synced: 20 Jan 2025

https://github.com/speechbrain/speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end) to speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

beamforming deep-learning deeplearning librispeech neural-network neural-networks speaker-identification speaker-recognition speaker-verification speech speech-analysis speech-api speech-emotion-recognition speech-processing speech-recognition speech-recognizer speech-separation speech-to-text speechrecognition timit

Last synced: 13 Nov 2024

https://github.com/haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement

A minimal unofficial implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN) using PyTorch

cnn cnn-rnn pytorch real-time rnn speech-enhancement speech-processing

Last synced: 22 Nov 2024

https://github.com/nvidia/cleanunet

Official PyTorch Implementation of CleanUNet (ICASSP 2022)

noise-reduction speech-denoising speech-enchacement speech-processing

Last synced: 14 Jan 2025

https://github.com/Yuan-ManX/audio-development-tools

A list of sound, audio, and music development tools covering machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis, and more.

artificial-intelligence audio audio-generation audio-processing deep-learning dsp machine-learning music music-generation signal-processing speech speech-processing speech-synthesis

Last synced: 27 Oct 2024

https://github.com/gtreshchev/runtimespeechrecognizer

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology via whisper.cpp.

audio-processing openai speech-detection speech-processing speech-recognition speech-to-text ue4 ue4-plugin ue5 ue5-plugin unreal-engine unreal-engine-4 unreal-engine-5 voice-recognition whis whisper whisper-ai whisper-cpp

Last synced: 15 Jan 2025

https://github.com/gtreshchev/RuntimeSpeechRecognizer

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology via whisper.cpp.

audio-processing openai speech-detection speech-processing speech-recognition speech-to-text ue4 ue4-plugin ue5 ue5-plugin unreal-engine unreal-engine-4 unreal-engine-5 voice-recognition whis whisper whisper-ai whisper-cpp

Last synced: 06 Nov 2024

https://github.com/jtkim-kaist/Speech-enhancement

Deep neural network based speech enhancement toolkit

speech-enhancement speech-processing

Last synced: 02 Nov 2024

https://github.com/r9y9/sptk

A modified version of Speech Signal Processing Toolkit (SPTK)

speech-processing

Last synced: 03 Dec 2024

https://github.com/ga642381/SpeechGen

"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"

deep-learning large-language-models prompt speech-generation speech-llm speech-processing

Last synced: 28 Nov 2024

https://github.com/mwv/vad

Voice Activity Detector

python speech-processing

Last synced: 14 Nov 2024

https://github.com/SIP-Lab/CNN-VAD

A Convolutional Neural Network based Voice Activity Detector for Smartphones

deep-learning deep-neural-networks digital-signal-processing smartphone speech-processing

Last synced: 14 Nov 2024

https://github.com/inevolin/discordearsbot

A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people.

discord discord-bot discord-js hearing-aids hearing-impaired speech speech-processing speech-recognition speech-synthesis speech-to-text stt

Last synced: 13 Jan 2025

https://github.com/wq2012/simpleder

A lightweight library to compute Diarization Error Rate (DER).

diarization machine-learning metrics speaker-diarization speech-processing speech-recognition

Last synced: 07 Nov 2024

https://github.com/clement-pages/gryannote

Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.

annotation-processing annotation-tool audio gradio gradio-custom-component interspeech2024 pyannote speaker-diarization speech-processing

Last synced: 17 Jan 2025

https://github.com/vectominist/spin

Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"

clustering disentanglement self-supervised-learning speech-processing speech-recognition

Last synced: 02 Dec 2024

https://github.com/declare-lab/speech-adapters

Code and datasets for our ICASSP 2023 paper, "Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech understanding"

adapter asr speech-processing speech-recognition speech-synthesis speech-to-text tts

Last synced: 08 Nov 2024

https://github.com/ardauzunoglu/rte-speech-generator

Natural Language Processing to generate new speeches for the President of Turkey.

natural-language-processing nlp politics python speech-processing tensorflow turkce turkish turkish-nlp

Last synced: 12 Nov 2024

https://github.com/k2kobayashi/shifter

Pitch shifter using WSOLA and resampling, implemented in Python 3

signal-processing speech speech-processing voice-control voice-conversion

Last synced: 03 Dec 2024

https://github.com/bunyaminergen/callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers.

denoising diarization forced-alignment llama3 llm openai opensource sentiment-analysis speech-emotion-recognition speech-processing speech-recognition speech-to-text summary topic-modeling transcription voice-activity-detection voice-recognition

Last synced: 08 Jan 2025

https://github.com/ryota-komatsu/speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

self-supervised-learning speech speech-processing

Last synced: 14 Jan 2025

https://github.com/r9y9/world.jl

A lightweight Julia wrapper for WORLD - a high-quality speech analysis, modification and synthesis system

julia julia-wrapper speech-processing

Last synced: 03 Dec 2024

https://github.com/tabahi/formantanalyzer.js

Extract formant features such as frequency, power, energy, and bandwidth of formants at syllable or word level from audio sources in a web browser using WebAudio API.

audio-analysis audio-processing feature feature-engineering feature-extraction formant formant-detection music music-visualizer signal-processing spectrum-analyzer speech-processing

Last synced: 20 Dec 2024

https://github.com/bhattbhavesh91/wav2vec2-huggingface-demo

Speech to Text with self-supervised learning based on the wav2vec 2.0 framework using Hugging Face's Transformers

facebook-wav2vec self-supervised-learning speech speech-processing speech-recognition speech-to-text unsupervised-learning wav2vec

Last synced: 16 Nov 2024

https://github.com/liamdugan/speech-to-speech

Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"

simultaneous-translation speech speech-processing speech-to-speech speech-translation

Last synced: 27 Oct 2024

https://github.com/farzadforuozanfar/speech-recognition

I recorded 10 utterances of the same words in my own voice and compared them with another 10 recordings from another person. I was able to find a threshold level that recognizes my own voice.

distance dtw dtw-algorithm jupyter-notebook python3 speech-processing speech-recognition speech-to-text

Last synced: 25 Nov 2024

https://github.com/ringabout/scim

[WIP] Speech recognition toolbox written in Nim. Based on Arraymancer.

arraymancer audio digital-signal-processing mfcc nim scientific-computing speech-analysis speech-processing speech-recognition wav

Last synced: 25 Nov 2024

https://github.com/r9y9/melgeneralizedcepstrums.jl

Mel-Generalized Cepstrum analysis

julia speech-processing

Last synced: 03 Dec 2024

https://github.com/shunsukeaihara/pyssp

Python speech signal processing library

python2 python3 signal-processing speech-processing

Last synced: 07 Nov 2024

https://github.com/inevolin/discordspeechbot

A speech-to-text bot for Discord with music commands and more, using Node.js. Ideal for controlling your Discord server using voice commands; can also be useful for hearing-impaired people.

discord discord-bot discord-js music music-player speech speech-processing speech-recognition speech-to-text stt

Last synced: 12 Nov 2024

https://github.com/alecokas/bilatticernn-confidence

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks https://arxiv.org/abs/1910.11933 or https://ieeexplore.ieee.org/document/9053264

asr attention confidence-estimates confidence-estimation confidence-scores confusion-networks lattice latticernn lattices low-resource-languages lstm pytorch pytorch-implementation speech-processing speech-recognition

Last synced: 13 Oct 2024

https://github.com/wq2012/vb_diarization

VB Diarization with Eigenvoice and HMM Priors, refactored

machine-learning speaker-diarization speech-processing speech-recognition

Last synced: 14 Oct 2024

https://github.com/pprablanc/ppsrt

A Python algorithm to change the pitch of the voice in real time

lpc pitch pitch-shift python python-algorithm real-time signal-processing speech-processing voice

Last synced: 10 Nov 2024

https://github.com/r9y9/sptk.jl

A thin Julia wrapper for Speech Signal Processing Toolkit (SPTK) API

julia julia-wrapper speech-processing

Last synced: 03 Dec 2024

https://github.com/viig99/esolafast

Fast C++ implementation of ESOLA using KFRLib; can be used for online time-stretch augmentation during speech-to-text training.

asr esola kfr pybind11 python-bindings speech speech-augmentation speech-processing speech-recognition speech-to-text time-stretch

Last synced: 11 Nov 2024

https://github.com/buaadreamer/slpkiller

Learning Speech and Natural Language Processing

ai language-processing nlp speech-processing

Last synced: 05 Jan 2025

https://github.com/mahtafetrat/manatts-persian-speech-dataset

ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset

Last synced: 06 Nov 2024

https://github.com/mastashake08/speech-kit

Simplifying the Speech Synthesis and Speech Recognition engines for JavaScript. Listen for commands and perform callback actions, make the browser speak, and transcribe your speech!

grammar grammar-rules speech speech-processing speech-recognition speech-synthesis speech-to-text

Last synced: 12 Nov 2024

https://github.com/hpez/emotion-recognition

Emotion recognition from speech - Android

ai android neural-networks speech-processing

Last synced: 24 Nov 2024

https://github.com/daanzu/py-silero-vad-lite

Lightweight wrapper for Silero VAD using an internal ONNX Runtime, with no Python package dependencies

python speech speech-processing vad voice voice-activity-detection

Last synced: 08 Nov 2024