Projects in Awesome Lists tagged with vad

https://github.com/smacke/ffsubsync

Automagically synchronize subtitles with video.

alignment audio caption captions fast-fourier-transform ffmpeg fft speech-detection srt srt-subtitles string-alignment subtitle subtitles sync synchronization vad video vlc vlc-media-player voice-activity-detection

Last synced: 29 Sep 2024

https://github.com/smacke/subsync

Automagically synchronize subtitles with video.

alignment audio caption captions fast-fourier-transform ffmpeg fft speech-detection srt srt-subtitles string-alignment subtitle subtitles sync synchronization vad video vlc vlc-media-player voice-activity-detection

Last synced: 04 Aug 2024

https://github.com/modelscope/funasr

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.

audio-visual-speech-recognition conformer dfsmn paraformer pretrained-model punctuation pytorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection whisper

Last synced: 26 Sep 2024

https://github.com/snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

onnx onnx-runtime onnxruntime pytorch speech speech-processing vad voice-activity-detection voice-commands voice-control voice-detection voice-recognition

Last synced: 01 Aug 2024

https://github.com/cheshirecc/faster-whisper-gui

faster_whisper GUI with PySide6

asr faster-whisper openai transcribe vad voice-transcription whisper whisperx

Last synced: 28 Sep 2024

https://github.com/k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

asr c cpp csharp go kotlin python speech-recognition vad voice-activity-detection

Last synced: 30 Sep 2024

https://github.com/jtkim-kaist/VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

acam attention bdnn data dnn lstm speech speech-activity-detection speech-recognition vad voice-activity-detection voice-detection

Last synced: 03 Aug 2024

https://github.com/amsehili/auditok

An audio/acoustic activity detection and audio segmentation tool

audio-activities audio-data audio-segmentation vad voice-activity-detection voice-detection

Last synced: 31 Jul 2024

https://github.com/filippogiruzzi/voice_activity_detection

Voice Activity Detection based on Deep Learning & TensorFlow

artificial-intelligence deep-learning deep-neural-networks deeplearning librispeech librispeech-dataset machine-learning mfcc-features python resnet speech speech-detection speech-recognition tensorflow time-series time-series-classification vad voice-activity-detection

Last synced: 03 Aug 2024

https://github.com/gtreshchev/RuntimeAudioImporter

Runtime Audio Importer plugin for Unreal Engine. Imports audio of various formats at runtime.

audio audio-converter audio-files audio-formats audio-player bink blueprints mp3 mp3-player plugin ue4 ue4-plugin ue5 ue5-plugin unreal-engine unreal-engine-4 unreal-engine-5 unrealengine vad voice-activity-detection

Last synced: 01 Aug 2024

https://github.com/shashikg/whispers2t

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

asr deep-learning speech-recognition speech-to-text tensorrt tensorrt-llm vad voice-activity-detection whisper

Last synced: 26 Sep 2024

https://github.com/shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

asr deep-learning speech-recognition speech-to-text tensorrt tensorrt-llm vad voice-activity-detection whisper

Last synced: 03 Aug 2024

https://github.com/DmitryRyumin/ICASSP-2023-24-Papers

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

asr denoising domain-adaptation face-recognition generative-models icassp icassp2023 icassp2024 image-generation keyword-spotting language-modeling multimodal-learning music-generation self-supervised-learning semantic-segmentation signal-processing signal-restoration speech-recognition spoken-language-understanding vad