Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-audio-speech
Awesome list of Audio, Speech, and DSP (digital signal processing) resources
https://github.com/KennethanCeyer/awesome-audio-speech
Last synced: 5 days ago
Recognition
- Deep Speech (Baidu Research)
- Deep Speech 2 (Baidu Research)
- Google Speech-to-Text
- Amazon Transcribe
- PocketSphinx (CMU Sphinx)
- SpeechKit (Yandex)
- Transformer-based Acoustic Modeling for Hybrid Speech Recognition
- Whisper - OpenAI's robust speech recognition system.
- Kaldi Speech Recognition Toolkit
- WhisperX - An extension of OpenAI's Whisper that adds word-level timestamps and speaker diarization.
- Distil-Whisper - Hugging Face's distilled version of Whisper.
Filtering / Denoising
- Fast Fourier Transform (FFT)
- Short-Time Fourier Transform (STFT)
- Adaptive filtering
- Least Mean Squares (LMS) algorithm
- Kalman filter
- Wiener filter
- Blind source separation (BSS)
- Non-negative matrix factorization (NMF)
- Infinite Impulse Response (IIR) filter
- Finite Impulse Response (FIR) filter
- Spectral subtraction
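As a rough illustration of the adaptive filtering and LMS entries above, here is a minimal pure-Python sketch of LMS used for system identification. The function names (`lms_filter`, `fir`), the three-tap `unknown` system, and the step size `mu=0.05` are assumptions chosen for this demo; they do not come from any project in this list.

```python
import random


def fir(x, h):
    """Filter x with FIR coefficients h (direct-form convolution, zero initial state)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def lms_filter(x, d, num_taps, mu):
    """Adapt FIR weights with the LMS rule so the filtered x tracks d.

    Returns (final weights, per-sample error signal).
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, newest first, zero-padded at the start.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))  # current filter output
        e = d[n] - y                                   # instantaneous error
        # LMS update: step along the negative gradient of the squared error.
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]
        errors.append(e)
    return w, errors


# Demo (hypothetical values): learn the taps of a hidden FIR system from its input and output.
random.seed(0)
unknown = [0.5, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = fir(x, unknown)
w, err = lms_filter(x, d, num_taps=3, mu=0.05)
```

In this noise-free setup the learned weights `w` should settle close to `unknown`. With real, noisy signals the step size `mu` trades convergence speed against steady-state error and usually needs tuning.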
Diarization
- Fully Supervised Speaker Diarization - A novel approach to speaker diarization using fully supervised learning.
- NVIDIA's Speaker Diarization - NVIDIA's advanced approach to speaker diarization.
- Speaker Diarization with LSTM - A paper on using LSTM networks for speaker diarization.
Synthesis
Open source projects
- SoX - A cross-platform audio processing tool that provides a command-line interface for converting, editing, and playing audio files.
- DeepSpeech - A speech-to-text engine developed by Mozilla Research.
- librosa - A Python library for audio and music analysis that computes features such as MFCCs, chroma, and beat-related descriptors.
- Audacity - A cross-platform audio editor and recorder that supports many formats and provides a user-friendly interface.
- PulseAudio - A cross-platform sound server for Linux, Unix, and Windows systems that mixes and routes audio between applications and hardware.
- PyTorch Audio (torchaudio) - A PyTorch library for common audio operations such as audio pre-processing, spectrogram computation, and spectrogram-based feature extraction.
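Several of the projects above mention spectrogram computation. The sketch below shows what that means using only the Python standard library: a Hann-windowed short-time DFT that returns one magnitude spectrum per frame. It does not use the librosa or PyTorch Audio APIs. The function names, the frame size `n_fft=64`, the hop of 32 samples, and the 1000 Hz test tone are all assumptions made for illustration.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins (0..n/2) of a real frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, n_fft=64, hop=32):
    """Magnitude spectrogram as a list of frames, each a list of n_fft // 2 + 1 bins."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        # Taper each frame before the DFT to reduce spectral leakage.
        frame = [signal[start + i] * window[i] for i in range(n_fft)]
        frames.append(dft_magnitudes(frame))
    return frames


# Demo (hypothetical values): a pure tone should peak in the same bin in every frame.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]
spec = spectrogram(signal)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Each bin `k` corresponds to a frequency of `k * sample_rate / n_fft`. The direct DFT here is quadratic in the frame size, which is fine for a demo; the listed libraries use FFT-based implementations and are the better choice in practice.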
Research papers
Blog posts
Books
- Digital Signal Processing: Principles, Algorithms, and Applications by John G. Proakis and Dimitris K. Manolakis
- Signals and Systems by Alan V. Oppenheim and Alan S. Willsky
- The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith
- Discrete-Time Signal Processing by Alan V. Oppenheim and Ronald W. Schafer
- DSP First: A Multimedia Approach by James H. McClellan, Ronald W. Schafer, and Mark A. Yoder
- Adaptive Filter Theory by Simon Haykin
Keywords
- speech-recognition (5)
- audio (4)
- speech-to-text (3)
- speech (3)
- deep-learning (2)
- machine-learning (2)
- python (2)
- whisper (2)
- pytorch (2)
- kaldi (2)
- signal-processing (1)
- noise-reduction (1)
- asr (1)
- speaker-verification (1)
- speaker-id (1)
- shell (1)
- cuda (1)
- c-plus-plus (1)
- tensorflow (1)
- on-device (1)
- offline (1)
- neural-networks (1)
- embedded (1)
- deepspeech (1)
- voice-conversion (1)
- text-to-speech (1)
- spoken-language-understanding (1)
- speech-translation (1)
- speech-synthesis (1)
- speech-separation (1)
- speech-enhancement (1)
- speaker-diarization (1)
- singing-voice-synthesis (1)
- machine-translation (1)
- end-to-end (1)
- chainer (1)
- io (1)
- audio-processing (1)
- wxwidgets-applications (1)
- gplv2 (1)
- editor (1)
- cross-platform (1)
- scipy (1)
- music (1)
- librosa (1)
- dsp (1)