https://github.com/vaibhavs10/ml-with-audio
HF's ML for Audio study group
- Host: GitHub
- URL: https://github.com/vaibhavs10/ml-with-audio
- Owner: Vaibhavs10
- Created: 2021-12-10T14:08:14.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-02-27T13:54:43.000Z (over 1 year ago)
- Last Synced: 2023-11-07T15:30:03.872Z (12 months ago)
- Topics: huggingface, speech-recognition, speech-synthesis
- Language: Jupyter Notebook
- Size: 5.12 MB
- Stars: 150
- Watchers: 15
- Forks: 24
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Hugging Face Machine Learning for Audio Study Group
Welcome to the ML for Audio Study Group. Through a series of presentations, paper readings, and discussions, we'll explore the field of applying Machine Learning to the Audio domain. Some examples of this are:
* Generating synthetic speech from a given text (think of conversational assistants)
* Transcribing audio signals to text
* Removing noise from an audio recording
* Separating different sources of audio
* Identifying which speaker is talking
* And much more!

We suggest you join the community Discord at http://hf.co/join/discord; we look forward to meeting you in the #ml-4-audio-study-group channel 🤗. Remember, this is a community effort, so make this space your own!
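As a small taste of the signal processing behind the tasks above, the sketch below synthesizes a pure tone and recovers its frequency from an FFT magnitude spectrum. All names (`sample_rate`, `peak_freq`, etc.) are illustrative, not taken from the study group's materials:

```python
import numpy as np

sample_rate = 16000          # 16 kHz, a common rate for speech models
duration_s = 1.0
freq_hz = 440.0              # concert A

# Synthesize one second of a pure sine tone.
t = np.arange(int(sample_rate * duration_s)) / sample_rate
signal = np.sin(2 * np.pi * freq_hz * t)

# Magnitude spectrum via a real FFT; the peak bin reveals the tone's frequency.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
peak_freq = freqs[np.argmax(spectrum)]
print(round(peak_freq))  # 440
```

Real speech is not a single tone, of course — frameworks slice the signal into short windows and compute a spectrum per window (a spectrogram), which is the usual input to the ASR and TTS models discussed in the sessions.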
## Organisation
We'll kick off with some basics and then collaboratively decide the further direction of the group.
Before each session:
* Read/watch the related resources

During each session, you can:
* Ask questions in the forum
* Give a short (~10-15 min) presentation on the topic (agreed upon beforehand)

Before/after each session:
* Keep discussing and asking questions about the topic (#ml-4-audio-study channel on Discord)
* Share interesting resources

## Schedule
| Date | Topics | Resources (To read before) |
|--------------|-----------------------------------------------------------|----------------------------|
| Dec 14, 2021 | Kickoff + Overview of audio-related use cases ([video](https://www.youtube.com/watch?v=cAviRhkqdnc&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-kick-off-dec-14/12745)) | [The 3 DL Frameworks for e2e Speech Recognition that power your devices](https://heartbeat.comet.ml/the-3-deep-learning-frameworks-for-end-to-end-speech-recognition-that-power-your-devices-37b891ddc380) |
| Dec 21, 2021 | Intro to Audio + Automatic Speech Recognition Deep Dive | [Intro to Audio for FastAI Sections 1 and 2](https://nbviewer.org/github/fastaudio/fastaudio/blob/master/docs/Introduction%20to%20Audio.ipynb)<br>[Speech and Language Processing 26.1-26.5](https://web.stanford.edu/~jurafsky/slp3/) |
| Jan 4, 2022 | Text to Speech Deep Dive ([video](https://www.youtube.com/watch?v=aLBedWj-5CQ&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-text-to-speech-deep-dive-jan-4/13315)) | [Intro to Audio & ASR Notebooks](https://github.com/Vaibhavs10/ml-with-audio/tree/master/notebooks/session2)<br>[Speech and Language Processing 26.6](https://web.stanford.edu/~jurafsky/slp3/) |
| Jan 18, 2022 | pyctcdecode: A simple & fast STT prediction decoding algorithm ([demo](https://github.com/rhgrossman/pyctcdecode_demo), [slides](https://docs.google.com/presentation/d/1pjp8kTGChsr58D7Z2eVo9S7CsppMXNgZOApJo-rJ1As/edit#slide=id.g10e9c4afc9e_0_984), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-pyctcdecode-jan-18/13561)) | [Beam search CTC decoding](https://towardsdatascience.com/beam-search-decoding-in-ctc-trained-neural-networks-5a889a3d85a7)<br>[pyctcdecode](https://blog.kensho.com/pyctcdecode-a-new-beam-search-decoder-for-ctc-speech-recognition-2be3863afa96) |
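The decoding topic of the Jan 18 session is easiest to grasp from its baseline: before beam search, the simplest CTC decoder is greedy decoding — take the argmax token per frame, collapse consecutive repeats, then drop blanks. A minimal sketch of that idea (the function name and token scheme are illustrative, not pyctcdecode's actual API):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for tok in frame_ids:
        if tok != prev and tok != blank_id:
            out.append(tok)
        prev = tok
    return out

# Frame-level argmax ids for "cat": blanks (0) and repeats are removed.
vocab = {1: "c", 2: "a", 3: "t"}
ids = ctc_greedy_decode([0, 1, 1, 0, 2, 0, 0, 3, 3], blank_id=0)
print("".join(vocab[i] for i in ids))  # cat
```

Beam search (as in the two articles linked for that session) generalizes this by keeping several candidate prefixes per frame and optionally rescoring them with a language model.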
## Supplementary Resources
In case you want to solidify a concept, or just want to go further down the speech-processing rabbit hole.
### General Resources
* Slides from LSA352: [Slides](https://nlp.stanford.edu/courses/lsa352/) (no videos available)
* Slides from CS224S (Latest): [Slides](http://web.stanford.edu/class/cs224s/syllabus/) (no videos available)
* Speech & Language Processing Book (Chapters 25 & 26) - [E-book](https://web.stanford.edu/~jurafsky/slp3/)
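One concept that comes up throughout these resources is the mel scale, which warps frequency to roughly match human pitch perception. As a quick illustration, here is one common conversion formula (an assumption on our part: this is the HTK-style variant; other variants, such as Slaney's, differ slightly):

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# By construction, 1000 Hz lands at roughly 1000 mel.
print(round(hz_to_mel(1000.0)))
```

Mel filterbanks built from this mapping are the standard input features for many of the ASR systems covered in the sessions.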
### Research Papers
* Speech Recognition Papers: [GitHub repo](https://github.com/wenet-e2e/speech-recognition-papers)
* Speech Synthesis Papers: [GitHub repo](https://github.com/xcmyz/speech-synthesis-paper)
### Toolkits
* SpeechBrain - [Website](https://speechbrain.github.io/)
* Toucan - [GitHub repo](https://github.com/DigitalPhonetics/IMS-Toucan)
* ESPnet - [GitHub repo](https://github.com/espnet/espnet)
## Demos
* Add interesting effects to your audio files - [Hugging Face Space](https://huggingface.co/spaces/akhaliq/steerable-nafx)
* Generate speech from text (TTS) - [Hugging Face Space](https://huggingface.co/spaces/akhaliq/coqui-ai-tts)
* Generate text from speech (ASR) - [Hugging Face Space](https://huggingface.co/spaces/facebook/XLS-R-2B-22-16)