# Hugging Face Machine Learning for Audio Study Group

Welcome to the ML for Audio Study Group. Through a series of presentations, paper readings, and discussions, we'll explore the field of applying Machine Learning to the audio domain. Some examples of this are:
* Generating synthetic speech from a given text (think of conversational assistants)
* Transcribing audio signals to text (see the short sketch after this list)
* Removing noise from an audio signal
* Separating different sources of audio
* Identifying which speaker is talking
* And much more!
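
To give a concrete feel for one of these tasks, here is a minimal transcription (ASR) sketch using the 🤗 Transformers `pipeline` API. The checkpoint and audio file names are illustrative placeholders, not part of the study group material:

```python
# Minimal ASR sketch (assumes `transformers`, `torch` and ffmpeg are installed).
# The checkpoint and the audio file below are illustrative placeholders.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-base-960h",  # any ASR checkpoint on the Hub works
)

result = asr("sample.wav")  # path to a local audio file
print(result["text"])       # the transcription as a plain string
```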

We suggest you join the community Discord at http://hf.co/join/discord, and we look forward to meeting you in the #ml-4-audio-study-group channel 🤗. Remember, this is a community effort, so make this space your own!

## Organisation

We'll kick off with some basics and then collaboratively decide the further direction of the group.

Before each session:
* Read/watch related resources

During each session, you can:
* Ask questions in the forum
* Give a short (~10-15 min) presentation on the topic (agreed upon beforehand)

Before/after each session:
* Keep discussing and asking questions about the topic (#ml-4-audio-study channel on Discord)
* Share interesting resources

## Schedule

| Date | Topics | Resources (To read before) |
|--------------|-----------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Dec 14, 2021 | Kickoff + Overview of Audio related usecases ([video](https://www.youtube.com/watch?v=cAviRhkqdnc&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-kick-off-dec-14/12745))| [The 3 DL Frameworks for e2e Speech Recognition that power your devices](https://heartbeat.comet.ml/the-3-deep-learning-frameworks-for-end-to-end-speech-recognition-that-power-your-devices-37b891ddc380) |
| Dec 21, 2021 | <ul><li>Intro to Audio</li><li>Automatic Speech Recognition Deep Dive</li></ul> ([video](https://www.youtube.com/watch?v=D-MH6YjuIlE&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-intro-to-audio-and-asr-dec-21/12890)) | <ul><li>[Intro to Audio for FastAI Sections 1 and 2](https://nbviewer.org/github/fastaudio/fastaudio/blob/master/docs/Introduction%20to%20Audio.ipynb)</li><li>[Speech and Language Processing 26.1-26.5](https://web.stanford.edu/~jurafsky/slp3/)</li></ul> |
| Jan 4, 2022 | Text to Speech Deep Dive ([video](https://www.youtube.com/watch?v=aLBedWj-5CQ&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-text-to-speech-deep-dive-jan-4/13315)) | <ul><li>[Intro to Audio & ASR Notebooks](https://github.com/Vaibhavs10/ml-with-audio/tree/master/notebooks/session2)</li><li>[Speech and Language Processing 26.6](https://web.stanford.edu/~jurafsky/slp3/)</li></ul> |
| Jan 18, 2022 | pyctcdecode: A simple & fast STT prediction decoding algorithm ([demo](https://github.com/rhgrossman/pyctcdecode_demo), [slides](https://docs.google.com/presentation/d/1pjp8kTGChsr58D7Z2eVo9S7CsppMXNgZOApJo-rJ1As/edit#slide=id.g10e9c4afc9e_0_984), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-pyctcdecode-jan-18/13561)) | <ul><li>[Beam search CTC decoding](https://towardsdatascience.com/beam-search-decoding-in-ctc-trained-neural-networks-5a889a3d85a7)</li><li>[pyctcdecode](https://blog.kensho.com/pyctcdecode-a-new-beam-search-decoder-for-ctc-speech-recognition-2be3863afa96)</li></ul> |
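
The Jan 18 session's topic, beam-search CTC decoding, can be tried out directly with pyctcdecode. Below is a minimal sketch under stated assumptions: the tiny vocabulary and random log-probabilities stand in for the output of a real CTC acoustic model (e.g. a Wav2Vec2 head), and the optional KenLM language model is left out:

```python
# Beam-search CTC decoding sketch (assumes `pyctcdecode` and `numpy` are installed).
import numpy as np
from pyctcdecode import build_ctcdecoder

# Toy CTC vocabulary; "" is the conventional blank token in pyctcdecode.
labels = ["", " ", "a", "b", "c", "d", "e", "h", "l", "o", "r", "w"]

decoder = build_ctcdecoder(labels)  # pass kenlm_model_path=... to add language model fusion

# Fake per-frame log-probabilities of shape (time_steps, vocab_size);
# a real acoustic model would produce these from an audio signal.
logits = np.log(np.random.dirichlet(np.ones(len(labels)), size=50)).astype(np.float32)

print(decoder.decode(logits))  # best beam, returned as a plain string
```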

## Supplementary Resources

In case you want to solidify a concept, or just want to go further down the speech-processing rabbit hole.

### General Resources
* Slides from LSA352: [Slides](https://nlp.stanford.edu/courses/lsa352/) (no videos available)
* Slides from CS224S (Latest): [Slides](http://web.stanford.edu/class/cs224s/syllabus/) (no videos available)
* Speech & Language Processing Book (Chapters 25 & 26) - [E-book](https://web.stanford.edu/~jurafsky/slp3/)

### Research Papers
* Speech Recognition Papers: [Github repo](https://github.com/wenet-e2e/speech-recognition-papers)
* Speech Synthesis Papers: [Github repo](https://github.com/xcmyz/speech-synthesis-paper)

### Toolkits
* SpeechBrain - [Website](https://speechbrain.github.io/)
* Toucan - [Github repo](https://github.com/DigitalPhonetics/IMS-Toucan)
* ESPnet - [Github repo](https://github.com/espnet/espnet)

## Demos
* Add interesting effects to your audio files - [Hugging Face Spaces](https://huggingface.co/spaces/akhaliq/steerable-nafx)
* Generate speech from text (TTS) - [Hugging Face Spaces](https://huggingface.co/spaces/akhaliq/coqui-ai-tts) (see the sketch below)
* Generate text from speech (ASR) - [Hugging Face Spaces](https://huggingface.co/spaces/facebook/XLS-R-2B-22-16)
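
If you'd rather run text-to-speech locally instead of through the Coqui TTS Space, a minimal sketch with the Coqui `TTS` Python package looks roughly like this; the model name is one of Coqui's published example checkpoints, and the exact API should be checked against their docs:

```python
# Local text-to-speech sketch (assumes the Coqui `TTS` package is installed: `pip install TTS`).
# The model name below is an example checkpoint from Coqui's model zoo.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Welcome to the ML for Audio study group!", file_path="welcome.wav")
```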