# Hugging Face Machine Learning for Audio Study Group

Welcome to the ML for Audio Study Group. Through a series of presentations, paper readings, and discussions, we'll explore the field of applying Machine Learning to the audio domain. Some examples of this are:
* Generating synthetic speech from a given text (think of conversational assistants)
* Transcribing audio signals to text (see the short sketch after this list)
* Removing noise from an audio signal
* Separating different sources of audio
* Identifying which speaker is talking
* And much more!
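
To give a concrete feel for one of these tasks, here is a minimal transcription (ASR) sketch using the 🤗 Transformers `pipeline` API. The checkpoint and audio file names are illustrative placeholders, not part of the study group material:

```python
# Minimal ASR sketch (assumes `transformers`, `torch` and ffmpeg are installed).
# The checkpoint and the audio file below are illustrative placeholders.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-base-960h",  # any ASR checkpoint on the Hub works
)

result = asr("sample.wav")  # path to a local audio file
print(result["text"])       # the transcription as a plain string
```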

We suggest you join the community Discord at http://hf.co/join/discord, and we look forward to meeting you in the #ml-4-audio-study-group channel 🤗. Remember, this is a community effort, so make this space your own!

## Organisation

We'll kick off with some basics and then collaboratively decide the further direction of the group.

Before each session:
* Read/watch related resources

During each session, you can:
* Ask questions in the forum
* Give a short (~10-15 min) presentation on the topic (agreed upon beforehand)

Before/after each session:
* Keep discussing and asking questions about the topic (#ml-4-audio-study channel on Discord)
* Share interesting resources

## Schedule

| Date | Topics | Resources (To read before) |
|--------------|-----------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Dec 14, 2021 | Kickoff + Overview of Audio related usecases ([video](https://www.youtube.com/watch?v=cAviRhkqdnc&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-kick-off-dec-14/12745))| [The 3 DL Frameworks for e2e Speech Recognition that power your devices](https://heartbeat.comet.ml/the-3-deep-learning-frameworks-for-end-to-end-speech-recognition-that-power-your-devices-37b891ddc380) |
| Dec 21, 2021 | <ul><li>Intro to Audio</li><li>Automatic Speech Recognition Deep Dive</li></ul> ([video](https://www.youtube.com/watch?v=D-MH6YjuIlE&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-intro-to-audio-and-asr-dec-21/12890)) | <ul><li>[Intro to Audio for FastAI Sections 1 and 2](https://nbviewer.org/github/fastaudio/fastaudio/blob/master/docs/Introduction%20to%20Audio.ipynb)</li><li>[Speech and Language Processing 26.1-26.5](https://web.stanford.edu/~jurafsky/slp3/)</li></ul> |
| Jan 4, 2022 | Text to Speech Deep Dive ([video](https://www.youtube.com/watch?v=aLBedWj-5CQ&ab_channel=HuggingFace), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-text-to-speech-deep-dive-jan-4/13315)) | <ul><li>[Intro to Audio & ASR Notebooks](https://github.com/Vaibhavs10/ml-with-audio/tree/master/notebooks/session2)</li><li>[Speech and Language Processing 26.6](https://web.stanford.edu/~jurafsky/slp3/)</li></ul> |
| Jan 18, 2022 | pyctcdecode: A simple & fast STT prediction decoding algorithm ([demo](https://github.com/rhgrossman/pyctcdecode_demo), [slides](https://docs.google.com/presentation/d/1pjp8kTGChsr58D7Z2eVo9S7CsppMXNgZOApJo-rJ1As/edit#slide=id.g10e9c4afc9e_0_984), [questions](https://discuss.huggingface.co/t/ml-for-audio-study-group-pyctcdecode-jan-18/13561)) | <ul><li>[Beam search CTC decoding](https://towardsdatascience.com/beam-search-decoding-in-ctc-trained-neural-networks-5a889a3d85a7)</li><li>[pyctcdecode](https://blog.kensho.com/pyctcdecode-a-new-beam-search-decoder-for-ctc-speech-recognition-2be3863afa96)</li></ul> |
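
The Jan 18 session's topic, beam-search CTC decoding, can be tried out directly with pyctcdecode. Below is a minimal sketch under stated assumptions: the tiny vocabulary and random log-probabilities stand in for the output of a real CTC acoustic model (e.g. a Wav2Vec2 head), and the optional KenLM language model is left out:

```python
# Beam-search CTC decoding sketch (assumes `pyctcdecode` and `numpy` are installed).
import numpy as np
from pyctcdecode import build_ctcdecoder

# Toy CTC vocabulary; "" is the conventional blank token in pyctcdecode.
labels = ["", " ", "a", "b", "c", "d", "e", "h", "l", "o", "r", "w"]

decoder = build_ctcdecoder(labels)  # pass kenlm_model_path=... to add language model fusion

# Fake per-frame log-probabilities of shape (time_steps, vocab_size);
# a real acoustic model would produce these from an audio signal.
logits = np.log(np.random.dirichlet(np.ones(len(labels)), size=50)).astype(np.float32)

print(decoder.decode(logits))  # best beam, returned as a plain string
```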

## Supplementary Resources

In case you want to solidify a concept, or just want to go further down the speech-processing rabbit hole.

### General Resources
* Slides from LSA352: [Slides](https://nlp.stanford.edu/courses/lsa352/) (no videos available)
* Slides from CS224S (Latest): [Slides](http://web.stanford.edu/class/cs224s/syllabus/) (no videos available)
* Speech & Language Processing Book (Chapters 25 & 26) - [E-book](https://web.stanford.edu/~jurafsky/slp3/)

### Research Papers
* Speech Recognition Papers: [Github repo](https://github.com/wenet-e2e/speech-recognition-papers)
* Speech Synthesis Papers: [Github repo](https://github.com/xcmyz/speech-synthesis-paper)

### Toolkits
* SpeechBrain - [Website](https://speechbrain.github.io/)
* Toucan - [Github repo](https://github.com/DigitalPhonetics/IMS-Toucan)
* ESPnet - [Github repo](https://github.com/espnet/espnet)

## Demos
* Add interesting effects to your audio files - [Hugging Face Spaces](https://huggingface.co/spaces/akhaliq/steerable-nafx)
* Generate speech from text (TTS) - [Hugging Face Spaces](https://huggingface.co/spaces/akhaliq/coqui-ai-tts) (see the sketch below)
* Generate text from speech (ASR) - [Hugging Face Spaces](https://huggingface.co/spaces/facebook/XLS-R-2B-22-16)
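
If you'd rather run text-to-speech locally instead of through the Coqui TTS Space, a minimal sketch with the Coqui `TTS` Python package looks roughly like this; the model name is one of Coqui's published example checkpoints, and the exact API should be checked against their docs:

```python
# Local text-to-speech sketch (assumes the Coqui `TTS` package is installed: `pip install TTS`).
# The model name below is an example checkpoint from Coqui's model zoo.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Welcome to the ML for Audio study group!", file_path="welcome.wav")
```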