https://github.com/georgygospodinov/speech_course
Deep Learning for Speech
- Host: GitHub
- URL: https://github.com/georgygospodinov/speech_course
- Owner: georgygospodinov
- Created: 2023-09-02T08:09:19.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-29T10:07:32.000Z (6 months ago)
- Last Synced: 2025-03-31T14:11:12.381Z (3 months ago)
- Topics: asr, deep-learning, keyword-spotting, self-supervised-learning, speaker-recognition, speech-recognition, tts
- Language: Jupyter Notebook
- Homepage:
- Size: 35.6 MB
- Stars: 90
- Watchers: 4
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# speech_course
MIPT, autumn 2024
| # | Date | Description | Materials |
|---------|------|-------------|---------|
| 1 | 11.09 | Introduction. Speech Processing Tasks | [slides](https://docs.google.com/presentation/d/1O1u_UR3wiENdVztgVLJUZoKlSymoxDNixSM1Tm-CjrI), [recording](https://youtu.be/wKXmjXU1Qsc) |
| 2 | 18.09 | Digital Signal Processing | [slides](https://docs.google.com/presentation/d/1l32uxNB5orHhzqEiRn8yvuMegp6UW2a3n-tdqVTf0D8), [recording](https://youtu.be/5ApIUT_-eqw), [colab](https://colab.research.google.com/github/georgygospodinov/speech_course/blob/main/week02/dsp_basics.ipynb) |
| 3 | 25.09 | Keyword Spotting | [slides](https://docs.google.com/presentation/d/1G1QaEsOaXVMaQkdYE9EQO8rXxbQLsFOxrHpDLLTjrok), [recording](https://youtu.be/zWeEctvTyzA), [seminar](./week03/), **[HW](./week03/kws/)** |
| 4 | 02.10 | Speech Recognition: CTC, Beam Search, Rescoring | [slides](https://docs.google.com/presentation/d/1z3r5GIgWKBkDNXW7TVrA5gCQMLwLylxGcVlUrhg8k0M/edit?usp=drive_web&ouid=109922422742355126005), [recording](https://youtu.be/SIJ3YumuxBs), [seminar](./week04/seminar_notebook.ipynb), **[HW](./week04/HW.md)** |
| 5 | 09.10 | Speech Recognition: Streaming, RNN-T, LAS | [slides](https://docs.google.com/presentation/d/1R-ynTzomYmGzHbnk-oHNcanqEizryrXTnksI9IobZ0E), [recording](https://youtu.be/OpplQTEbHV0), [seminar](./week05/conformer_las.ipynb) |
| 6 | 16.10 | SSL & System Combination | [slides](https://docs.google.com/presentation/d/1gFQ1-p27irwMSN0Qov_cZ5hREGNl2Mdach_KwdkwhXE/edit?usp=sharing), [recording](https://youtu.be/K-9CCv8dBeU), **[HW](./week06/asr_ensemble.ipynb)** |
| 7 | 23.10 | Speech Recognition: Semi-Supervised Learning, mWER | [slides](./week07/l7_asr_semi_supervised.pdf), [recording](https://www.youtube.com/watch?v=Xv_s72oSku8), **[HW](https://colab.research.google.com/drive/19DXSYuoD8v3ocE_NZKPpuqBqZqYHuOu9?usp=sharing)** |
| 8 | 30.10 | Speech Recognition: Data | [slides](https://docs.google.com/presentation/d/1iKU0xCRzHnx1fBxTb1DWib-NN0P7pF6rlE0MgsoSZ9I), [recording](https://youtu.be/N_UASNT4V-4) |
| 9 | 06.11 | Speaker Recognition | [slides](https://docs.google.com/presentation/d/1xMr1tUD0qNq-6A6-1MQvQNa3d08t95fzG20DUBh8Ruo), [recording](https://youtu.be/V2N4SY4eXS0), [seminar](./week09/visualize.ipynb) |
| 10 | 13.11 | Voice Activity Detection, Speaker Diarization | [slides](https://docs.google.com/presentation/d/1e_i4_5RT4BlTilVfnwFPGjFa2kHxesaSW7kXHx67Mtk), [recording](https://youtu.be/V38kQQwH-dQ), [seminar](./week10/pyannote_diarization_seminar.ipynb) |
| 11 | 20.11 | Text To Speech: Introduction | [slides](./week11/tts_intro_metrics.pdf), [recording](https://youtu.be/z6SPvCi-J7A), [seminar](./week11/seminar.ipynb) |
| 12 | 27.11 | Text To Speech: Acoustic Models, Tacotron2 | [slides](./week12/tts_am_taco.pdf), [recording](https://youtu.be/OMSm9pZdNzE), [seminar](./week12/seminar.ipynb) |
| 13 | 04.12 | Text To Speech: Vocoders | [slides](./week13/tts_vocoders.pdf), [recording](https://youtu.be/yxJVMCE7tHg), **[HW](./week13/HW.md)** |
| 14 | 11.12 | Audio Large Language Models | [slides](https://docs.google.com/presentation/d/1C0WOJsHMWsUk8rjQfGDKAPG0iOsYb2Vhy6zqn2CSz_U), [recording](https://youtu.be/fTczeiWK3NM) |

## Previous versions
* [2023](https://github.com/georgygospodinov/speech_course/tree/2023)