https://github.com/georgygospodinov/speech_course
Deep Learning for Speech
asr deep-learning keyword-spotting self-supervised-learning speaker-recognition speech-recognition tts
- Host: GitHub
- URL: https://github.com/georgygospodinov/speech_course
- Owner: georgygospodinov
- Created: 2023-09-02T08:09:19.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-12-29T10:07:32.000Z (over 1 year ago)
- Last Synced: 2025-03-31T14:11:12.381Z (over 1 year ago)
- Topics: asr, deep-learning, keyword-spotting, self-supervised-learning, speaker-recognition, speech-recognition, tts
- Language: Jupyter Notebook
- Size: 35.6 MB
- Stars: 90
- Watchers: 4
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
# speech_course
MIPT, autumn 2025
| # | Date | Description | Materials |
|---------|------|-------------|---------|
| 1 | 10.09 | Introduction. Speech Processing Tasks | [slides](https://docs.google.com/presentation/d/17eHV-M9BJwHrLgCtMiyBgA5Vm96jaDDs62RC2s3GD-M), [recording](https://youtu.be/BB445XwXwEU) |
| 2 | 17.09 | Digital Signal Processing, RIR, AEC | [slides](https://docs.google.com/presentation/d/1Jl4uBhqN4GKE79r52xRNMPzElIQmVI7ckgeH2hy3sKo), [recording](https://youtu.be/TaMwhFnQe-c), [seminar](https://colab.research.google.com/github/georgygospodinov/speech_course/blob/main/week02/dsp_basics.ipynb) |
| 3 | 24.09 | STFT, Keyword Spotting | [slides](https://docs.google.com/presentation/d/1f53twYUY__edWL3Ny48mO4ef5YCqdVSF8VkTGvsQKzg), [recording](https://youtu.be/zaoVdVQVxfg), [seminar](./week03/), **[HW](./week03/kws/)** |
| 4 | 01.10 | Speech Recognition: CTC, Beam Search, Rescoring | [slides](https://docs.google.com/presentation/d/1RDpUIu2EaheE_MmKNUb8m65FocFiFhSjgTa_EfxREHE), [recording](https://youtu.be/2shAMBK4ASY) |
| 5 | 08.10 | Speech Recognition: Encoder-Decoder, Streaming, RNN-T, Decoder-only | [slides](https://docs.google.com/presentation/d/1ZAepHIe7ME8Vh9PKcVt8-0XLsezjOre-0_rJdPChZa0), [recording](https://youtu.be/_ouCYN4y4fk), [seminar](https://colab.research.google.com/drive/1t0R7uAttkXFytv4CkFHMfaja7SHY9GNy?usp=sharing#scrollTo=WJFBF2caa_PB), **[HW](./week05/README.md)** |
| 6 | 15.10 | Self-Supervised Learning: wav2vec2.0, HuBERT, BEST-RQ, GigaAM | [slides](https://docs.google.com/presentation/d/16CyQ7_qoN_vYhDPoO8lbkNZsMf-zaCIjJP3EvXqdois), [recording](https://youtu.be/_MFJ-EAuSZI) |
| 7 | 22.10 | Speech Recognition: Semi-Supervised Learning, Data | [slides](https://docs.google.com/presentation/d/1uRfIOfiwu4XKUnIEhdYhbYTlqKVm6jj1WZHlQctMfEw/edit?slide=id.g38631e327a6_1_0), [recording](https://youtu.be/p54HHhDSmm8), [seminar](https://colab.research.google.com/github/georgygospodinov/speech_course/blob/main/week07/seminar.ipynb) |
| 8 | 29.10 | Speaker Recognition | [slides](https://docs.google.com/presentation/d/1NM7VWeVGk_25aCQ2XGKQHag8HpBNVTPD7BNFo0OsfM4), [recording](https://youtu.be/WsspMkXG6Ys), [seminar](./week08/visualize.ipynb), **[HW](./week08/README.md)** |
| 9 | 05.11 | Voice Activity Detection, Speaker Diarization, Speaker-attributed ASR | [slides](https://docs.google.com/presentation/d/1-ZRfxRXtpGYzzkaYEYIuM_O-_c_9K3Wc1vqHqvtmvq0), [recording](https://youtu.be/jUOGia7f6Y8), [seminar](./week09/pyannote_diarization_seminar.ipynb) |
| 10 | 12.11 | Audio-Conditioned LLMs | [slides](https://docs.google.com/presentation/d/1kTkS9tHV6RUWlqP7LMmfRVZfn0-cfGcF-yy5iG0eJnw/), [recording](https://www.youtube.com/watch?v=VNg9bmF9bK0&t=11s), [seminar](./week10/audiollm-seminar.ipynb), **[HW](./week10/audiollm-hw.ipynb)** |
| 11 | 19.11 | Text-to-Speech: Conventional Models | [slides](./week11/tts_intro.pdf), [recording](https://youtu.be/K8Vwo7EESsM) |
| 12 | 26.11 | Text-to-Speech: Codecs, Vocoders | [slides](https://docs.google.com/presentation/d/1f1ndayUbbz6xsElIMnyTEuID89W1xEVS6eXejr7X_YY), [seminar](https://colab.research.google.com/drive/1K3wH5LAREuHKBo3LX1lVkrEfEEtRG_zs) |
| 13 | 03.12 | Text-to-Speech: Recent Advancements | [slides](./week13/tts_recent_advancements.pdf), [recording](https://youtu.be/yqCB3bVXIlg), **[HW](./week13/README.md)** |
| 14 | 17.12 | Speech-to-Speech LLMs | [slides](https://docs.google.com/presentation/d/1T3OHOw880vgbhe_p_FxQ3Jt5DLuVBdFtQFpDdfCyz2U), [recording](https://youtu.be/O0QAvmAwtLI) |
## Previous versions
* [2024](https://github.com/georgygospodinov/speech_course/tree/2024)
* [2023](https://github.com/georgygospodinov/speech_course/tree/2023)