https://github.com/georgygospodinov/speech_course

Deep Learning for Speech

Host: GitHub
URL: https://github.com/georgygospodinov/speech_course
Owner: georgygospodinov
Created: 2023-09-02T08:09:19.000Z
Default Branch: main
Last Pushed: 2024-12-29T10:07:32.000Z
Last Synced: 2025-03-31T14:11:12.381Z
Topics: asr, deep-learning, keyword-spotting, self-supervised-learning, speaker-recognition, speech-recognition, tts
Language: Jupyter Notebook
Homepage:
Size: 35.6 MB
Stars: 90
Watchers: 4
Forks: 8
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# speech_course

MIPT, autumn 2025

| # | Date | Description | Materials |
|---------|------|-------------|---------|
| 1 | 10.09 | Introduction. Speech Processing Tasks | [slides](https://docs.google.com/presentation/d/17eHV-M9BJwHrLgCtMiyBgA5Vm96jaDDs62RC2s3GD-M), [recording](https://youtu.be/BB445XwXwEU) |
| 2 | 17.09 | Digital Signal Processing, RIR, AEC | [slides](https://docs.google.com/presentation/d/1Jl4uBhqN4GKE79r52xRNMPzElIQmVI7ckgeH2hy3sKo), [recording](https://youtu.be/TaMwhFnQe-c), [seminar](https://colab.research.google.com/github/georgygospodinov/speech_course/blob/main/week02/dsp_basics.ipynb) |
| 3 | 24.09 | STFT, Keyword Spotting | [slides](https://docs.google.com/presentation/d/1f53twYUY__edWL3Ny48mO4ef5YCqdVSF8VkTGvsQKzg), [recording](https://youtu.be/zaoVdVQVxfg), [seminar](./week03/), **[HW](./week03/kws/)** |
| 4 | 01.10 | Speech Recognition: CTC, Beam Search, Rescoring | [slides](https://docs.google.com/presentation/d/1RDpUIu2EaheE_MmKNUb8m65FocFiFhSjgTa_EfxREHE), [recording](https://youtu.be/2shAMBK4ASY) |
| 5 | 08.10 | Speech Recognition: Encoder-Decoder, Streaming, RNN-T, Decoder-only | [slides](https://docs.google.com/presentation/d/1ZAepHIe7ME8Vh9PKcVt8-0XLsezjOre-0_rJdPChZa0), [recording](https://youtu.be/_ouCYN4y4fk), [seminar](https://colab.research.google.com/drive/1t0R7uAttkXFytv4CkFHMfaja7SHY9GNy?usp=sharing#scrollTo=WJFBF2caa_PB), **[HW](./week05/README.md)** |
| 6 | 15.10 | Self-Supervised Learning: wav2vec2.0, HuBERT, BEST-RQ, GigaAM | [slides](https://docs.google.com/presentation/d/16CyQ7_qoN_vYhDPoO8lbkNZsMf-zaCIjJP3EvXqdois), [recording](https://youtu.be/_MFJ-EAuSZI) |
| 7 | 22.10 | Speech Recognition: Semi-Supervised Learning, Data | [slides](https://docs.google.com/presentation/d/1uRfIOfiwu4XKUnIEhdYhbYTlqKVm6jj1WZHlQctMfEw/edit?slide=id.g38631e327a6_1_0), [recording](https://youtu.be/p54HHhDSmm8), [seminar](https://colab.research.google.com/github/georgygospodinov/speech_course/blob/main/week07/seminar.ipynb) |
| 8 | 29.10 | Speaker Recognition | [slides](https://docs.google.com/presentation/d/1NM7VWeVGk_25aCQ2XGKQHag8HpBNVTPD7BNFo0OsfM4), [recording](https://youtu.be/WsspMkXG6Ys), [seminar](./week08/visualize.ipynb), **[HW](./week08/README.md)** |
| 9 | 05.11 | Voice Activity Detection, Speaker Diarization, Speaker-attributed ASR | [slides](https://docs.google.com/presentation/d/1-ZRfxRXtpGYzzkaYEYIuM_O-_c_9K3Wc1vqHqvtmvq0), [recording](https://youtu.be/jUOGia7f6Y8), [seminar](./week09/pyannote_diarization_seminar.ipynb) |
| 10 | 12.11 | Audio-Conditioned LLMs | [slides](https://docs.google.com/presentation/d/1kTkS9tHV6RUWlqP7LMmfRVZfn0-cfGcF-yy5iG0eJnw/), [recording](https://www.youtube.com/watch?v=VNg9bmF9bK0&t=11s), [seminar](./week10/audiollm-seminar.ipynb), **[HW](./week10/audiollm-hw.ipynb)** |
| 11 | 19.11 | Text-to-Speech: Conventional Models | [slides](./week11/tts_intro.pdf), [recording](https://youtu.be/K8Vwo7EESsM) |
| 12 | 26.11 | Text-to-Speech: Codecs, Vocoders | [slides](https://docs.google.com/presentation/d/1f1ndayUbbz6xsElIMnyTEuID89W1xEVS6eXejr7X_YY), [seminar](https://colab.research.google.com/drive/1K3wH5LAREuHKBo3LX1lVkrEfEEtRG_zs) |
| 13 | 03.12 | Text-to-Speech: Recent Advancements | [slides](./week13/tts_recent_advancements.pdf), [recording](https://youtu.be/yqCB3bVXIlg), **[HW](./week13/README.md)** |
| 14 | 17.12 | Speech-to-Speech LLMs | [slides](https://docs.google.com/presentation/d/1T3OHOw880vgbhe_p_FxQ3Jt5DLuVBdFtQFpDdfCyz2U), [recording](https://youtu.be/O0QAvmAwtLI) |

## Previous versions

* [2024](https://github.com/georgygospodinov/speech_course/tree/2024)

* [2023](https://github.com/georgygospodinov/speech_course/tree/2023)
