{"id":24016534,"url":"https://github.com/georgygospodinov/speech_course","last_synced_at":"2026-02-18T06:31:03.949Z","repository":{"id":192866258,"uuid":"686266784","full_name":"georgygospodinov/speech_course","owner":"georgygospodinov","description":"Deep Learning for Speech","archived":false,"fork":false,"pushed_at":"2024-12-29T10:07:32.000Z","size":37341,"stargazers_count":90,"open_issues_count":1,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-31T14:11:12.381Z","etag":null,"topics":["asr","deep-learning","keyword-spotting","self-supervised-learning","speaker-recognition","speech-recognition","tts"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/georgygospodinov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-02T08:09:19.000Z","updated_at":"2025-03-26T15:25:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"7183f86e-7743-4f66-85da-4b5b90aeadeb","html_url":"https://github.com/georgygospodinov/speech_course","commit_stats":null,"previous_names":["georgygospodinov/speech_course"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/georgygospodinov%2Fspeech_course","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/georgygospodinov%2Fspeech_course/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/georgygospodinov%2Fspeech_course/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/georgygospodinov%2Fspeech_course/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/georgygospodinov","download_url":"https://codeload.github.com/georgygospodinov/speech_course/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247675597,"owners_count":20977376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","deep-learning","keyword-spotting","self-supervised-learning","speaker-recognition","speech-recognition","tts"],"created_at":"2025-01-08T08:51:41.027Z","updated_at":"2026-02-18T06:31:03.942Z","avatar_url":"https://github.com/georgygospodinov.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# speech_course\n\nMIPT, autumn 2025\n\n| # | Date | Description | Materials |\n|---------|------|-------------|---------|\n| 1 | 10.09 | Introduction. Speech Processing Tasks | [slides](https://docs.google.com/presentation/d/17eHV-M9BJwHrLgCtMiyBgA5Vm96jaDDs62RC2s3GD-M), [recording](https://youtu.be/BB445XwXwEU) |\n| 2 | 17.09 | Digital Signal Processing, RIR, AEC | [slides](https://docs.google.com/presentation/d/1Jl4uBhqN4GKE79r52xRNMPzElIQmVI7ckgeH2hy3sKo), [recording](https://youtu.be/TaMwhFnQe-c), [seminar](https://colab.research.google.com/github/georgygospodinov/speech_course/blob/main/week02/dsp_basics.ipynb) |\n| 3 | 24.09 | STFT, Keyword Spotting | [slides](https://docs.google.com/presentation/d/1f53twYUY__edWL3Ny48mO4ef5YCqdVSF8VkTGvsQKzg), [recording](https://youtu.be/zaoVdVQVxfg), [seminar](./week03/), **[HW](./week03/kws/)** |\n| 4 | 01.10 | Speech Recognition: CTC, Beam Search, Rescoring | [slides](https://docs.google.com/presentation/d/1RDpUIu2EaheE_MmKNUb8m65FocFiFhSjgTa_EfxREHE), [recording](https://youtu.be/2shAMBK4ASY) |\n| 5 | 08.10 | Speech Recognition: Encoder-Decoder, Streaming, RNN-T, Decoder-only | [slides](https://docs.google.com/presentation/d/1ZAepHIe7ME8Vh9PKcVt8-0XLsezjOre-0_rJdPChZa0), [recording](https://youtu.be/_ouCYN4y4fk), [seminar](https://colab.research.google.com/drive/1t0R7uAttkXFytv4CkFHMfaja7SHY9GNy?usp=sharing#scrollTo=WJFBF2caa_PB), **[HW](./week05/README.md)** |\n| 6 | 15.10 | Self-Supervised Learning: wav2vec2.0, HuBERT, BEST-RQ, GigaAM | [slides](https://docs.google.com/presentation/d/16CyQ7_qoN_vYhDPoO8lbkNZsMf-zaCIjJP3EvXqdois), [recording](https://youtu.be/_MFJ-EAuSZI) |\n| 7 | 22.10 | Speech Recognition: Semi-Supervised Learning, Data | [slides](https://docs.google.com/presentation/d/1uRfIOfiwu4XKUnIEhdYhbYTlqKVm6jj1WZHlQctMfEw/edit?slide=id.g38631e327a6_1_0), [recording](https://youtu.be/p54HHhDSmm8), [seminar](https://colab.research.google.com/github/georgygospodinov/speech_course/blob/main/week07/seminar.ipynb) |\n| 8 | 29.10 | Speaker Recognition | [slides](https://docs.google.com/presentation/d/1NM7VWeVGk_25aCQ2XGKQHag8HpBNVTPD7BNFo0OsfM4), [recording](https://youtu.be/WsspMkXG6Ys), [seminar](./week08/visualize.ipynb), **[HW](./week08/README.md)** |\n| 9 | 05.11 | Voice Activity Detection, Speaker Diarization, Speaker-attributed ASR | [slides](https://docs.google.com/presentation/d/1-ZRfxRXtpGYzzkaYEYIuM_O-_c_9K3Wc1vqHqvtmvq0), [recording](https://youtu.be/jUOGia7f6Y8), [seminar](./week09/pyannote_diarization_seminar.ipynb) |\n| 10 | 12.11 | Audio-Conditioned LLMs | [slides](https://docs.google.com/presentation/d/1kTkS9tHV6RUWlqP7LMmfRVZfn0-cfGcF-yy5iG0eJnw/), [recording](https://www.youtube.com/watch?v=VNg9bmF9bK0\u0026t=11s), [seminar](./week10/audiollm-seminar.ipynb), **[HW](./week10/audiollm-hw.ipynb)** |\n| 11 | 19.11 | Text-to-Speech: Conventional Models | [slides](./week11/tts_intro.pdf), [recording](https://youtu.be/K8Vwo7EESsM) |\n| 12 | 26.11 | Text-to-Speech: Codecs, Vocoders | [slides](https://docs.google.com/presentation/d/1f1ndayUbbz6xsElIMnyTEuID89W1xEVS6eXejr7X_YY), [seminar](https://colab.research.google.com/drive/1K3wH5LAREuHKBo3LX1lVkrEfEEtRG_zs) |\n| 13 | 03.12 | Text-to-Speech: Recent Advancements | [slides](./week13/tts_recent_advancements.pdf), [recording](https://youtu.be/yqCB3bVXIlg), **[HW](./week13/README.md)** |\n| 14 | 17.12 | Speech-to-Speech LLMs | [slides](https://docs.google.com/presentation/d/1T3OHOw880vgbhe_p_FxQ3Jt5DLuVBdFtQFpDdfCyz2U), [recording](https://youtu.be/O0QAvmAwtLI) |\n\n## Previous versions\n* [2024](https://github.com/georgygospodinov/speech_course/tree/2024)\n* [2023](https://github.com/georgygospodinov/speech_course/tree/2023)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorgygospodinov%2Fspeech_course","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgeorgygospodinov%2Fspeech_course","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorgygospodinov%2Fspeech_course/lists"}