{"id":13562008,"url":"https://github.com/markovka17/dla","last_synced_at":"2025-05-15T08:04:55.771Z","repository":{"id":41485863,"uuid":"289688054","full_name":"markovka17/dla","owner":"markovka17","description":"Deep learning for audio processing","archived":false,"fork":false,"pushed_at":"2024-12-27T17:22:00.000Z","size":82903,"stargazers_count":636,"open_issues_count":4,"forks_count":111,"subscribers_count":25,"default_branch":"2024","last_synced_at":"2025-04-14T13:07:52.661Z","etag":null,"topics":["deep-learning","keyword-spotting","signal-processing","speaker-verification","speech-recognition","tts","voice-conversion"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/markovka17.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-23T13:07:56.000Z","updated_at":"2025-04-11T14:25:29.000Z","dependencies_parsed_at":"2025-01-04T08:05:22.505Z","dependency_job_id":"2caaf774-ad65-49f4-9fa8-82a55fe0bc84","html_url":"https://github.com/markovka17/dla","commit_stats":{"total_commits":154,"total_committers":12,"mean_commits":"12.833333333333334","dds":0.4415584415584416,"last_synced_commit":"a11ea9668413fe8a933f1650103fa23d6be77043"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/markovka17%2Fdla","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/markovka17%2Fdla/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/markovka17%2Fdla/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/markovka17%2Fdla/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/markovka17","download_url":"https://codeload.github.com/markovka17/dla/tar.gz/refs/heads/2024","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254301422,"owners_count":22047901,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","keyword-spotting","signal-processing","speaker-verification","speech-recognition","tts","voice-conversion"],"created_at":"2024-08-01T13:01:03.556Z","updated_at":"2025-05-15T08:04:55.682Z","avatar_url":"https://github.com/markovka17.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Learning"],"sub_categories":["Conferences"],"readme":"![logo5v1](https://user-images.githubusercontent.com/20357655/104316876-2be04600-54ee-11eb-93ed-f9835fde1527.jpg)\n\n# Deep Learning for Audio (DLA)\n\n- Lecture and seminar materials for each week are in the `./week*` folders; see each folder's `README.md` for materials and instructions\n- For technical issues, ideas, bugs in course materials, or contribution ideas, open an issue\n- The current version of the course is conducted in **autumn 2024** at the [CS Faculty](https://cs.hse.ru/en/) of [HSE](https://www.hse.ru/en/).\n\nFor previous years' versions, see the [Past Versions](#past-versions) section.\n\n# Syllabus\n\n- [**week01**](./week01) Introduction to Course\n\n  - Lecture: Introduction to Course\n  - Seminar: Experiment tracking, `Hydra`, `Git`, `VS 
Code`\n  - Self-Study: Introduction to `PyTorch`\n\n- [**week02**](./week02) Introduction to Digital Signal Processing\n\n  - Lecture: Signals, Fourier Transform, spectrograms, MelScale, MFCC\n  - Seminar: DSP in practice, spectrogram creation, IRF, frequency filtering\n\n- [**week03**](./week03) Speech Recognition I\n\n  - Lecture: Metrics, Datasets, Connectionist Temporal Classification (CTC), Classic Models, Beam Search, Language models\n  - Seminar: Audio Augmentations, Beam Search\n  - Q\u0026A Session: Homework discussion, R\u0026D coding tips\n\n- [**week04**](./week04) Speech Recognition II\n\n  - Lecture: LAS, RNN-T, Language models for RNN-T and LAS\n  - Seminar: Hybrid RNN-T and CTC model training and inference\n\n- [**week05**](./week05) Guest Lecture. Speech Recognition III and Audio SSL\n\n  - Lecture: Self-Supervised Models for Audio, Audio LLMs\n\n- [**week06**](./week06) Source Separation I\n\n  - Lecture: A review of general Source Separation and Denoising, Encoder-Decoder-Separator architectures, Demucs family, DCCRN, FullSubNet+, BandSplitRNN\n  - Seminar: Metrics\n\n- [**week07**](./week07) Source Separation II\n\n  - Lecture: Speech separation, Blind and Target Separation, Recurrent (TasNet, DPRNN, VoiceFilter) and CNN (ConvTasNet, SpEx+) models\n  - Seminar: WienerFilter, SincFilter and DEMUCS; streaming processing and performance metrics\n\n- [**week08**](./week08) Audio-Visual Deep Learning\n\n  - Lecture: Audio-Visual Fusion, Source Separation, Speech Recognition, and Self-Supervised Models. 
Wav2Lip and SadTalker (talking face)\n  - Q\u0026A: Project and Slurm discussion\n  - Extra Seminar: Create Your Own Intelligent Voice Assistant\n\n- [**week09**](./week09) Text to Speech (TTS)\n\n  - Lecture: Tacotron, DeepVoice, GST, FastSpeech, AdaSpeech, Attention Tricks\n  - Seminar: postponed\n\n- [**week10**](./week10) Neural Vocoders\n\n  - Lecture: WaveNet, Parallel WaveGAN, WaveGlow, MelGAN, HiFiGAN\n  - Seminar: FastSpeech I, TTS pipeline: from text to audio\n\n- [**week11**](./week11) Diffusion-based TTS\n\n  - Lecture: Diffusion concept. Diffusion Vocoders and Diffusion acoustic models.\n\n- [**week12**](./week12) Voice Biometry I\n\n  - Lecture: Introduction. Reverberation. CMs for recorded and synthesized speech detection (LCNN, RawNet2, AASIST). GNNs\n  - Seminar: ASVspoof, Sinc-layer, GNN\n\n- [**week13**](./week13) Voice Biometry II\n\n  - Guest Lecture: Kolmogorov-Arnold Networks (KANs), AASIST3, ASVspoof5\n  - Lecture: ASV systems. SASV systems. Streaming\n\n- [**week14**](./week14) AI for Music\n\n  - Lecture: Tasks overview, Music Information Retrieval, Music Generation\n\n\u003c!--\n--\u003e\n\n# Homeworks and Projects\n\n- [**HW_ASR**](./hw1_asr) Training a speech recognition model\n- [**Project_AVSS**](./project_avss) Training an audio-visual speech separation model\n- [**HW_NV**](./hw3_nv) Implementation of a TTS model (Neural Vocoder)\n\u003c!--\n  --\u003e\n\nSee our [project template](https://github.com/Blinorot/pytorch_project_template).\n\n# Resources\n\n- [Lecture recordings on YouTube (in Russian)](https://youtube.com/playlist?list=PLYG3WHDP5CWVRxLjXZbllqIQTWY_QjKmz)\n\nSome of the weeks have English recordings. 
See the corresponding sub-directories.\n\n# Contributors \u0026 course staff\n\nCourse materials and teaching (in different years) were delivered by:\n\n- [Maxim Kaledin](https://t.me/XuMuK_MK)\n- [Petr Grinberg](https://t.me/Blinorot)\n- [Grigory Fedorov](https://t.me/fedorovgv)\n- [Aibek Alanov](https://t.me/aibrain)\n- [Alexander Markovich (previously)](https://t.me/markovka17)\n- [Daniil Ivanov (previously)](https://t.me/the_longest_id_in_the_world)\n- [Ilya Lewin (previously)](https://t.me/levensons)\n- [Timofey Smirnov (previously)](https://t.me/timothyxp)\n- [Alexander Mamaev (previously)](https://t.me/alxmamaev)\n\n# Past Versions\n\n- [2023](https://github.com/markovka17/dla/tree/2023)\n- [2022](https://github.com/markovka17/dla/tree/2022)\n- [2021](https://github.com/markovka17/dla/tree/2021)\n- [2020](https://github.com/markovka17/dla/tree/2020)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarkovka17%2Fdla","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarkovka17%2Fdla","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarkovka17%2Fdla/lists"}