https://github.com/markovka17/dla

Deep learning for audio processing

deep-learning keyword-spotting signal-processing speaker-verification speech-recognition tts voice-conversion

![logo5v1](https://user-images.githubusercontent.com/20357655/104316876-2be04600-54ee-11eb-93ed-f9835fde1527.jpg)

# Deep Learning for Audio (DLA)

- Lecture and seminar materials for each week are in the `./week*` folders; see the `README.md` in each folder for materials and instructions.
- For technical issues, bugs in the course materials, or contribution ideas, open an issue.
- The current version of the course is conducted in **autumn 2024** at the [CS Faculty](https://cs.hse.ru/en/) of [HSE](https://www.hse.ru/en/).

For versions from previous years, see the [Past Versions](#past-versions) section.

# Syllabus

- [**week01**](./week01) Introduction to Course
  - Lecture: Introduction to Course
  - Seminar: Experiment tracking, `Hydra`, `Git`, `VS Code`
  - Self-Study: Introduction to `PyTorch`

- [**week02**](./week02) Introduction to Digital Signal Processing
  - Lecture: Signals, Fourier Transform, spectrograms, MelScale, MFCC
  - Seminar: DSP in practice, spectrogram creation, IRF, frequency filtering

- [**week03**](./week03) Speech Recognition I
  - Lecture: Metrics, Datasets, Connectionist Temporal Classification (CTC), Classic Models, Beam Search, Language models
  - Seminar: Audio Augmentations, Beam Search
  - Q&A Session: Homework discussion, R&D coding tips

- [**week04**](./week04) Speech Recognition II
  - Lecture: LAS, RNN-T, Language models for RNN-T and LAS
  - Seminar: Hybrid RNN-T and CTC model training and inference

- [**week05**](./week05) Guest Lecture. Speech Recognition III and Audio SSL
  - Lecture: Self-Supervised Models for Audio, Audio LLMs

- [**week06**](./week06) Source Separation I
  - Lecture: A review of general Source Separation and Denoising, Encoder-Decoder-Separator architectures, Demucs family, DCCRN, FullSubNet+, BandSplitRNN
  - Seminar: Metrics

- [**week07**](./week07) Source Separation II
  - Lecture: Speech separation, Blind and Target Separation, Recurrent (TasNet, DPRNN, VoiceFilter) and CNN (ConvTasNet, SpEx+)
  - Seminar: WienerFilter, SincFilter and Demucs; streaming processing and performance metrics

# Homeworks and Projects

- [**HW_ASR**](./hw1_asr) Training a speech recognition model
- [**Project_AVSS**](./project_avss) Training an audio-visual speech separation model

See our [project template](https://github.com/Blinorot/pytorch_project_template).

# Resources

- [Lecture recordings on YouTube (in Russian)](https://youtube.com/playlist?list=PLYG3WHDP5CWVRxLjXZbllqIQTWY_QjKmz)

Some of the weeks have English recordings. See the corresponding sub-directories.

# Contributors & course staff

Course materials and teaching (in different years) were delivered by:

- [Maxim Kaledin](https://t.me/XuMuK_MK)
- [Petr Grinberg](https://t.me/Blinorot)
- [Grigory Fedorov](https://t.me/fedorovgv)
- [Aibek Alanov](https://t.me/aibrain)
- [Alexander Markovich (previously)](https://t.me/markovka17)
- [Daniil Ivanov (previously)](https://t.me/the_longest_id_in_the_world)
- [Ilya Lewin (previously)](https://t.me/levensons)
- [Timofey Smirnov (previously)](https://t.me/timothyxp)
- [Alexander Mamaev (previously)](https://t.me/alxmamaev)

# Past Versions

- [2023](https://github.com/markovka17/dla/tree/2023)
- [2022](https://github.com/markovka17/dla/tree/2022)
- [2021](https://github.com/markovka17/dla/tree/2021)
- [2020](https://github.com/markovka17/dla/tree/2020)