{"id":13575187,"url":"https://github.com/mryab/efficient-dl-systems","last_synced_at":"2025-05-15T01:07:25.560Z","repository":{"id":42716151,"uuid":"435313878","full_name":"mryab/efficient-dl-systems","owner":"mryab","description":"Efficient Deep Learning Systems course materials (HSE, YSDA)","archived":false,"fork":false,"pushed_at":"2025-04-23T20:32:03.000Z","size":72032,"stargazers_count":831,"open_issues_count":2,"forks_count":132,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-05-15T01:07:19.104Z","etag":null,"topics":["cuda","deep-learning","distributed-training","efficient-deep-learning","machine-learning","ml-infrastructure","mlops","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mryab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-12-06T00:30:40.000Z","updated_at":"2025-05-13T14:23:10.000Z","dependencies_parsed_at":"2024-01-14T04:01:21.568Z","dependency_job_id":"943a603c-9efb-448c-a18d-9159d9f6ece6","html_url":"https://github.com/mryab/efficient-dl-systems","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mryab%2Fefficient-dl-systems","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mryab%2Fefficient-dl-systems/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mryab%2Fefficient-dl-systems/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/mryab%2Fefficient-dl-systems/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mryab","download_url":"https://codeload.github.com/mryab/efficient-dl-systems/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254254041,"owners_count":22039792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","deep-learning","distributed-training","efficient-deep-learning","machine-learning","ml-infrastructure","mlops","pytorch"],"created_at":"2024-08-01T15:00:58.872Z","updated_at":"2025-05-15T01:07:20.547Z","avatar_url":"https://github.com/mryab.png","language":"Jupyter Notebook","funding_links":[],"categories":["Tutorials"],"sub_categories":["Tools and Development"],"readme":"# Efficient Deep Learning Systems\nThis repository contains materials for the Efficient Deep Learning Systems course taught at the [Faculty of Computer Science](https://cs.hse.ru/en/) of [HSE University](https://www.hse.ru/en/) and [Yandex School of Data Analysis](https://academy.yandex.com/dataschool/).\n\n__This branch corresponds to the ongoing 2025 course. If you want to see full materials of past years, see the [\"Past versions\"](#past-versions) section.__\n\n# Syllabus\n- [__Week 1:__](./week01_intro) __Introduction__\n  - Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.\n  - Seminar: CUDA operations in PyTorch. 
Introduction to benchmarking.\n- [__Week 2:__](./week02_management_and_testing) __Experiment tracking, model and data versioning, testing DL code in Python__\n  - Lecture: Experiment management basics and pipeline versioning. Configuring Python applications. Intro to regular and property-based testing.\n  - Seminar: Example DVC+Weights \u0026 Biases project walkthrough. Intro to testing with pytest.\n- [__Week 3:__](./week03_fast_pipelines) __Training optimizations, FP16/BF16/FP8 formats, profiling deep learning code__\n  - Lecture: Measuring performance of GPU-accelerated software. Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads.\n  - Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of profiling with py-spy, PyTorch Profiler, Memory Snapshot and Nsight Systems.\n- [__Week 4:__](./week04_data_parallel) __Data-parallel training and All-Reduce__\n  - Lecture: Introduction to distributed training. Data-parallel training of neural networks. All-Reduce and its efficient implementations.\n  - Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.\n- [__Week 5:__](./week05_large_models) __Training large models__\n  - Lecture: Tensor, pipeline, and sequence parallelism. Gradient checkpointing and offloading.\n  - Seminar: Gradient checkpointing and tensor parallelism in practice.\n- [__Week 6:__](./week06_fsdp) __Sharded data-parallel training, distributed training optimizations__\n  - Lecture: Fully sharded data-parallel training and its optimizations.\n  - Seminar: In-depth overview of FSDP2.\n- [__Week 7:__](./week07_application_deployment) __Python web application deployment__\n  - Lecture/Seminar: Building and deploying production-ready web services. 
App \u0026 web servers, Docker, Prometheus, API via HTTP and gRPC.\n- [__Week 8:__](./week08_inference_software) __LLM inference optimizations and software__\n  - Lecture: Inference speed metrics. KV caching, batch inference, continuous batching. FlashAttention with its modifications and PagedAttention. Overview of popular LLM serving frameworks.\n  - Seminar: Implementation of KV caching. Basics of the Triton language. Layer fusion in PyTorch and Triton. Liger Kernels. FlashAttention and FlexAttention in practice.\n- [__Week 9:__](./week09_inference_algorithms) __Efficient model inference__\n  - Lecture: Speculative decoding, architecture optimizations, quantization, and knowledge distillation.\n  - Seminar: Introduction to speculative decoding. Matrix multiplication in Triton for different scenarios.\n- __Week 10:__ Guest lecture\n\n## Grading\nThere will be several home assignments (spread over multiple weeks) on the following topics:\n- Training pipelines and code profiling\n- Distributed and memory-efficient training\n- Deploying and optimizing models for production\n\nThe final grade is a weighted sum of per-assignment grades.\nPlease refer to the course page of your institution for details.\n\n# Staff\n- [Max Ryabinin](https://github.com/mryab)\n- [Just Heuristic](https://github.com/justheuristic)\n- [Yaroslav Zolotarev](https://github.com/Q-c7)\n- [Maksim Abraham](https://github.com/fdrose)\n- [Gregory Leleytner](https://github.com/RunFMe)\n- [Antony Frolov](https://github.com/antony-frolov)\n- [Anton Chigin](https://github.com/achigin)\n- [Alexander Markovich](https://github.com/markovka17)\n- [Roman Gorb](https://github.com/rvg77)\n\n# Past versions\n- [2024](https://github.com/mryab/efficient-dl-systems/tree/2024)\n- [2023](https://github.com/mryab/efficient-dl-systems/tree/2023)\n- [2022](https://github.com/mryab/efficient-dl-systems/tree/2022)\n- 
[2021](https://github.com/yandexdataschool/dlatscale_draft)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmryab%2Fefficient-dl-systems","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmryab%2Fefficient-dl-systems","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmryab%2Fefficient-dl-systems/lists"}