https://github.com/jason-cs18/awesome-dl-development

A collection of deep learning development (notes, courses, papers and tools).
Host: GitHub
URL: https://github.com/jason-cs18/awesome-dl-development
Owner: Jason-cs18
License: mit
Created: 2022-02-16T03:09:13.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-07-18T02:21:53.000Z (almost 2 years ago)
Last Synced: 2025-03-12T18:01:48.221Z (3 months ago)
Topics: cuda, deep-learning, pytorch
Language: Jupyter Notebook
Homepage:
Size: 40.7 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

ultimate-awesome - awesome-dl-development - A collection of deep learning development (notes, courses, papers and tools). (Other Lists / Julia Lists)
README

# Awesome DL Development

To improve deep learning engineering skills, I collect popular learning resources (courses, papers, books and tools) and update my notes accordingly.

## Contents

- Course

  - [Harvard CS197: AI Research Experience (Fall 2022)](https://www.cs197.seas.harvard.edu/) (how to conduct AI research?) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Course/Harvard_CS197/readme.md)

  - [CMU 10-414/714: Deep Learning Systems (Fall 2022)](https://dlsyscourse.org/lectures/) (how do DL frameworks work?)

  - [Machine Learning Compilation (Fall 2022)](https://mlc.ai/) (how to optimize DL programs?) [TVM](https://tvm.apache.org/)

  - [TinyML and Efficient Deep Learning Computing, Fall 2022/2023](https://efficientml.ai/) (how to design efficient DL systems?)

  - [Towards AGI: Scaling, Alignment & Emergent Behaviors in Neural Nets (Winter 2023)](https://sites.google.com/view/towards-agi-course/schedule) (recent progress in AI)

  - [UCB CS294 AISys: Machine Learning Systems (Spring 2022)](https://ucbrise.github.io/cs294-ai-sys-sp22/) (recent progress in AI systems)

- Book

  - [Dive into Deep Learning (vol. 2)](https://d2l.ai/) (what makes DL work?) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/tree/main/Book/D2L)

  - [Understanding Deep Learning (UCL 2023)](https://udlbook.github.io/udlbook/) (review concepts of deep learning)

  - [Computer Architecture: A Quantitative Approach (6th edition)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Book/pdf/Computer%20Architecture%20a%20Quantitative%20Approach%206th.pdf) (principles of system design)

  - [Computer Systems: A Programmer's Perspective (2nd edition)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Book/pdf/CSAPP_2016.pdf) (a good book to review the main concepts of computer systems)

  - [Computer Networking: A Top-Down Approach (7th edition)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Book/pdf/Computer%20Networking%20A%20Top-Down%20Approach%20(7th%20Edition).pdf) (background of networking systems)

- Tool 

  - DL development

    - [PyTorch](https://pytorch.org/) ![Github stars](https://img.shields.io/github/stars/pytorch/pytorch) (a popular DL framework in academia and industry) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/Pytorch/README.md)

    - [HuggingFace](https://huggingface.co/) ![Github stars](https://img.shields.io/github/stars/huggingface/transformers) (a "GitHub" for machine learning engineers and researchers) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/HuggingFace/README.md)

    - [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) ![Github stars](https://img.shields.io/github/stars/Lightning-AI/lightning) (a scalable DL framework for academia and industry) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/Pytorch-Lighning/README.md)

  - DL deployment

    - [NVIDIA Triton](https://developer.nvidia.com/nvidia-triton-inference-server) ![Github stars](https://img.shields.io/github/stars/triton-inference-server/server) (an open-source inference engine for CPU/GPU)

    - [Alibaba MNN](https://github.com/alibaba/MNN) ![Github stars](https://img.shields.io/github/stars/alibaba/MNN) (an open-source inference engine for mobile devices)

    - [NVIDIA TAO](https://developer.nvidia.com/tao-toolkit) (a transfer learning toolkit)

    - [NVIDIA TensorRT](https://github.com/NVIDIA/TensorRT) (NVIDIA's official library for accelerating DNN inference)

    - [OpenAI Triton](https://openai.com/research/triton) ![Github stars](https://img.shields.io/github/stars/openai/triton) (an open-source, Python-like programming language for writing highly efficient GPU code without CUDA programming experience) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/OpenAI_Triton/readme.md)

- Paper (topics related to efficient and reliable AI)

  - [Submission notices](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/submission_notices.md)

    - Presentation

    - AAAI Submission Tips

    - Research Proposal Template

  - [DL & DLSys basics](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/dl_sys.md)

    - [Edge-AI-Paper-List](https://github.com/xumengwei/Edge-AI-Paper-List)

    - [Machine Learning at Berkeley Reading List](https://ml.berkeley.edu/reading-list/)

    - [A reading list for machine learning systems](https://jeongseob.github.io/readings_mlsys.html)

    - [Deep Learning for Generic Object Detection: A Survey (2018)](https://arxiv.org/pdf/1809.02165.pdf)

    - [Transformer Models: An Introduction and Catalog (2023)](https://arxiv.org/pdf/2302.07730.pdf)

    - [Full Stack Optimization of Transformer Inference: a Survey (2023)](https://arxiv.org/abs/2302.14017)

  - [Reliable AI](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/reliable_ai.md)

    - Survey

    - Continuous learning

      - Algorithm

        - Experience replay (memory-efficient): buffering a small number of samples per task in continual learning. [(ICRA'19) Memory efficient experience replay for streaming learning](https://arxiv.org/abs/1809.05922)

        - Backbone freezing (parameter-efficient): freezing the backbone or shallow layers during training. [(CVPR'22) Proper Reuse of Image Classification Features Improves Object Detection](https://arxiv.org/abs/2204.00484)

        - Delta tuning (parameter-efficient for pre-trained language models): xxx. [(Nature Machine Intelligence, 2023) Parameter-efficient fine-tuning of large-scale pre-trained language models](https://www.nature.com/articles/s42256-023-00626-4)

      - System

        - [(NSDI'22) Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers](https://www.microsoft.com/en-us/research/publication/ekya-continuous-learning-of-video-analytics-models-on-edge-compute-servers/)

        - [(IEEE IOT 2022) Cost-Efficient Continuous Edge Learning for Artificial Intelligence of Things](https://ieeexplore.ieee.org/document/9511621)

        - [(SenSys'22 Workshop) Towards Data-Efficient Continuous Learning for Edge Video Analytics via Smart Caching](https://dl.acm.org/doi/10.1145/3560905.3568430)

        - [(NSDI'23) RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics](https://www.usenix.org/conference/nsdi23/presentation/khani)

        - [(VLDB'20) ODIN: Automated drift detection and recovery in video analytics](https://dl.acm.org/doi/10.14778/3407790.3407837)

        - [(SIGMOD'22) Camel: Managing Data for Efficient Stream Learning](https://dl.acm.org/doi/10.1145/3514221.3517836)

        - [(SIGMOD'22) Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets](https://dl.acm.org/doi/10.1145/3514221.3517846)

        - [(SIGMOD'22) FILA: Online Auditing of Machine Learning Model Accuracy under Finite Labelling Budget](https://dl.acm.org/doi/10.1145/3514221.3517904)

        - [(ICCV'21) Real-Time Video Inference on Edge Devices via Adaptive Model Streaming](https://github.com/modelstreaming/ams)

    - Data quality

        - [(SenSys'22) Turbo: Opportunistic Enhancement for Edge Video Analytics](https://jason-cs18.github.io/assets/paper/sensys22turbo.pdf)

        - [(TOSN'22) DeepMTD: Moving Target Defense for Deep Visual Sensing against Adversarial Examples](https://dl.acm.org/doi/abs/10.1145/3469032)

        - [(SECON'22) Focus! Provisioning Attention-aware Detection for Real-time On-device Video Analytics](https://ieeexplore.ieee.org/abstract/document/9918169)

        - [(VLDB'21) Declarative data serving: the future of machine learning inference on the edge](https://dl.acm.org/doi/abs/10.14778/3476249.3476302)

        - [(SenSys'22) Enhancing Video Analytics Accuracy via Real-time Automated Camera Parameter Tuning](https://dl.acm.org/doi/abs/10.1145/3560905.3568527)

    - Ensemble learning

      - Algorithm

        - [(AAAI 2023) Towards Inference Efficient Deep Ensemble Learning](https://arxiv.org/pdf/2301.12378.pdf)

        - [(NeurIPS'22) Deep Ensembles Work, But Are They Necessary?](https://arxiv.org/pdf/2202.06985.pdf)

        - [(ICLR'22) Deep Ensembling with No Overhead of either Training or Testing: The All Round Blessings of Dynamic Sparsity](https://iclr.cc/virtual/2022/poster/6299)

        - [(arXiv 2022) SANE: Specialization-Aware Neural Network Ensemble](https://openreview.net/forum?id=pLNLdHrZmcX)

      - System

        - [(NSDI'22) Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models](https://www.usenix.org/conference/nsdi22/presentation/eisenman)

        - [(NSDI'22) Cocktail: A Multidimensional Optimization for Model Serving in Cloud](https://www.usenix.org/conference/nsdi22/presentation/gunasekaran)

    - Collaborative inference/learning

      - [(INFOCOM'23) Cross-Camera Inference on the Constrained Edge](https://libinliu0189.github.io/papers/Polly-infocom23.pdf)

      - [(AAAI'23 Oral) Multi-View Domain Adaptive Object Detection in Surveillance Cameras](https://jason-cs18.github.io/assets/paper/MVDAOD_AAAI23_Full.pdf)

      - [(TON'22) Scheduling Massive Camera Streams to Optimize Large-Scale Live Video Analytics](https://ieeexplore.ieee.org/abstract/document/9622882)

      - [(INFOCOM'22) ComAI: Enabling Lightweight, Collaborative Intelligence by Retrofitting Vision DNNs](https://ieeexplore.ieee.org/abstract/document/9796769)

      - [(ICDCS'22) Multi-View Scheduling of Onboard Live Video Analytics to Minimize Frame Processing Latency](https://ieeexplore.ieee.org/abstract/document/9912287)

      - [(SenSys'21) Vision Paper: Towards Software-Defined Video Analytics with Cross-Camera Collaboration](https://dl.acm.org/doi/abs/10.1145/3485730.3493453)

      - [(SenSys'21) Mercury: Efficient On-Device Distributed DNN Training via Stochastic Importance Sampling](https://dl.acm.org/doi/abs/10.1145/3485730.3485930)

      - [(SenSys'20) Distream: scaling live video analytics with workload-adaptive distributed edge intelligence](https://dl.acm.org/doi/abs/10.1145/3384419.3430721)

      - [(SEC'20 Best Paper Award) Spatula: Efficient cross-camera video analytics on large camera networks](https://www.microsoft.com/en-us/research/uploads/prod/2020/08/sec20spatula.pdf)

      - [(SEC'19) Collaborative Learning between Cloud and End Devices: An Empirical Study on Location Prediction](https://jason-cs18.github.io/assets/paper/sec19colla.pdf) 

  - [Efficient AI](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/efficient_ai.md)

    - Survey and background

      - [Efficient Transformers: A Survey (2022)](https://dl.acm.org/doi/pdf/10.1145/3530811)

      - [Efficiency 360: Efficient Vision Transformers (2023)](https://arxiv.org/pdf/2302.08374.pdf)

      - Scaling laws of deep neural networks

    - Model scaling

      - [(CVPR'20) EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070)

      - [(CVPR'23) YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors](https://arxiv.org/pdf/2207.02696.pdf)

      - [(ICLR'20) Once for All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791)

      - [(ICLR'22) Auto-scaling Vision Transformers without Training](https://arxiv.org/pdf/2202.11921.pdf)

      - [(MobiCom'23) AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments](https://arxiv.org/abs/2303.07129)

      - [(CVPR'23) Stitchable Neural Networks](https://arxiv.org/abs/2302.06586)

      - [(MobiCom'21) LegoDNN: Block-Grained Scaling of Deep Neural Networks for Mobile Vision](https://github.com/LINC-BIT/legodnn)

      - Mixture-of-Experts (MoE)

        - [awesome-mixture-of-experts](https://github.com/XueFuzhao/awesome-mixture-of-experts#awesome-mixture-of-experts) ![Github stars](https://img.shields.io/github/stars/XueFuzhao/awesome-mixture-of-experts#awesome-mixture-of-experts)

        - [(2022) Task-Specific Expert Pruning for Sparse Mixture-of-Experts](https://arxiv.org/pdf/2206.00277.pdf)

        - [(2022) Mixture-of-Experts with Expert Choice Routing](https://arxiv.org/abs/2202.09368)

        - [(2022) ST-MoE: Designing Stable and Transferable Sparse Expert Models](https://arxiv.org/pdf/2202.08906.pdf)

        - [(2022) Towards Understanding the Mixture-of-Experts Layer in Deep Learning](https://papers.nips.cc/paper_files/paper/2022/file/91edff07232fb1b55a505a9e9f6c0ff3-Paper-Conference.pdf)

        - [(ICLR'21 Spotlight) Long-tailed Recognition by Routing Diverse Distribution-Aware Experts](https://openreview.net/forum?id=D9I3drBz4UC)

        - [(ECCV'20) Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123500239.pdf)

        - [(CVPR'20) Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax](https://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf)

        - [(CVPR'20) BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhou_BBN_Bilateral-Branch_Network_With_Cumulative_Learning_for_Long-Tailed_Visual_Recognition_CVPR_2020_paper.pdf)

    - DL compilers

      - [Awesome Tensor Compilers](https://github.com/merrymercy/awesome-tensor-compilers) ![Github stars](https://img.shields.io/github/stars/merrymercy/awesome-tensor-compilers)

      - [(MobiSys'23) Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices](https://arxiv.org/abs/2206.07446)

      - [(MobiCom'22) Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs](https://www.microsoft.com/en-us/research/publication/romou-rapidly-generate-high-performance-tensor-kernels-for-mobile-gpus/)

    - Serving (Concurrent DL model executions)

      - [(NSDI'23) GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge](https://web.cs.ucla.edu/~harryxu/papers/gemel-nsdi23.pdf)

      - [(ATC'22) Tetris: Memory-efficient Serverless Inference through Tensor Sharing](https://www.usenix.org/conference/atc22/presentation/li-jie)

      - [(OSDI'22) Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences](https://www.usenix.org/conference/osdi22/presentation/han)

      - [(SenSys'22) BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference](https://dl.acm.org/doi/10.1145/3560905.3568520)

      - [(MobiSys'22) CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices](https://chrisplus.me/assets/pdf/mobisys22-CoDL.pdf)

      - [(MobiSys'22) Band: coordinated multi-DNN inference on heterogeneous mobile processors](https://dl.acm.org/doi/abs/10.1145/3498361.3538948)

      - [(RTSS'22) Jellyfish: Timely Inference Serving for Dynamic Edge Networks](https://linwang.info/papers/rtss22-jellyfish.pdf)

      - [(RTSS'19) Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference](https://ieeexplore.ieee.org/abstract/document/9052147)
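
The experience replay entry under Continuous learning describes buffering a small number of samples per task. A minimal sketch of that idea follows, assuming a fixed per-task capacity filled by reservoir sampling. The `ReplayBuffer` class and its method names are hypothetical, chosen for illustration only; this is not the method of the ICRA'19 paper linked above.

```python
import random
from collections import defaultdict


class ReplayBuffer:
    """Hypothetical buffer: keeps at most `per_task` samples for each task."""

    def __init__(self, per_task, seed=0):
        self.per_task = per_task
        self.rng = random.Random(seed)
        self.samples = defaultdict(list)  # task id -> stored samples
        self.seen = defaultdict(int)      # task id -> samples observed so far

    def add(self, task, sample):
        """Reservoir sampling: every observed sample is kept with equal probability."""
        self.seen[task] += 1
        slot = self.samples[task]
        if len(slot) < self.per_task:
            slot.append(sample)
        else:
            # Overwrite a stored sample with probability per_task / seen.
            j = self.rng.randrange(self.seen[task])
            if j < self.per_task:
                slot[j] = sample

    def replay(self, k):
        """Draw up to k stored samples across all tasks, without replacement."""
        pool = [s for slot in self.samples.values() for s in slot]
        return self.rng.sample(pool, min(k, len(pool)))


# Usage: stream two tasks, then draw a mixed batch to interleave with new data.
buf = ReplayBuffer(per_task=4)
for task in ("day", "night"):
    for i in range(100):
        buf.add(task, (task, i))
batch = buf.replay(6)
```

Memory stays bounded at `per_task` samples per task no matter how long each stream runs, which is the property the "memory-efficient" label refers to.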