https://github.com/jason-cs18/awesome-dl-development
A collection of deep learning development (notes, courses, papers and tools).
List: awesome-dl-development
- Host: GitHub
- URL: https://github.com/jason-cs18/awesome-dl-development
- Owner: Jason-cs18
- License: mit
- Created: 2022-02-16T03:09:13.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-18T02:21:53.000Z (over 1 year ago)
- Last Synced: 2025-01-15T20:12:18.381Z (about 1 month ago)
- Topics: cuda, deep-learning, pytorch
- Language: Jupyter Notebook
- Homepage:
- Size: 40.7 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ultimate-awesome - awesome-dl-development - A collection of deep learning development (notes, courses, papers and tools). (Other Lists / Julia Lists)
README
# Awesome DL Development
To improve deep learning engineering skills, I collect popular learning resources (courses, papers, books and tools) and update my notes accordingly.
## Contents
- Course
- [Harvard CS197: AI Research Experience (Fall 2022)](https://www.cs197.seas.harvard.edu/) (how to conduct AI research?) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Course/Harvard_CS197/readme.md)
- [CMU 10-414/714: Deep Learning Systems (Fall 2022)](https://dlsyscourse.org/lectures/) (how do DL frameworks work?)
- [Machine Learning Compilation (Fall 2022)](https://mlc.ai/) (how to optimize DL programs?) [TVM](https://tvm.apache.org/)
- [TinyML and Efficient Deep Learning Computing, Fall 2022/2023](https://efficientml.ai/) (how to design efficient DL systems?)
- [Towards AGI: Scaling, Alignment & Emergent Behaviors in Neural Nets (Winter 2023)](https://sites.google.com/view/towards-agi-course/schedule) (recent advances in AI)
- [UCB CS294 AISys: Machine Learning Systems (Spring 2022)](https://ucbrise.github.io/cs294-ai-sys-sp22/) (recent advances in AI systems)
- Book
- [Dive into Deep Learning (vol. 2)](https://d2l.ai/) (what makes DL work?) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/tree/main/Book/D2L)
- [Understanding Deep Learning (UCL 2023)](https://udlbook.github.io/udlbook/) (review concepts of deep learning)
- [Computer Architecture: A Quantitative Approach (6th edition)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Book/pdf/Computer%20Architecture%20a%20Quantitative%20Approach%206th.pdf) (principles of system design)
- [Computer Systems: A Programmer's Perspective (2nd edition)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Book/pdf/CSAPP_2016.pdf) (a good book to review the main concepts of computer systems)
- [Computer Networking: A Top-Down Approach (7th edition)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Book/pdf/Computer%20Networking%20A%20Top-Down%20Approach%20(7th%20Edition).pdf) (background of networking systems)
- Tool
- DL development
- [PyTorch](https://pytorch.org/) (a popular DL framework in academia and industry) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/Pytorch/README.md)
- [HuggingFace](https://huggingface.co/) (a "GitHub" for machine learning engineers and researchers) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/HuggingFace/README.md)
- [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) (a scalable DL framework in academia and industry) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/Pytorch-Lighning/README.md)
- DL deployment
- [NVIDIA Triton](https://developer.nvidia.com/nvidia-triton-inference-server)  (an open-source inference engine for CPU/GPU)
- [Alibaba MNN](https://github.com/alibaba/MNN)  (an open-source inference engine for mobile devices)
- [NVIDIA TAO](https://developer.nvidia.com/tao-toolkit) (a transfer learning toolkit)
- [NVIDIA TensorRT](https://github.com/NVIDIA/TensorRT) (NVIDIA's official library for accelerating DNN inference)
- [OpenAI Triton](https://openai.com/research/triton) (an open-source, Python-like programming language for writing highly efficient GPU code without CUDA programming experience) [Notes (in progress)](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Tools/OpenAI_Triton/readme.md)
- Paper (topics related to efficient and reliable AI)
- [Submission notices](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/submission_notices.md)
- Presentation
- AAAI Submission Tips
- Research Proposal Template
- [DL & DLSys basics](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/dl_sys.md)
- [Edge-AI-Paper-List](https://github.com/xumengwei/Edge-AI-Paper-List)
- [Machine Learning at Berkeley Reading List](https://ml.berkeley.edu/reading-list/)
- [A reading list for machine learning systems](https://jeongseob.github.io/readings_mlsys.html)
- [Deep Learning for Generic Object Detection: A Survey (2018)](https://arxiv.org/pdf/1809.02165.pdf)
- [Transformer Models: An Introduction and Catalog (2023)](https://arxiv.org/pdf/2302.07730.pdf)
- [Full Stack Optimization of Transformer Inference: a Survey (2023)](https://arxiv.org/abs/2302.14017)
- [Reliable AI](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/reliable_ai.md)
- Survey
- Continuous learning
- Algorithm
- Experience replay (memory-efficient): buffering a small number of samples per task in continual learning. [(ICRA'19) Memory efficient experience replay for streaming learning](https://arxiv.org/abs/1809.05922)
- Backbone freezing (parameter-efficient): freezing the backbone or shallow layers during training. [(CVPR'22) Proper Reuse of Image Classification Features Improves Object Detection](https://arxiv.org/abs/2204.00484)
- Delta tuning (parameter-efficient for pre-trained language models): xxx. [(Nature, 2023) Parameter-efficient fine-tuning of large-scale pre-trained language models](https://www.nature.com/articles/s42256-023-00626-4)
- System
- [(NSDI'22) Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers](https://www.microsoft.com/en-us/research/publication/ekya-continuous-learning-of-video-analytics-models-on-edge-compute-servers/)
- [(IEEE IOT 2022) Cost-Efficient Continuous Edge Learning for Artificial Intelligence of Things](https://ieeexplore.ieee.org/document/9511621)
- [(SenSys'22 Workshop) Towards Data-Efficient Continuous Learning for Edge Video Analytics via Smart Caching](https://dl.acm.org/doi/10.1145/3560905.3568430)
- [(NSDI'23) RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics](https://www.usenix.org/conference/nsdi23/presentation/khani)
- [(VLDB'20) ODIN: Automated drift detection and recovery in video analytics](https://dl.acm.org/doi/10.14778/3407790.3407837)
- [(SIGMOD'22) Camel: Managing Data for Efficient Stream Learning](https://dl.acm.org/doi/10.1145/3514221.3517836)
- [(SIGMOD'22) Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets](https://dl.acm.org/doi/10.1145/3514221.3517846)
- [(SIGMOD'22) FILA: Online Auditing of Machine Learning Model Accuracy under Finite Labelling Budget](https://dl.acm.org/doi/10.1145/3514221.3517904)
- [(ICCV'21) Real-Time Video Inference on Edge Devices via Adaptive Model Streaming](https://github.com/modelstreaming/ams)
- Data quality
- [(SenSys'22) Turbo: Opportunistic Enhancement for Edge Video Analytics](https://jason-cs18.github.io/assets/paper/sensys22turbo.pdf)
- [(TOSN'22) DeepMTD: Moving Target Defense for Deep Visual Sensing against Adversarial Examples](https://dl.acm.org/doi/abs/10.1145/3469032)
- [(SECON'22) Focus! Provisioning Attention-aware Detection for Real-time On-device Video Analytics](https://ieeexplore.ieee.org/abstract/document/9918169)
- [(VLDB'21) Declarative data serving: the future of machine learning inference on the edge](https://dl.acm.org/doi/abs/10.14778/3476249.3476302)
- [(SenSys'22) Enhancing Video Analytics Accuracy via Real-time Automated Camera Parameter Tuning](https://dl.acm.org/doi/abs/10.1145/3560905.3568527)
- Ensemble learning
- Algorithm
- [(AAAI 2023) Towards Inference Efficient Deep Ensemble Learning](https://arxiv.org/pdf/2301.12378.pdf)
- [(NeurIPS'22) Deep Ensembles Work, But Are They Necessary?](https://arxiv.org/pdf/2202.06985.pdf)
- [(ICLR'22) Deep Ensembling with No Overhead of either Training or Testing: The All Round Blessings of Dynamic Sparsity](https://iclr.cc/virtual/2022/poster/6299)
- [(arXiv 2022) SANE: Specialization-Aware Neural Network Ensemble](https://openreview.net/forum?id=pLNLdHrZmcX)
- System
- [(NSDI'22) Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models](https://www.usenix.org/conference/nsdi22/presentation/eisenman)
- [(NSDI'22) Cocktail: A Multidimensional Optimization for Model Serving in Cloud](https://www.usenix.org/conference/nsdi22/presentation/gunasekaran)
- Collaborative inference/learning
- [(InfoCom'23) Cross-Camera Inference on the Constrained Edge](https://libinliu0189.github.io/papers/Polly-infocom23.pdf)
- [(AAAI'23 Oral) Multi-View Domain Adaptive Object Detection in Surveillance Cameras](https://jason-cs18.github.io/assets/paper/MVDAOD_AAAI23_Full.pdf)
- [(TON'22) Scheduling Massive Camera Streams to Optimize Large-Scale Live Video Analytics](https://ieeexplore.ieee.org/abstract/document/9622882)
- [(InfoCom'22) ComAI: Enabling Lightweight, Collaborative Intelligence by Retrofitting Vision DNNs](https://ieeexplore.ieee.org/abstract/document/9796769)
- [(ICDCS'22) Multi-View Scheduling of Onboard Live Video Analytics to Minimize Frame Processing Latency](https://ieeexplore.ieee.org/abstract/document/9912287)
- [(SenSys'21) Vision Paper: Towards Software-Defined Video Analytics with Cross-Camera Collaboration](https://dl.acm.org/doi/abs/10.1145/3485730.3493453)
- [(SenSys'21) Mercury: Efficient On-Device Distributed DNN Training via Stochastic Importance Sampling](https://dl.acm.org/doi/abs/10.1145/3485730.3485930)
- [(SenSys'20) Distream: scaling live video analytics with workload-adaptive distributed edge intelligence](https://dl.acm.org/doi/abs/10.1145/3384419.3430721)
- [(SEC'20 Best Paper Award) Spatula: Efficient cross-camera video analytics on large camera networks](https://www.microsoft.com/en-us/research/uploads/prod/2020/08/sec20spatula.pdf)
- [(SEC'19) Collaborative Learning between Cloud and End Devices: An Empirical Study on Location Prediction](https://jason-cs18.github.io/assets/paper/sec19colla.pdf)
- [Efficient AI](https://github.com/Jason-cs18/Awesome-DL-Development/blob/main/Paper/efficient_ai.md)
- Survey and background
- [Efficient Transformers: A Survey (2022)](https://dl.acm.org/doi/pdf/10.1145/3530811)
- [Efficiency 360: Efficient Vision Transformers (2023)](https://arxiv.org/pdf/2302.08374.pdf)
- Scaling laws of deep neural networks
- Model scaling
- [(CVPR'20) EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070)
- [(CVPR'23) YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors](https://arxiv.org/pdf/2207.02696.pdf)
- [(ICLR'20) Once for All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791)
- [(ICLR'22) Auto-scaling Vision Transformers without Training](https://arxiv.org/pdf/2202.11921.pdf)
- [(MobiCom'23) AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments](https://arxiv.org/abs/2303.07129)
- [(CVPR'23) Stitchable Neural Networks](https://arxiv.org/abs/2302.06586)
- [(MobiCom'21) LegoDNN: Block-Grained Scaling of Deep Neural Networks for Mobile Vision](https://github.com/LINC-BIT/legodnn)
- Mixture-of-Experts (MoE)
- [awesome-mixture-of-experts](https://github.com/XueFuzhao/awesome-mixture-of-experts#awesome-mixture-of-experts) 
- [(2022) Task-Specific Expert Pruning for Sparse Mixture-of-Experts](https://arxiv.org/pdf/2206.00277.pdf)
- [(2022) Mixture-of-Experts with Expert Choice Routing](https://arxiv.org/abs/2202.09368)
- [(2022) ST-MoE: Designing Stable and Transferable Sparse Expert Models](https://arxiv.org/pdf/2202.08906.pdf)
- [(2022) Towards Understanding the Mixture-of-Experts Layer in Deep Learning](https://papers.nips.cc/paper_files/paper/2022/file/91edff07232fb1b55a505a9e9f6c0ff3-Paper-Conference.pdf)
- [(ICLR'21 Spotlight) Long-tailed Recognition by Routing Diverse Distribution-Aware Experts](https://openreview.net/forum?id=D9I3drBz4UC)
- [(ECCV'20) Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123500239.pdf)
- [(CVPR'20) Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax](https://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf)
- [(CVPR'20) BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhou_BBN_Bilateral-Branch_Network_With_Cumulative_Learning_for_Long-Tailed_Visual_Recognition_CVPR_2020_paper.pdf)
- DL compilers
- [Awesome Tensor Compilers](https://github.com/merrymercy/awesome-tensor-compilers) 
- [(MobiSys'23) Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices](https://arxiv.org/abs/2206.07446)
- [(MobiCom'22) Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs](https://www.microsoft.com/en-us/research/publication/romou-rapidly-generate-high-performance-tensor-kernels-for-mobile-gpus/)
- Serving (Concurrent DL model executions)
- [(NSDI'23) GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge](https://web.cs.ucla.edu/~harryxu/papers/gemel-nsdi23.pdf)
- [(ATC'22) Tetris: Memory-efficient Serverless Inference through Tensor Sharing](https://www.usenix.org/conference/atc22/presentation/li-jie)
- [(OSDI'22) Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences](https://www.usenix.org/conference/osdi22/presentation/han)
- [(SenSys'22) BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference](https://dl.acm.org/doi/10.1145/3560905.3568520)
- [(MobiSys'22) CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices](https://chrisplus.me/assets/pdf/mobisys22-CoDL.pdf)
- [(MobiSys'22) Band: coordinated multi-DNN inference on heterogeneous mobile processors](https://dl.acm.org/doi/abs/10.1145/3498361.3538948)
- [(RTSS'22) Jellyfish: Timely Inference Serving for Dynamic Edge Networks](https://linwang.info/papers/rtss22-jellyfish.pdf)
- [(RTSS'19) Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference](https://ieeexplore.ieee.org/abstract/document/9052147)
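
The experience replay entry under Continuous learning describes buffering a small number of samples per task. A minimal sketch of that idea follows, assuming a fixed per-task budget filled by reservoir sampling. The class name `PerTaskReplayBuffer` and its parameters are hypothetical and chosen for illustration; this is not the method of the ICRA'19 paper linked in that entry.

```python
import random
from collections import defaultdict


class PerTaskReplayBuffer:
    """Hypothetical helper: keeps at most `capacity_per_task` samples per task."""

    def __init__(self, capacity_per_task, seed=0):
        self.capacity_per_task = capacity_per_task
        self.samples = defaultdict(list)  # task id -> stored samples
        self.seen = defaultdict(int)      # task id -> samples observed so far
        self.rng = random.Random(seed)

    def add(self, task_id, sample):
        """Reservoir sampling: each sample seen for a task is equally likely to be kept."""
        self.seen[task_id] += 1
        bucket = self.samples[task_id]
        if len(bucket) < self.capacity_per_task:
            bucket.append(sample)
        else:
            # Pick a slot among all samples seen so far; overwrite only if it falls in the buffer.
            slot = self.rng.randrange(self.seen[task_id])
            if slot < self.capacity_per_task:
                bucket[slot] = sample

    def replay(self, batch_size):
        """Draw up to `batch_size` stored samples across all tasks."""
        pool = [s for bucket in self.samples.values() for s in bucket]
        return self.rng.sample(pool, min(batch_size, len(pool)))


# Stream 100 samples for each of two tasks, keeping only 5 per task.
buffer = PerTaskReplayBuffer(capacity_per_task=5)
for task_id in ("task_a", "task_b"):
    for i in range(100):
        buffer.add(task_id, (task_id, i))

# Old samples that would be mixed into the next training batch.
old_samples = buffer.replay(batch_size=4)
```

Memory stays bounded by `capacity_per_task` times the number of tasks, regardless of how long each task's stream is. In training, the replayed samples would be concatenated with the current task's mini-batch before the forward pass.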