{"id":13605464,"url":"https://github.com/MachineLearningSystem/Zeus","last_synced_at":"2025-04-12T05:33:22.248Z","repository":{"id":185462006,"uuid":"548788348","full_name":"MachineLearningSystem/Zeus","owner":"MachineLearningSystem","description":"An energy optimization framework for DNN training.","archived":false,"fork":true,"pushed_at":"2022-10-08T23:14:50.000Z","size":12732,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2024-08-02T19:37:41.985Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ml.energy/zeus","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"ml-energy/zeus","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MachineLearningSystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-10-10T07:22:59.000Z","updated_at":"2022-10-07T04:49:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"e36929b9-90b1-4eed-9929-2fa71b84a840","html_url":"https://github.com/MachineLearningSystem/Zeus","commit_stats":null,"previous_names":["machinelearningsystem/zeus"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FZeus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FZeus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FZeus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FZeus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MachineLear
ningSystem","download_url":"https://codeload.github.com/MachineLearningSystem/Zeus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223497884,"owners_count":17155215,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:58.987Z","updated_at":"2024-11-07T10:30:41.717Z","avatar_url":"https://github.com/MachineLearningSystem.png","language":null,"readme":"\u003cdiv align=\"center\"\u003e\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs/assets/img/logo_dark.svg\"\u003e\n  \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"docs/assets/img/logo_light.svg\"\u003e\n  \u003cimg alt=\"Zeus logo\" width=\"55%\" src=\"docs/assets/img/logo_dark.svg\"\u003e\n\u003c/picture\u003e\n\u003ch1\u003eAn Energy Optimization Framework for DNN Training\u003c/h1\u003e\n\u003c/div\u003e\n\n[![arXiv](https://custom-icon-badges.herokuapp.com/badge/ID-2208.06102-b31b1b.svg?logo=arxiv-white\u0026logoWidth=35)](https://arxiv.org/abs/2208.06102)\n[![Docker Hub](https://img.shields.io/badge/Docker-SymbioticLab%2FZeus-blue.svg?logo=docker\u0026logoColor=white)](https://hub.docker.com/r/symbioticlab/zeus)\n[![Homepage build](https://github.com/SymbioticLab/Zeus/actions/workflows/deploy_homepage.yaml/badge.svg)](https://github.com/SymbioticLab/Zeus/actions/workflows/deploy_homepage.yaml)\n[![Apache-2.0 License](https://custom-icon-badges.herokuapp.com/github/license/SymbioticLab/Zeus?logo=law)](/LICENSE)\n\nZeus automatically optimizes the **energy and time** of 
training a DNN to a target validation metric by finding the optimal **batch size** and **GPU power limit**.\n\nPlease refer to our [NSDI’23 publication](https://arxiv.org/abs/2208.06102) for details.\nCheck out the [Overview](https://ml.energy/zeus/overview/) for a summary.\n\nZeus is part of [The ML.ENERGY Initiative](https://ml.energy).\n\n## Repository Organization\n\n```\n.\n├── zeus/                # ⚡ Zeus Python package\n│   ├── run/             #    - Tools for running Zeus on real training jobs\n│   ├── policy/          #    - Optimization policies and extension interfaces\n│   ├── profile/         #    - Tools for profiling energy and time\n│   ├── simulate.py      #    - Tools for trace-driven simulation\n│   ├── util/            #    - Utility functions and classes\n│   ├── analyze.py       #    - Analysis functions for power logs\n│   └── job.py           #    - Class for job specification\n│\n├── zeus_monitor/        # 🔌 GPU power monitor\n│   ├── zemo/            #    -  A header-only library for querying NVML\n│   └── main.cpp         #    -  Source code of the power monitor\n│\n├── examples/            # 🛠️ Examples of integrating Zeus\n│   ├── capriccio/       #    - Integrating with Huggingface and Capriccio\n│   ├── cifar100/        #    - Integrating with torchvision and CIFAR100\n│   └── trace_driven/    #    - Using the Zeus trace-driven simulator\n│\n├── capriccio/           # 🌊 A drifting sentiment analysis dataset\n│\n└── trace/               # 🗃️ Train and power traces for various GPUs and DNNs\n```\n\n## Getting Started\n\nRefer to [Getting started](https://ml.energy/zeus/getting_started) for complete instructions on environment setup, installation, and integration.\n\n### Docker image\n\nWe provide a Docker image fully equipped with all dependencies and environments.\nThe only command you need is:\n\n```sh\ndocker run -it \\\n    --gpus 1                    `# Mount one GPU` \\\n    --cap-add SYS_ADMIN         `# Needed to change the power 
limit of the GPU` \\\n    --shm-size 64G              `# PyTorch DataLoader workers need enough shm` \\\n    symbioticlab/zeus:latest \\\n    bash\n```\n\nRefer to [Environment setup](https://ml.energy/zeus/getting_started/environment/) for details.\n\n### Examples\n\nWe provide working examples for integrating and running Zeus:\n\n- Integrating Zeus with Computer Vision\n    - [ImageNet](examples/imagenet)\n    - [CIFAR100](examples/cifar100)\n- [Integrating Zeus with Natural Language Processing](examples/capriccio)\n- [Running trace-driven simulation on single recurring jobs and the Alibaba GPU cluster trace](examples/trace_driven)\n\n\n## Extending Zeus\n\nYou can easily implement custom policies for batch size and power limit optimization and plug them into Zeus.\n\nRefer to [Extending Zeus](https://ml.energy/zeus/extend/) for details.\n\n## Citation\n\nPlease consider citing our NSDI’23 paper if you find Zeus to be related to your research project.\n\n```bibtex\n@inproceedings{zeus-nsdi23,\n    title     = {Zeus: Understanding and Optimizing {GPU} Energy Consumption of {DNN} Training},\n    author    = {Jie You and Jae-Won Chung and Mosharaf Chowdhury},\n    booktitle = {USENIX NSDI},\n    year      = {2023}\n}\n```\n\n## Contact\nJae-Won Chung (jwnchung@umich.edu)\n","funding_links":[],"categories":["Paper-Code"],"sub_categories":["Energy"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2FZeus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMachineLearningSystem%2FZeus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2FZeus/lists"}