{"id":13604840,"url":"https://github.com/MachineLearningSystem/Awesome-DL-Scheduling-Papers","last_synced_at":"2025-04-12T02:31:50.268Z","repository":{"id":185461621,"uuid":"498681468","full_name":"MachineLearningSystem/Awesome-DL-Scheduling-Papers","owner":"MachineLearningSystem","description":null,"archived":false,"fork":true,"pushed_at":"2022-05-24T06:32:16.000Z","size":21,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2024-05-23T01:13:12.692Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"S-Lab-System-Group/Awesome-DL-Scheduling-Papers","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MachineLearningSystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-06-01T09:58:16.000Z","updated_at":"2024-04-02T08:35:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/MachineLearningSystem/Awesome-DL-Scheduling-Papers","commit_stats":null,"previous_names":["machinelearningsystem/awesome-dl-scheduling-papers"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FAwesome-DL-Scheduling-Papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FAwesome-DL-Scheduling-Papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FAwesome-DL-Scheduling-Papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FAwesome-DL
-Scheduling-Papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MachineLearningSystem","download_url":"https://codeload.github.com/MachineLearningSystem/Awesome-DL-Scheduling-Papers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223460699,"owners_count":17148759,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:51.832Z","updated_at":"2024-11-07T09:31:06.031Z","avatar_url":"https://github.com/MachineLearningSystem.png","language":null,"readme":"# Awesome-DL-Scheduling-Papers\nA curated list of DL cluster scheduling papers.\n\nPlease feel free to submit pull requests or open an issue to add papers.\n\n\n## Schedulers for DL Training\n| **Scheduler** | **Year** | **Series** | **Paper** | **Objective** | **Heterogeneous** | **Elastic** | **AutoML** | **Code** |\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| Synergy | 2022 | OSDI | [Paper](https://arxiv.org/abs/2110.06073) | ♣  | - | - | - | - |\n| Singularity | 2022 | arXiv | [Paper](https://arxiv.org/abs/2202.07848) | ♠♣♦ | - | ✔ | - | - |\n| GADGET | 2022 | INFOCOM | [Paper](https://arxiv.org/abs/2202.01158) | ♠♣ | - | ✔ | - | [Code](https://zenodo.org/record/5847644#.YishWH8zZhE) |\n| EDL | 2022 | TPDS | [Paper](https://ieeexplore.ieee.org/document/9373916) | ♠♣ | - | ✔ | - | - |\n| Aryl | 2022 | arXiv | [Paper](https://arxiv.org/abs/2202.07896) | ♠♣ | - | ✔ | ✔ | - |\n| AOnline | 2022 | TCC | [Paper](https://ieeexplore.ieee.org/document/9682563) | ♠♣ | - | ✔ | - | - |\n| Ali-MLaaS | 2022 | NSDI | 
[Paper](https://www.usenix.org/conference/nsdi22/presentation/weng) | ♠♣ | - | - | - | [Code](https://github.com/alibaba/clusterdata/tree/master/cluster-trace-gpu-v2020) |\n| SMD | 2021 | INFOCOM | [Paper](https://arxiv.org/abs/2105.13855) | ♣ | - | - | - | - |\n| SEER | 2021 | SoCC | [Paper](https://dl.acm.org/doi/pdf/10.1145/3472883.3486989) | ▲ | - | ✔ | ✔ | - |\n| RubberBand | 2021 | EuroSys | [Paper](https://dl.acm.org/doi/10.1145/3447786.3456245) |  ♦ | - | ✔ | ✔ | - |\n| POP | 2021 | SOSP | [Paper](https://dl.acm.org/doi/10.1145/3477132.3483588) | ♥♣ | ✔ | - | - | [Code](https://github.com/stanford-futuredata/POP) |\n| Pollux | 2021 | OSDI | [Paper](https://www.usenix.org/conference/osdi21/presentation/qiao) | ♠♣♥ | - | ✔ | ✔ | [Code](https://github.com/petuum/adaptdl) |\n| ONES | 2021 | SC | [Paper](https://dl.acm.org/doi/10.1145/3458817.3480859) | ♠♣ | - | ✔ | - | [Code](https://github.com/kurisusnowdeng/ones_sc21) |\n| Liquid | 2021 | TPDS | [Paper](https://ieeexplore.ieee.org/document/9664375) | ♣  | - | - | - | [Code](https://github.com/PasaLab/Liquid) |\n| Jigsaw | 2021 | DistributedML | [Paper](https://dl.acm.org/doi/10.1145/3488659.3493778) | ♣ | - | - | - | - |\n| Horus | 2021 | TPDS | [Paper](https://ieeexplore.ieee.org/document/9428512) | ♠♣ | - | - | - | - |\n| Hermes | 2021 | Electronics | [Paper](https://www.mdpi.com/2079-9292/10/3/350) | ♣ | - | - | ✔ | - |\n| Helios | 2021 | SC | [Paper](https://dl.acm.org/doi/abs/10.1145/3458817.3476223) | ♣♦ | - | - | - | [Code](https://github.com/S-Lab-System-Group/HeliosArtifact) |\n| DynamoML | 2021 | CLOSER | [Paper](https://www.scitepress.org/Papers/2021/104834/104834.pdf) | ♠♣ | - | ✔ | - | - |\n| Chronus | 2021 | SoCC | [Paper](https://dl.acm.org/doi/abs/10.1145/3472883.3486978) | ✿ | - | - | - | [Code](https://github.com/S-Lab-System-Group/ChronusArtifact/) |\n| Astraea | 2021 | TPDS | [Paper](https://ieeexplore.ieee.org/document/9655467/) | ♥ | - | - | - | 
[Code](https://github.com/yzs981130/Astraea_Artifacts) |\n| ANDREAS | 2021 | FCloud | [Paper](https://arxiv.org/abs/2105.05080) | ♦ | - | - | - | - |\n| AFS | 2021 | NSDI | [Paper](https://www.usenix.org/conference/nsdi21/presentation/hwang) | ♠♣ | - | ✔ | - | - |\n| $DL^2$ | 2021 | TPDS | [Paper](https://arxiv.org/abs/1909.06040) | ♣ | - | ✔ | - | [Code](https://github.com/pengyanghua/DL2) |\n| Yeung | 2020 | HotCloud | [Paper](https://www.usenix.org/conference/hotcloud20/presentation/yeung) | ♠ | - | - | - | - |\n| Vaibhav et al. | 2020 | MASCOTS | [Paper](https://ieeexplore.ieee.org/abstract/document/9285954) | ♠♣ | - | ✔ | - | - |\n| Themis | 2020 | NSDI | [Paper](https://www.usenix.org/conference/nsdi20/presentation/mahajan) | ♥ | - | - | - | - |\n| SPIN | 2020 | INFOCOM | [Paper](https://ieeexplore.ieee.org/document/9155445/) | ♣ | - | - | - | - |\n| Salus | 2020 | MLSys | [Paper](https://proceedings.mlsys.org/paper/2020/hash/f7177163c833dff4b38fc8d2872f1ec6-Abstract.html) | ♠♣ | - | - | - | [Code](https://github.com/SymbioticLab/Salus) |\n| Parrot | 2020 | TCC | [Paper](https://ieeexplore.ieee.org/document/9269382) | ♣ | - | - | - | - |\n| Non-Intrusive | 2020 | SC | [Paper](https://dl.acm.org/doi/abs/10.5555/3433701.3433820) | ♠♣ | - | ✔ | - | - |\n| MLFS | 2020 | CoNEXT | [Paper](https://dl.acm.org/doi/10.1145/3386367.3432588) | ♣✿ | - | - | - | [Code](https://github.com/hiddenlayer2020/ML-Job-Scheduler-MLFS) |\n| MLCloudPrice | 2020 | DISPA | [Paper](https://cs.stanford.edu/~matei/papers/2020/dispa_cloud_ml.pdf) | ♣♦ | - | - | - | [Code](https://github.com/stanford-futuredata/training_on_a_dime) |\n| MARBLE | 2020 | CCGrid | [Paper](https://ieeexplore.ieee.org/document/9407835) | ♠♣ | - | ✔ | - | - |\n| HiveD | 2020 | OSDI | [Paper](https://www.usenix.org/conference/osdi20/presentation/zhao-hanyu) | ♣ | - | - | - | [Code](https://github.com/microsoft/hivedscheduler) |\n| GENIE | 2020 | TPDS | [Paper](https://ieeexplore.ieee.org/document/8778770) | ✿ | - 
| ✔ | - | - |\n| Gavel | 2020 | OSDI | [Paper](https://www.usenix.org/conference/osdi20/presentation/narayanan-deepak) | ♣♥ | ✔ | - | - | [Code](https://github.com/stanford-futuredata/gavel) |\n| E-LAS | 2020 | ICPP | [Paper](https://dl.acm.org/doi/fullHtml/10.1145/3404397.3404415) | ♣ | - | - | - | - |\n| Elan | 2020 | ICDCS | [Paper](https://ieeexplore.ieee.org/document/9355755) | ♠♣ | - | ✔ | - | - |\n| Co-scheML | 2020 | ACSOS | [Paper](https://ieeexplore.ieee.org/document/9196380) | ♣ | - | - | - | - |\n| CODA | 2020 | ICDCS | [Paper](https://ieeexplore.ieee.org/document/9355823) | ♣ | ✔* | - | - | - |\n| Antman | 2020 | OSDI | [Paper](https://www.usenix.org/system/files/osdi20-xiao.pdf) | ♠♣ | - | ✔ | - | [Code](https://github.com/alibaba/GPU-scheduler-for-deep-learning) |\n| Ada-SRSF | 2020 | arXiv | [Paper](https://arxiv.org/abs/2002.10105) | ♣ | ✔* | - | - | - |\n| $Gandiva_{fair}$ | 2020 | EuroSys | [Paper](https://dl.acm.org/doi/abs/10.1145/3342195.3387555) | ♥♣ | ✔ | - | - | - |\n| Tiresias | 2019 | NSDI | [Paper](https://www.usenix.org/conference/nsdi19/presentation/gu) | ♣ | - | - | - | [Code](https://github.com/SymbioticLab/Tiresias) |\n| Philly | 2019 | ATC | [Paper](https://www.usenix.org/conference/atc19/presentation/jeon) | ♣ | - | - | - | [Code](https://github.com/msr-fiddle/philly-traces) |\n| JPAS | 2019 | JNCA | [Paper](https://www.sciencedirect.com/science/article/abs/pii/S1084804520300643) | ♣▲ | - | - | ✔ | - |\n| Jahani | 2019 | ICCCS | [Paper](https://ieeexplore.ieee.org/document/8888151) | ♦ | ✔ | ✔ | - | - |\n| HyperSched | 2019 | SoCC | [Paper](https://dl.acm.org/doi/10.1145/3357223.3362719) | ✿▲ | - | ✔ | ✔ | - |\n| Harmony | 2019 | INFOCOM | [Paper](https://ieeexplore.ieee.org/document/8737460) | ♣ | - | - | - | - |\n| FfDL | 2019 | Middleware | [Paper](https://dl.acm.org/doi/10.1145/3361525.3361538) | ♣ | - | - | - | [Code](https://github.com/IBM/FfDL) |\n| Dragon | 2019 | CLOSER | 
[Paper](https://pdfs.semanticscholar.org/3075/cf85b9a70092bcafa10757c6ee6f73b75c2e.pdf) | ♠♣ | - | ✔ | - | - |\n| Cynthia | 2019 | ICPP | [Paper](https://dl.acm.org/doi/10.1145/3337821.3337873) | ♦ | - | ✔ | - | - |\n| $Sched^2$ | 2019 | GLOBECOM | [Paper](https://ieeexplore.ieee.org/document/9014110) | ♣ | - | - | - | - |\n| $FC^2$ | 2019 | CC | [Paper](https://link.springer.com/article/10.1007/s10586-019-02912-6) |  ♦ | ✔* | ✔ | - | - |\n| Optimus | 2018 | EuroSys | [Paper](https://i.cs.hku.hk/~cwu/papers/yhpeng-eurosys18.pdf) | ♣  | - | ✔ | - | [Code](https://github.com/pengyanghua/optimus) |\n| OASiS | 2018 | INFOCOM | [Paper](https://ieeexplore.ieee.org/abstract/document/8486422) | ♠♣ | - | ✔ | - | - |\n| Gandiva | 2018 | OSDI | [Paper](https://www.usenix.org/conference/osdi18/presentation/xiao) | ♠♣ | - | ✔ | ✔ | - |\n| Topology-Aware | 2017 | SC | [Paper](https://dl.acm.org/doi/10.1145/3126908.3126933) | ♣ | - | - | - | [Code](https://github.com/HiEST/gpu-topo-aware) |\n| HyperDrive | 2017 | Middleware | [Paper](https://dl.acm.org/doi/10.1145/3135974.3135994) | ♣▲ | - | - | ✔ | - |\n| Dorm | 2017 | SMARTCOMP | [Paper](https://www.computer.org/csdl/proceedings-article/smartcomp/2017/07947053/12OmNAlvHZ3) | ♥ | - | - | - | - |\n\nJCT: ♣ Utilization: ♠ Cost: ♦ Fairness: ♥ DDL: ✿ Accuracy: ▲ \n\n## Schedulers for DL Inference\n| **Scheduler** | **Year** | **Series** | **Paper** | **Objective** | **Batch** | **Share** | **Cloud** | **Source Code** |\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| Cocktail | 2022 | NSDI | [Paper](http://arxiv.org/abs/2106.05345) | ♣♦♥ | - | - | ✔ | - |\n| INFaaS | 2021 | ATC | [Paper](https://www.usenix.org/conference/atc21/presentation/jacobs) | ♦♥♠ | - | ✔ | ✔ | [Code](https://github.com/stanford-mast/INFaaS) |\n| Mendoza et al. 
| 2021 | EuroMLSys | [Paper](https://dl.acm.org/doi/10.1145/3437984.3458837) | ♦ | - | ✔ | - | - |\n| Morphling | 2021 | SoCC | [Paper](https://dl.acm.org/doi/10.1145/3472883.3486987) | ♥♠ | ✔ | ✔ | ✔ | [Code](https://github.com/kubedl-io/morphling) |\n| Abacus | 2021 | SC | [Paper](https://dl.acm.org/doi/10.1145/3458817.3476143) | ♦♠ | - | ✔ | - | [Code](https://github.com/Raphael-Hao/Abacus) |\n| MIG-SERVING | 2021 | CoRR | [Paper](http://arxiv.org/abs/2109.11067) | ♦♥ | ✔ | ✔ | - | - |\n| GSLICE | 2020 | SoCC | [Paper](https://dl.acm.org/doi/10.1145/3419111.3421284) | ♠✿ | ✔ | ✔ | - | - |\n| Clockwork | 2020 | OSDI | [Paper](https://www.usenix.org/conference/osdi20/presentation/gujarati) | ♦♠ | ✔ | - | - | [Code](https://gitlab.mpi-sws.org/cld/ml/clockwork) |\n| CMS | 2020 | Future Internet | [Paper](https://www.mdpi.com/1999-5903/12/6/102) | ♣✿ | - | - | - | - |\n| Irina | 2020 | APNet | [Paper](https://dl.acm.org/doi/10.1145/3411029.3411035) | ♦♠✿ | ✔ | ✔ | - | - |\n| PERSEUS | 2020 | IC2E | [Paper](https://ieeexplore.ieee.org/document/9096261/) | ♦♥♠ | ✔ | - | ✔ | [Code](https://github.com/cake-lab/perseus) |\n| AutoDeep | 2020 | INFOCOM | [Paper](https://ieeexplore.ieee.org/document/9155267) | ♦♥♠ | - | ✔ | ✔ | - |\n| DyBatch | 2020 | CCGrid | [Paper](https://ieeexplore.ieee.org/document/9139602) | ♦♠ | ✔ | ✔ | - | - |\n| InferLine | 2020 | SoCC | [Paper](https://dl.acm.org/doi/10.1145/3419111.3421285) | ♦♥ | ✔ | - | ✔ | [Code](https://github.com/simon-mo/inferline-models) |\n| MArk | 2019 | ATC | [Paper](https://www.usenix.org/conference/atc19/presentation/zhang-chengliang) | ♦♥ | ✔ | - | ✔ | [Code](https://github.com/marcoszh/MArk-Project) |\n| Tolerance Tiers | 2019 | ISPASS | [Paper](https://ieeexplore.ieee.org/abstract/document/8695638/) | ♣♦♥ | - | - | ✔ | - |\n| ParM | 2019 | SOSP | [Paper](https://dl.acm.org/doi/10.1145/3341301.3359654) | ♦ | ✔ | - | - | [Code](https://github.com/thesys-lab/parity-models) |\n| Gilman et al. 
| 2019 | DIDL | [Paper](https://dl.acm.org/doi/10.1145/3366622.3368147) | ♦♠ | - | ✔ | - | - |\n| Nanily | 2019 | HPCC | [Paper](https://ieeexplore.ieee.org/document/8855453) | ♦♠ | ✔ | - | - | - |\n| RRL | 2019 | SC | [Paper](https://dl.acm.org/doi/10.1145/3295500.3356164) | ♦ | ✔ | ✔ | - | [Code](https://github.com/HeyangQin/RRL) |\n| Kube-Knots | 2019 | CLUSTER | [Paper](https://ieeexplore.ieee.org/document/8891040) | ♦✿ | ✔ | ✔ | - | - |\n| TrIMS | 2019 | CLOUD | [Paper](https://ieeexplore.ieee.org/document/8814494) | ♦♠✿ | ✔ | ✔ | ✔ | [Code](https://github.com/rai-project/trims_mxnet) |\n| Ebird | 2019 | ICCD | [Paper](https://ieeexplore.ieee.org/abstract/document/8988602/) | ♦♠✿ | ✔ | ✔ | - | [Code](https://github.com/sjtu-epcc/Ebird) |\n| Rafiki | 2018 | VLDB | [Paper](https://dl.acm.org/doi/10.14778/3282495.3282499) | ♣♦ | ✔ | - | - | [Code](https://github.com/nginyc/rafiki) |\n| Space-Time | 2018 | NIPS | [Paper](http://learningsys.org/nips18/assets/papers/102CameraReadySubmissionGPU_Virtualization%20(8).pdf) | ♠✿ | ✔ | ✔ | - | - |\n| Ease.ml | 2018 | VLDB | [Paper](https://dl.acm.org/doi/10.1145/3187009.3177737) | ♣ | - | - | - | [Code](https://github.com/easeml/automl) |\n| HiveMind | 2018 | NIPS | [Paper](https://www.microsoft.com/en-us/research/publication/accelerating-deep-learning-workloads-through-efficient-multi-model-execution/) | ♠ | ✔ | ✔ | - | - |\n| Clipper | 2017 | NSDI | [Paper](https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/crankshaw) | ♣♦♠ | ✔ | - | - | [Code](https://github.com/ucbrise/clipper) |\n\nAccuracy: ♣ Throughput: ♠ Latency: ♦ Cost: ♥ Utilization: ✿\n","funding_links":[],"categories":["Paper-Code"],"sub_categories":["GPU Cluster 
Management"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2FAwesome-DL-Scheduling-Papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMachineLearningSystem%2FAwesome-DL-Scheduling-Papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2FAwesome-DL-Scheduling-Papers/lists"}