{"id":13604865,"url":"https://github.com/MachineLearningSystem/ElasticFlow-ASPLOS23","last_synced_at":"2025-04-12T02:32:01.782Z","repository":{"id":185461753,"uuid":"623997945","full_name":"MachineLearningSystem/ElasticFlow-ASPLOS23","owner":"MachineLearningSystem","description":"Artifacts for our ASPLOS'23 paper ElasticFlow","archived":false,"fork":true,"pushed_at":"2023-01-05T09:36:24.000Z","size":3075,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-08-02T19:36:29.625Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"pkusys/ElasticFlow","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MachineLearningSystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-04-05T14:26:43.000Z","updated_at":"2023-04-05T01:03:20.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/MachineLearningSystem/ElasticFlow-ASPLOS23","commit_stats":null,"previous_names":["machinelearningsystem/elasticflow-asplos23"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FElasticFlow-ASPLOS23","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FElasticFlow-ASPLOS23/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FElasticFlow-ASPLOS23/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2FElasticFlow-ASPLOS23/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MachineLearningSystem","download_url":"https://codeload.github.com/MachineLearningSystem/ElasticFlow-ASPLOS23/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223489685,"owners_count":17153803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:52.122Z","updated_at":"2024-11-07T09:31:08.694Z","avatar_url":"https://github.com/MachineLearningSystem.png","language":null,"readme":"# ElasticFlow-artifact\n\nWe provide the artifact for the ASPLOS 2023 paper \"ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning\", including:\n\n- The main implementation of ElasticFlow.\n- Cluster simulation scripts (Sec 6.3 \\\u0026 6.4 \\\u0026 6.5), which produce the main results of the paper.\n- Testbed experiment scripts (Sec 6.2 \\\u0026 6.6).\n- Figure plotting scripts.\n\n## Simulation Experiments\n\n### General Simulation Experiments\n\nPlease see `ElasticFlow/README.md` for more details. \n\n### Pollux Simulation\n\nPlease see `pollux/pollux_simulator/README.md` for more details. \n\n## Testbed Experiments\nNote: Because the execution scripts of the testbed experiments are tightly coupled to our internal testbed platform, we only demonstrate the functionality and provide the reproduction steps for the hardware we use. 
Please adapt them to your platform if you would like to run the testbed experiments.\n\nThe testbed experiments require 16 nodes, each with 8 A100 GPUs, 96 CPU cores, 900 GB RAM, and eight NVIDIA Mellanox HDR InfiniBand HCAs. \nYou may use the Azure Standard_ND96asr_A100 VMs for reproduction.\n\n### General Testbed Experiments\nPlease see `ElasticFlow/README.md` for more details.\n\n### Pollux Testbed Experiments\nAs the Pollux baseline is implemented on k8s, we do not integrate Pollux into the ElasticFlow system for comparison. We use the open-sourced artifact from the [Pollux repo](https://github.com/petuum/adaptdl/tree/osdi21-artifact) for testbed experiments. \n\nPlease see `pollux/pollux_testbed/README.md` for more details.\n\n## Plotting Figures\nPlease refer to `\u003crepo\u003e/plot_figure/README.md`\n","funding_links":[],"categories":["Paper-Code"],"sub_categories":["Schedule and Resource Management"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2FElasticFlow-ASPLOS23","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMachineLearningSystem%2FElasticFlow-ASPLOS23","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2FElasticFlow-ASPLOS23/lists"}