{"id":13605197,"url":"https://github.com/MachineLearningSystem/baechi","last_synced_at":"2025-04-12T02:33:00.130Z","repository":{"id":185461671,"uuid":"561141975","full_name":"MachineLearningSystem/baechi","owner":"MachineLearningSystem","description":"Baechi (SoCC '20)","archived":false,"fork":true,"pushed_at":"2022-06-06T23:12:03.000Z","size":136,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2024-08-02T19:37:01.331Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"beomyeol/baechi","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MachineLearningSystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-11-03T03:17:23.000Z","updated_at":"2022-11-02T21:36:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/MachineLearningSystem/baechi","commit_stats":null,"previous_names":["machinelearningsystem/baechi"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fbaechi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fbaechi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fbaechi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fbaechi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MachineLearningSystem","download_url":"https://codeload.github.com/MachineLearningSyst
em/baechi/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223489755,"owners_count":17153824,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:55.663Z","updated_at":"2024-11-07T09:31:37.561Z","avatar_url":"https://github.com/MachineLearningSystem.png","language":null,"readme":"# Baechi: Fast Device Placement on Machine Learning Graphs (SoCC 2020)\n\n## Install dependencies\n* Install dependencies with Anaconda\n```\n$ conda install -y python=3.6 numpy=1.16 tensorflow-gpu=1.12 bazel=0.20.0 \\\n      networkx future matplotlib cvxopt scikit-learn\n```\n* Mosek\n```\n$ pip install -f https://download.mosek.com/stable/wheel/index.html Mosek==8.1.82\n```\n\nOur code runs [MOSEK](https://www.mosek.com/) as an LP solver for SCT.\nMOSEK provides a free personal academic license.\nYou can request a license at https://www.mosek.com/products/academic-licenses.\nThe license file (`mosek.lic`) should be placed at `$HOME/mosek`.\n\n## Example usage\nThis example generates the placement of 4-layer GNMT v2 with a batch size of 128, a maximum sequence length of 40, and a vocabulary size of 30000.\n\n* Build a Python program to place operators of an ML model.\n```\n$ bazel build :train\n```\n\n* Generate profiles.\n\n```\n$ ./bazel-bin/train \\\n    --costgen \\\n    --cost_path=/tmp/cost.pkl \\\n    --optimizer=adam \\\n    --batch_size=128 \\\n    --model_name=gnmt_v2 \\\n    --vocab_size=30000 \\\n    --max_seq_length=40 \\\n    --rnn_unit_type=lstm \\\n    --rnn_units=512 \\\n    
--num_layers=4 \\\n    --encoder_type=gnmt \\\n    --num_gpus=4 \\\n    --residual \\\n    --colocate_grads_with_ops \\\n    --only_forward\n```\nThis generates profiles of the forward pass and stores them at `/tmp/cost.pkl`.\n\n* Generate a communication cost function between GPUs using linear regression.\n\n```\n$ bazel build //utils:communication_benchmark\n$ ./bazel-bin/utils/communication_benchmark\n```\n\nThis runs a benchmark that transfers tensors between different GPUs for various tensor sizes.\nBy default, the benchmark transfers tensors from `GPU:0` to `GPU:1` with tensor sizes in the range [2\u003csup\u003e0\u003c/sup\u003e, 2\u003csup\u003e29\u003c/sup\u003e].\nAfter the benchmark finishes, it prints out a generated communication cost function that\nshould be given as the `--comm_cost_coeffs` argument value for the placement.\n\nAn example output would be the following.\n```\n...\nCommunication cost function: 0.0001754 x + 134\n```\n\n* Place operators of GNMT v2 and measure average step times.\n\n```\n$ ./bazel-bin/train \\\n    --cost_path=/tmp/cost.pkl \\\n    --optimizer=adam \\\n    --batch_size=128 \\\n    --model_name=gnmt_v2 \\\n    --vocab_size=30000 \\\n    --max_seq_length=40 \\\n    --rnn_unit_type=lstm \\\n    --rnn_units=512 \\\n    --num_layers=4 \\\n    --encoder_type=gnmt \\\n    --num_gpus=4 \\\n    --residual \\\n    --colocate_grads_with_ops \\\n    --only_forward \\\n    --placement_method=m_etf \\\n    --placer_type=fusion \\\n    --grouper=coplace \\\n    --comm_cost_coeffs=0.0001754,134 \\\n    --memory_fraction=1.0\n```\n\nThis runs the placement of GNMT v2 operators using m-ETF based on the forward operators.\nWhen the placement is done, this measures the average step time of the placement results and prints it out.\n\n## Docker image\n\nA Docker image with all dependencies installed is available.\n\n```\n$ docker pull beomyeol/baechi\n$ docker run -it --rm --gpus all beomyeol/baechi /bin/bash\n```\n\nThis gives you 
direct access to the container with all GPUs enabled.\nYou can follow the example usage within the container.\n\n## License\nUniversity of Illinois/NCSA Open Source License\n","funding_links":[],"categories":["Paper-Code"],"sub_categories":["Optimization"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2Fbaechi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMachineLearningSystem%2Fbaechi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2Fbaechi/lists"}