{"id":17344134,"url":"https://github.com/typhoonzero/paddle-openmpi","last_synced_at":"2025-03-27T09:44:31.902Z","repository":{"id":99429765,"uuid":"87933378","full_name":"typhoonzero/paddle-openmpi","owner":"typhoonzero","description":"Run paddle distributed trainning on openmpi clusters","archived":false,"fork":false,"pushed_at":"2017-04-11T13:11:28.000Z","size":6,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-01T14:45:13.797Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/typhoonzero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-11T12:49:50.000Z","updated_at":"2017-06-05T08:36:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"fc95d630-e06b-4883-b4b5-92803b26de70","html_url":"https://github.com/typhoonzero/paddle-openmpi","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/typhoonzero%2Fpaddle-openmpi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/typhoonzero%2Fpaddle-openmpi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/typhoonzero%2Fpaddle-openmpi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/typhoonzero%2Fpaddle-openmpi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/typhoonzero","download_url":"
https://codeload.github.com/typhoonzero/paddle-openmpi/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245822304,"owners_count":20678165,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T16:24:09.477Z","updated_at":"2025-03-27T09:44:31.860Z","avatar_url":"https://github.com/typhoonzero.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# paddle-openmpi\nRun paddle distributed training on openmpi clusters\n\n## Requirements\nThis toy project requires a Kubernetes cluster to run the steps below.\n\n## Start an MPI cluster on Kubernetes\n```bash\nkubectl create -f head.yaml\nkubectl create -f mpi-nodes.yaml\n# check the pods\nkubectl get po -o wide\n```\n\n## Find out the MPI node IPs\n```bash\nkubectl get po -o wide | grep nodes | awk '{print $6}' \u003e machines\n```\n\nThen copy the `machines` file to the head node (same as ssh into the head node)\n\n## Run\n\nYou need to ssh into the head node in order to submit a job.\n\n```bash\nssh -i \n```\n\nCopy all programs to each node:\n\n```bash\ncat machines | xargs -i scp start_mpi_train.sh trainer_config.lr.py dataprovider_bow.py {}:/home/tutorial\n```\n\nPrepare training data:\n\n```bash\ncd data\nOUT_DIR=$PWD/input SPLIT_COUNT=3 sh get_data.sh\n# copy split data to each node:\nscp -r input/0/data [node1]:~\nscp -r input/1/data [node2]:~\nscp -r input/2/data [node3]:~\n```\n\nSubmit the job to the MPI cluster:\n\n```bash\n\nmpirun -x PYTHONHOME=/usr/local -hostfile machines -n 3  
/home/tutorial/start_mpi_train.sh\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftyphoonzero%2Fpaddle-openmpi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftyphoonzero%2Fpaddle-openmpi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftyphoonzero%2Fpaddle-openmpi/lists"}