{"id":13706305,"url":"https://github.com/huangwl18/modular-rl","last_synced_at":"2025-05-05T20:30:45.848Z","repository":{"id":37640757,"uuid":"278438798","full_name":"huangwl18/modular-rl","owner":"huangwl18","description":"[ICML 2020] PyTorch Code for \"One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control\"","archived":false,"fork":false,"pushed_at":"2022-12-27T15:36:25.000Z","size":4469,"stargazers_count":220,"open_issues_count":12,"forks_count":34,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-11-13T14:41:12.491Z","etag":null,"topics":["decentralized-control","deep-learning","emergent-communication","generalization","graph-neural-networks","locomotion","message-passing","modular-control","modularity","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://huangwl18.github.io/modular-rl/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huangwl18.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-09T18:16:48.000Z","updated_at":"2024-11-06T02:54:52.000Z","dependencies_parsed_at":"2023-01-31T04:45:11.319Z","dependency_job_id":null,"html_url":"https://github.com/huangwl18/modular-rl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huangwl18%2Fmodular-rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huangwl18%2Fmodular-rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huangwl18%2Fmodular-rl/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/huangwl18%2Fmodular-rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huangwl18","download_url":"https://codeload.github.com/huangwl18/modular-rl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252570879,"owners_count":21769739,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decentralized-control","deep-learning","emergent-communication","generalization","graph-neural-networks","locomotion","message-passing","modular-control","modularity","reinforcement-learning"],"created_at":"2024-08-02T22:00:53.988Z","updated_at":"2025-05-05T20:30:43.875Z","avatar_url":"https://github.com/huangwl18.png","language":"Jupyter Notebook","readme":"## One Policy to Control Them All:\u003cbr/\u003eShared Modular Policies for Agent-Agnostic Control ##\n### ICML 2020\n#### [[Project Page]](https://huangwl18.github.io/modular-rl/) [[Paper]](https://www.cs.cmu.edu/~dpathak/papers/modular-rl.pdf) [[Demo Video]](https://youtu.be/9YiZZ_8guq8) [[Long Oral Talk]](https://youtu.be/gEeQ0nzalzo)\n\n[Wenlong Huang](https://wenlong.page)\u003csup\u003e1\u003c/sup\u003e, [Igor Mordatch](https://scholar.google.com/citations?user=Vzr1RukAAAAJ\u0026hl=en)\u003csup\u003e2\u003c/sup\u003e, [Deepak Pathak](https://www.cs.cmu.edu/~dpathak/)\u003csup\u003e3 4\u003c/sup\u003e\n\n\u003csup\u003e1\u003c/sup\u003eUniversity of California, Berkeley, \u003csup\u003e2\u003c/sup\u003eGoogle Brain, \u003csup\u003e3\u003c/sup\u003eFacebook AI Research, \u003csup\u003e4\u003c/sup\u003eCarnegie 
Mellon University\u003cbr/\u003e\n\n\u003cimg src=\"images/teaser.gif\" width=\"700\"\u003e\n\nThis is a PyTorch-based implementation of our [Shared Modular Policies](https://huangwl18.github.io/modular-rl/). We take a step beyond the laborious training process of the conventional single-agent RL policy by tackling the possibility of learning general-purpose controllers for diverse robotic systems. Our approach trains a single policy for a wide variety of agents which can then generalize to unseen agent shapes at test-time without any further training.\n\nIf you find this work useful in your research, please cite using the following BibTeX:\n\n    @inproceedings{huang2020smp,\n      Author = {Huang, Wenlong and\n      Mordatch, Igor and Pathak, Deepak},\n      Title = {One Policy to Control Them All:\n      Shared Modular Policies for Agent-Agnostic Control},\n      Booktitle = {ICML},\n      Year = {2020}\n      }\n\n## Setup\n### Requirements\n- Python-3.6\n- PyTorch-1.1.0\n- CUDA-9.0\n- CUDNN-7.6\n- [MuJoCo-200](https://www.roboti.us/index.html): download binaries, put license file inside, and add path to .bashrc\n\n### Setting up repository\n  ```Shell\n  git clone https://github.com/huangwl18/modular-rl.git\n  cd modular-rl/\n  python3.6 -m venv mrEnv\n  source $PWD/mrEnv/bin/activate\n  ```\n\n### Installing Dependencies\n  ```Shell\n  pip install --upgrade pip\n  pip install -r requirements.txt\n  ```\n\n## Running Code\n| Flags and Parameters  | Description |\n| ------------- | ------------- |\n| ``--morphologies \u003cList of STRING\u003e``  | Find existing environments matching each keyword for training (e.g. walker, hopper, humanoid, and cheetah; see examples below)  |\n| ``--custom_xml \u003cPATH\u003e``  | Path to custom `xml` file for training the modular policy.\u003cbr\u003e When ``\u003cPATH\u003e`` is a file, train with that `xml` morphology only. 
\u003cbr\u003e When ``\u003cPATH\u003e`` is a directory, train on all `xml` morphologies found in the directory.\n| ``--td``  | Enable top-down message passing (pass ``--td --bu`` for both-way message passing)  |\n| ``--bu``  | Enable bottom-up message passing (pass ``--td --bu`` for both-way message passing)  |\n| ``--expID \u003cINT\u003e``  | Experiment ID for creating saving directory  |\n  | ``--seed \u003cINT\u003e``  | (Optional) Seed for Gym, PyTorch and Numpy  |\n  \n### Train with existing environment\n- Train both-way SMP on ``Walker++`` (12 variants of walker):\n```Shell\npython main.py --expID 001 --td --bu --morphologies walker\n  ```\n- Train both-way SMP on ``Humanoid++`` (8 variants of 2d humanoid):\n```Shell\npython main.py --expID 002 --td --bu --morphologies humanoid\n  ```\n- Train both-way SMP on ``Cheetah++`` (15 variants of cheetah):\n```Shell\npython main.py --expID 003 --td --bu --morphologies cheetah\n  ```\n- Train both-way SMP on ``Hopper++`` (3 variants of hopper):\n```Shell\npython main.py --expID 004 --td --bu --morphologies hopper\n  ```\n  - To train both-way SMP for only one environment (e.g. 
``walker_7_main``), specify the full name of  the environment without the ``.xml`` suffix:\n```Shell\npython main.py --expID 005 --td --bu --morphologies walker_7_main\n  ```\n To run with one-way message passing, disable ``--td`` for bottom-up-only message passing or disable ``--bu`` for top-down-only message passing.\n To run without any message passing, disable both ``--td`` and ``--bu``.\n\n### Train with custom environment\n- Train both-way SMP for only one environment:\n```Shell\npython main.py --expID 006 --td --bu --custom_xml \u003cPATH_TO_XML_FILE\u003e\n  ```\n- Train both-way SMP for multiple environments (``xml`` files must be in the same directory):\n```Shell\npython main.py --expID 007 --td --bu --custom_xml \u003cPATH_TO_XML_DIR\u003e\n  ```\nNote that the current implementation assumes all custom MuJoCo agents are 2D planar and contain only one ``body`` tag with name ``torso`` attached to ``worldbody``.\n\n### Visualization\n- To visualize all ``walker`` environments with the both-way SMP model from experiment ``expID 001``:\n```Shell\npython visualize.py --expID 001 --td --bu --morphologies walker\n```\n- To visualize only ``walker_7_main`` environment with the both-way SMP model from experiment ``expID 001``:\n```Shell\npython visualize.py --expID 001 --td --bu --morphologies walker_7_main\n```\n\n## Provided Environments\n\n\u003ctable\u003e\n    \u003ctbody\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\" colspan=6\u003e\u003cb\u003eWalker\u003c/b\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_2_main.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_2_main\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_3_main.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_3_main\u003c/td\u003e\n          
  \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_4_main.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_4_main\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_5_main.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_5_main\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_6_main.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_6_main\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_7_main.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_7_main\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_2_flipped.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_2_flipped\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_3_flipped.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_3_flipped\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_4_flipped.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_4_flipped\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_5_flipped.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_5_flipped\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_6_flipped.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_6_flipped\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/walker_7_flipped.jpg\" width=\"80\"\u003e\u003cbr\u003ewalker_7_flipped\u003c/td\u003e\n        \u003c/tr\u003e\n    
\u003c/tbody\u003e\n\u003c/table\u003e\n\n\n\u003ctable\u003e\n    \u003ctbody\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\" colspan=4\u003e\u003cb\u003e2D Humanoid\u003c/b\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_7_left_arm.jpg\" width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_7_left_arm\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_7_left_leg.jpg\" width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_7_left_leg\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_7_lower_arms.jpg\" width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_7_lower_arms\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_7_right_arm.jpg\" width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_7_right_arm\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_7_right_leg.jpg\" width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_7_right_leg\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_8_left_knee.jpg\" width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_8_left_knee\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_8_right_knee.jpg\" width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_8_right_knee\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/humanoid_2d_9_full.jpg\" 
width=\"80\"\u003e\u003cbr\u003ehumanoid_2d_9_full\u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\n\n\u003ctable\u003e\n    \u003ctbody\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\" colspan=5\u003e\u003cb\u003eCheetah\u003c/b\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_2_back.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_2_back\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_2_front.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_2_front\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_3_back.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_3_back\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_3_balanced.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_3_balanced\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_3_front.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_3_front\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_4_allback.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_4_allback\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_4_allfront.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_4_allfront\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_4_back.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_4_back\u003c/td\u003e\n          
  \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_4_front.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_4_front\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_5_back.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_5_back\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_5_balanced.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_5_balanced\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_5_front.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_5_front\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_6_back.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_6_back\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_6_front.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_6_front\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/cheetah_7_full.jpg\" width=\"80\"\u003e\u003cbr\u003echeetah_7_full\u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003ctable\u003e\n    \u003ctbody\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\" colspan=3\u003e\u003cb\u003eHopper\u003c/b\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/hopper_3.jpg\" width=\"80\"\u003e\u003cbr\u003ehopper_3\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg 
src=\"images/all-envs-jpg/hopper_4.jpg\" width=\"80\"\u003e\u003cbr\u003ehopper_4\u003c/td\u003e\n            \u003ctd align=\"center\" style=\"text-align:center\"\u003e\u003cimg src=\"images/all-envs-jpg/hopper_5.jpg\" width=\"80\"\u003e\u003cbr\u003ehopper_5\u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\nNote that each walker agent has an identical instance of itself called ``flipped``, for which SMP always flips the torso message passed to both legs (e.g. the message that is passed to the left leg in the ``main`` instance is now passed to the right leg).\n\nFor the results reported in the paper, the following agents are in the held-out set for the corresponding experiments:\n\n- Walker++: walker_5_main, walker_6_flipped\n- Humanoid++: humanoid_2d_7_right_arm, humanoid_2d_7_lower_arms\n- Cheetah++: cheetah_4_front, cheetah_5_balanced, cheetah_6_front\n- Walker-Hopper++: walker_5_main, walker_6_flipped, hopper_3\n- Walker-Hopper-Humanoid++: walker_5_main, walker_6_flipped, hopper_3, humanoid_2d_7_right_arm, humanoid_2d_7_lower_arms\n\nAll other agents in the corresponding experiments are used for training.\n\n## Acknowledgement\nThe TD3 code is based on this [open-source implementation](https://github.com/sfujim/TD3). The code for Dynamic Graph Neural Networks is adapted from [Modular Assemblies (Pathak*, Lu* et al., NeurIPS 2019)](https://pathak22.github.io/modular-assemblies/).\n\n","funding_links":[],"categories":["Papers","Embodiment Co-design \u0026 Unified Control"],"sub_categories":["ICML 2022"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuangwl18%2Fmodular-rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuangwl18%2Fmodular-rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuangwl18%2Fmodular-rl/lists"}