[ICML 2020] PyTorch Code for "One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control"
- Host: GitHub
- URL: https://github.com/huangwl18/modular-rl
- Owner: huangwl18
- License: other
- Created: 2020-07-09T18:16:48.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-12-27T15:36:25.000Z (about 2 years ago)
- Last Synced: 2024-08-03T22:17:03.090Z (6 months ago)
- Topics: decentralized-control, deep-learning, emergent-communication, generalization, graph-neural-networks, locomotion, message-passing, modular-control, modularity, reinforcement-learning
- Language: Jupyter Notebook
- Homepage: https://huangwl18.github.io/modular-rl/
- Size: 4.26 MB
- Stars: 211
- Watchers: 11
- Forks: 34
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
## One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control ##
### ICML 2020
#### [[Project Page]](https://huangwl18.github.io/modular-rl/) [[Paper]](https://www.cs.cmu.edu/~dpathak/papers/modular-rl.pdf) [[Demo Video]](https://youtu.be/9YiZZ_8guq8) [[Long Oral Talk]](https://youtu.be/gEeQ0nzalzo)

[Wenlong Huang](https://wenlong.page)<sup>1</sup>, [Igor Mordatch](https://scholar.google.com/citations?user=Vzr1RukAAAAJ&hl=en)<sup>2</sup>, [Deepak Pathak](https://www.cs.cmu.edu/~dpathak/)<sup>3 4</sup>

<sup>1</sup>University of California, Berkeley, <sup>2</sup>Google Brain, <sup>3</sup>Facebook AI Research, <sup>4</sup>Carnegie Mellon University

This is a PyTorch-based implementation of our [Shared Modular Policies](https://huangwl18.github.io/modular-rl/). Rather than going through the laborious process of training a conventional RL policy for each single agent, we explore learning general-purpose controllers for diverse robotic systems. Our approach trains a single policy across a wide variety of agents, and that policy can then generalize to unseen agent shapes at test time without any further training.

If you find this work useful in your research, please cite using the following BibTeX:

    @inproceedings{huang2020smp,
      Author = {Huang, Wenlong and Mordatch, Igor and Pathak, Deepak},
      Title = {One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control},
      Booktitle = {ICML},
      Year = {2020}
    }

## Setup
### Requirements
- Python-3.6
- PyTorch-1.1.0
- CUDA-9.0
- CUDNN-7.6
- [MuJoCo-200](https://www.roboti.us/index.html): download binaries, put license file inside, and add path to .bashrc

### Setting up repository
```Shell
git clone https://github.com/huangwl18/modular-rl.git
cd modular-rl/
python3.6 -m venv mrEnv
source $PWD/mrEnv/bin/activate
```

### Installing Dependencies
```Shell
pip install --upgrade pip
pip install -r requirements.txt
```

## Running Code
| Flags and Parameters | Description |
| ------------- | ------------- |
| ``--morphologies`` | Find existing environments matching each keyword for training (e.g. walker, hopper, humanoid, and cheetah; see examples below) |
| ``--custom_xml`` | Path to a custom `xml` file for training the modular policy. When the path is a file, train with that `xml` morphology only. When the path is a directory, train on all `xml` morphologies found in the directory. |
| ``--td`` | Enable top-down message passing (pass ``--td --bu`` for both-way message passing) |
| ``--bu`` | Enable bottom-up message passing (pass ``--td --bu`` for both-way message passing) |
| ``--expID`` | Experiment ID used to name the directory where results are saved |
| ``--seed`` | (Optional) Seed for Gym, PyTorch and NumPy |
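
To make the flag semantics concrete, here is a minimal sketch of how the flags in the table might be parsed, and how ``--td`` and ``--bu`` combine into a message-passing mode. It assumes Python's standard `argparse`. The helpers `build_parser` and `message_passing_mode`, the default values, and the mode labels are hypothetical and are not taken from the repository's `main.py`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the table (illustrative only)."""
    parser = argparse.ArgumentParser(description="SMP flag sketch")
    # Assumption: one or more keywords may be given, e.g. --morphologies walker hopper
    parser.add_argument("--morphologies", nargs="*", type=str, default=[])
    parser.add_argument("--custom_xml", type=str, default=None)
    # Both message-passing directions are opt-in switches
    parser.add_argument("--td", action="store_true")
    parser.add_argument("--bu", action="store_true")
    # Assumption: kept as a string so that IDs such as "001" keep their leading zeros
    parser.add_argument("--expID", type=str, default="000")
    # Assumption: the default seed value is made up for this sketch
    parser.add_argument("--seed", type=int, default=0)
    return parser


def message_passing_mode(td, bu):
    """Map the two switches to a human-readable mode label."""
    if td and bu:
        return "both-way"
    if td:
        return "top-down"
    if bu:
        return "bottom-up"
    return "none"


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--expID", "001", "--td", "--bu", "--morphologies", "walker"]
)
print(args.expID, message_passing_mode(args.td, args.bu))
```

Because both switches default to off, leaving out ``--td`` and ``--bu`` corresponds to running without any message passing.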
### Train with existing environment
- Train both-way SMP on ``Walker++`` (12 variants of walker):
```Shell
python main.py --expID 001 --td --bu --morphologies walker
```
- Train both-way SMP on ``Humanoid++`` (8 variants of 2d humanoid):
```Shell
python main.py --expID 002 --td --bu --morphologies humanoid
```
- Train both-way SMP on ``Cheetah++`` (15 variants of cheetah):
```Shell
python main.py --expID 003 --td --bu --morphologies cheetah
```
- Train both-way SMP on ``Hopper++`` (3 variants of hopper):
```Shell
python main.py --expID 004 --td --bu --morphologies hopper
```
- To train both-way SMP for only one environment (e.g. ``walker_7_main``), specify the full name of the environment without the ``.xml`` suffix:
```Shell
python main.py --expID 005 --td --bu --morphologies walker_7_main
```
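
The keyword selection used in the commands above can be pictured as a simple name filter. The sketch below assumes that a keyword matches every environment whose name contains it as a substring, so a full name such as ``walker_7_main`` selects exactly one environment. The function `match_environments` and the shortened `AVAILABLE` list are hypothetical and only illustrate the idea.

```python
# Hypothetical subset of the provided environment names (without the .xml suffix)
AVAILABLE = [
    "walker_2_main",
    "walker_7_main",
    "walker_7_flipped",
    "hopper_3",
    "hopper_4",
    "hopper_5",
    "cheetah_7_full",
]


def match_environments(keywords, available):
    """Return the names containing any keyword, in keyword order, without duplicates."""
    selected = []
    for keyword in keywords:
        for name in available:
            # Assumption: plain substring matching decides membership
            if keyword in name and name not in selected:
                selected.append(name)
    return selected


print(match_environments(["hopper"], AVAILABLE))
# ['hopper_3', 'hopper_4', 'hopper_5']
print(match_environments(["walker_7_main"], AVAILABLE))
# ['walker_7_main']
```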
To run with one-way message passing, omit ``--td`` for bottom-up-only message passing or omit ``--bu`` for top-down-only message passing.
To run without any message passing, omit both ``--td`` and ``--bu``.

### Train with custom environment
- Train both-way SMP for only one environment:
```Shell
python main.py --expID 006 --td --bu --custom_xml <PATH_TO_XML_FILE>
```
- Train both-way SMP for multiple environments (``xml`` files must be in the same directory):
```Shell
python main.py --expID 007 --td --bu --custom_xml <PATH_TO_XML_DIR>
```
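
The file-versus-directory behavior of ``--custom_xml`` can be sketched as follows. This is not the repository's code: `resolve_custom_xml` and `has_single_torso` are hypothetical helpers, and the toy model is made up for the example. The second helper checks one structural requirement for custom agents as we read it here (exactly one ``body`` directly under ``worldbody``, named ``torso``). It does not check that the agent is 2D planar.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def resolve_custom_xml(path):
    """Return the XML files to train on: the file itself, or every .xml in a directory."""
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith(".xml")
        )
    raise FileNotFoundError(path)


def has_single_torso(xml_path):
    """Check that <worldbody> has exactly one direct <body> child, named 'torso'."""
    worldbody = ET.parse(xml_path).getroot().find("worldbody")
    if worldbody is None:
        return False
    bodies = worldbody.findall("body")  # direct children only, not nested bodies
    return len(bodies) == 1 and bodies[0].get("name") == "torso"


# Throwaway model used only to exercise the helpers
TOY_MODEL = (
    '<mujoco model="toy"><worldbody>'
    '<body name="torso"><body name="leg"/></body>'
    "</worldbody></mujoco>"
)

with tempfile.TemporaryDirectory() as tmp:
    xml_file = os.path.join(tmp, "toy_agent.xml")
    with open(xml_file, "w") as handle:
        handle.write(TOY_MODEL)
    single = resolve_custom_xml(xml_file)  # file: that morphology only
    files = resolve_custom_xml(tmp)        # directory: every .xml inside it
    checks = [has_single_torso(path) for path in files]

print(len(single), len(files), checks)
```

Running a check like this before training can catch a malformed custom model early, instead of failing once the environment is built.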
Note that the current implementation assumes all custom MuJoCo agents are 2D planar and contain only one ``body`` tag with name ``torso`` attached to ``worldbody``.

### Visualization
- To visualize all ``walker`` environments with the both-way SMP model from experiment ``expID 001``:
```Shell
python visualize.py --expID 001 --td --bu --morphologies walker
```
- To visualize only ``walker_7_main`` environment with the both-way SMP model from experiment ``expID 001``:
```Shell
python visualize.py --expID 001 --td --bu --morphologies walker_7_main
```

## Provided Environments
| Walker | 2D Humanoid | Cheetah | Hopper |
| ------------- | ------------- | ------------- | ------------- |
| walker_2_main | humanoid_2d_7_left_arm | cheetah_2_back | hopper_3 |
| walker_3_main | humanoid_2d_7_left_leg | cheetah_2_front | hopper_4 |
| walker_4_main | humanoid_2d_7_lower_arms | cheetah_3_back | hopper_5 |
| walker_5_main | humanoid_2d_7_right_arm | cheetah_3_balanced | |
| walker_6_main | humanoid_2d_7_right_leg | cheetah_3_front | |
| walker_7_main | humanoid_2d_8_left_knee | cheetah_4_allback | |
| walker_2_flipped | humanoid_2d_8_right_knee | cheetah_4_allfront | |
| walker_3_flipped | humanoid_2d_9_full | cheetah_4_back | |
| walker_4_flipped | | cheetah_4_front | |
| walker_5_flipped | | cheetah_5_back | |
| walker_6_flipped | | cheetah_5_balanced | |
| walker_7_flipped | | cheetah_5_front | |
| | | cheetah_6_back | |
| | | cheetah_6_front | |
| | | cheetah_7_full | |

Note that each walker agent has an identical instance of itself called ``flipped``, for which SMP always flips the torso message passed to both legs (e.g. the message that is passed to the left leg in the ``main`` instance is now passed to the right leg).
For the results reported in the paper, the following agents are in the held-out set for the corresponding experiments:
- Walker++: walker_5_main, walker_6_flipped
- Humanoid++: humanoid_2d_7_right_arm, humanoid_2d_7_lower_arms
- Cheetah++: cheetah_4_front, cheetah_5_balanced, cheetah_6_front
- Walker-Hopper++: walker_5_main, walker_6_flipped, hopper_3
- Walker-Hopper-Humanoid++: walker_5_main, walker_6_flipped, hopper_3, humanoid_2d_7_right_arm, humanoid_2d_7_lower_arms

All other agents in the corresponding experiments are used for training.
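
As a small worked example, the split above can be written down as data and used to derive a training set. The held-out names are copied from the list above. The `HELD_OUT` dictionary layout and the `training_split` helper are hypothetical and are not part of the repository.

```python
# All 12 walker variants: walker_{2..7}_main and walker_{2..7}_flipped
WALKERS = [f"walker_{n}_{kind}" for kind in ("main", "flipped") for n in range(2, 8)]

# Held-out agents per experiment, as listed above
HELD_OUT = {
    "Walker++": ["walker_5_main", "walker_6_flipped"],
    "Humanoid++": ["humanoid_2d_7_right_arm", "humanoid_2d_7_lower_arms"],
    "Cheetah++": ["cheetah_4_front", "cheetah_5_balanced", "cheetah_6_front"],
    "Walker-Hopper++": ["walker_5_main", "walker_6_flipped", "hopper_3"],
    "Walker-Hopper-Humanoid++": [
        "walker_5_main",
        "walker_6_flipped",
        "hopper_3",
        "humanoid_2d_7_right_arm",
        "humanoid_2d_7_lower_arms",
    ],
}


def training_split(agents, held_out):
    """Return the agents that are not held out, preserving their order."""
    excluded = set(held_out)
    return [agent for agent in agents if agent not in excluded]


train = training_split(WALKERS, HELD_OUT["Walker++"])
print(len(WALKERS), len(train))  # 12 10
```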
## Acknowledgement
The TD3 code is based on this [open-source implementation](https://github.com/sfujim/TD3). The code for Dynamic Graph Neural Networks is adapted from [Modular Assemblies (Pathak*, Lu* et al., NeurIPS 2019)](https://pathak22.github.io/modular-assemblies/).