https://github.com/sisl/mphrl
Model Primitive Hierarchical Reinforcement Learning
- Host: GitHub
- URL: https://github.com/sisl/mphrl
- Owner: sisl
- License: mit
- Created: 2019-02-06T22:09:20.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T02:30:06.000Z (over 3 years ago)
- Last Synced: 2024-03-24T17:10:24.106Z (about 2 years ago)
- Language: Python
- Size: 95.7 KB
- Stars: 13
- Watchers: 16
- Forks: 4
- Open Issues: 20
Metadata Files:
- Readme: README.md
- License: LICENSE
# Model Primitive Hierarchical Lifelong Reinforcement Learning
Code to reproduce experiments from https://arxiv.org/abs/1903.01567
## Usage
See [`Makefile`](./Makefile). Each experiment has its own target; pass the target name as the argument to `make` to run that experiment.
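The target names are not listed here, so a quick way to discover them is to scan the `Makefile` for rule lines. The following is a minimal sketch, assuming the `Makefile` uses ordinary `target: prerequisites` rules; the helper `list_make_targets` is hypothetical and not part of this repository.

```python
import re
from pathlib import Path

# Matches rule lines such as "name: deps". Recipe lines (tab-indented),
# comments, special targets like ".PHONY", and variable assignments
# such as "X := 1" do not match.
_TARGET_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9_.\-]*)\s*:(?!=)")


def list_make_targets(makefile_text):
    """Return target names in the order they first appear (hypothetical helper)."""
    targets = []
    for line in makefile_text.splitlines():
        match = _TARGET_RE.match(line)
        if match and match.group(1) not in targets:
            targets.append(match.group(1))
    return targets


if __name__ == "__main__":
    path = Path("Makefile")
    if path.exists():
        # Print one target per line; each can be passed to `make`.
        for name in list_make_targets(path.read_text()):
            print(name)
```

Any name it prints can then be run as `make <target>` from the repository root.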
## Description of Config Options
`ckpt_path`: the checkpoint to load if `restore_model` is `True`.
`math`: the coupling option, i.e. whether `Pi_k` is included in the posterior target
`stable_old`: when `math` is `True`, whether `Pi_k` is computed using the current policy or an old policy
`l2`: whether the individual posterior probabilities are reweighted using an `l2` norm or an `l1` sum
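To illustrate the difference the `l2` option describes, here is a minimal sketch of the two reweighting schemes on a plain list of non-negative scores. The function name `reweight` and its signature are assumptions made for illustration; the repository's actual implementation may differ.

```python
import math


def reweight(probs, l2=False):
    """Rescale non-negative scores by their l1 sum (default) or l2 norm.

    Hypothetical helper: only illustrates the two normalizations.
    """
    if l2:
        scale = math.sqrt(sum(p * p for p in probs))  # l2 norm
    else:
        scale = sum(probs)  # l1 sum (scores assumed non-negative)
    if scale == 0:
        raise ValueError("cannot reweight an all-zero vector")
    return [p / scale for p in probs]
```

With the `l1` sum the result sums to 1 and can be read as a probability distribution; with the `l2` norm the result has unit Euclidean length, so its entries generally sum to more than 1.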
### 8-P&P
`obstacle`: whether there are two walls in the 8-P&P taskset
`obstacle_height`: given `obstacle` is true, the height of the walls
`repeat`: the number of times the one-hot vector is repeated in the observation;
repeating it a few times lets the baseline PPO learn more quickly
`redundant`: whether the observation contains stage information of both boxes in 8-P&P,
or just the stage information of the current box
`bring_close`: the maximum allowable distance for successful `reach above` actions
`drop_close`: the maximum allowable distance for successful `dropping` actions
`drop_width`: the maximum allowable distance for successful `carry` actions
`split`: whether to normalize one-hot vectors in 8-P&P
`bounded`: whether to use bounded distance
`dist_bound`: the bound applied to the distance when `bounded` is true
`dist_obv`: whether to include relative distance between objects in 8-P&P observation
`above_target`: the multiplier of box size in 8-P&P for distance above the box during `reach above` actions
`stage_obv`: whether stage is observable in 8-P&P
`manhattan`: whether distance is calculated using Manhattan distance rather than l2 distance
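The distance and one-hot options above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the helper names are hypothetical, "bounded" is assumed to mean clipping the distance at `dist_bound`, and the repeated one-hot vector is assumed to be a simple concatenation of copies.

```python
import math


def distance(a, b, manhattan=False, bounded=False, dist_bound=1.0):
    """Distance between two points given as equal-length sequences."""
    diffs = [x - y for x, y in zip(a, b)]
    if manhattan:
        d = sum(abs(v) for v in diffs)  # Manhattan (l1) distance
    else:
        d = math.sqrt(sum(v * v for v in diffs))  # Euclidean (l2) distance
    # Assumption: a bounded distance is clipped at dist_bound.
    return min(d, dist_bound) if bounded else d


def repeated_one_hot(index, size, repeat=1):
    """One-hot vector of length `size`, concatenated `repeat` times."""
    one_hot = [1.0 if i == index else 0.0 for i in range(size)]
    return one_hot * repeat
```

For example, `repeated_one_hot(1, 3, repeat=2)` yields a length-6 vector in which the same one-hot pattern appears twice.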
### 10-Maze and 8-P&P
`soft`: whether the gating controller outputs softmax or hardmax selection
`weighted`: whether MPHRL reweights the cross entropy based on ground truth label
`restore_model`: whether to restore a checkpoint
`always_restore`: whether to restore the checkpoint for every new task in lifelong learning
`oracle_master`: whether to use oracle gating controller
`old_policy`: whether to use old policy in posterior calculation
`enforced`: whether to apply minibatch optimization to the gating controller in target tasks
`reset`: whether to reset gating controller for new tasks
`transfer`: whether to transfer subpolicies for new tasks
`paths`: the lifelong learning taskset configuration
`survival_reward`: reward for the ant to be alive per timestep
`record`: record videos during training
`prior`: has no effect on MPHRL
`num_cpus`: number of actors
`num_cores`: same as `num_cpus`
`bs_per_cpu`: train batch size
`max_num_timesteps`: horizon
`bs_per_core`: batch size per core during parallel operations
`total_ts`: total timesteps per lifelong learning task
`prev_timestep`: skipped timesteps during checkpoint restore for faster convergence
`num_batches`: number of training batches per task
`mov_avg`: moving average calculation of accuracies, etc.
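As an illustration of the `soft` option, the sketch below turns a list of gating scores into either a softmax distribution over subpolicies or a hardmax (one-hot) selection. The function `gate` and its inputs are assumptions for illustration only; the repository's gating controller is a learned network and may compute its output differently.

```python
import math


def gate(logits, soft=True):
    """Return subpolicy selection weights from raw gating scores.

    soft=True  -> softmax: every subpolicy gets a positive weight.
    soft=False -> hardmax: weight 1.0 on the highest score, 0.0 elsewhere.
    """
    if soft:
        shift = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]
```

Both outputs sum to 1, so either can be used to weight the subpolicies; the hardmax variant simply commits to a single one.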
## Citing
If you found this useful, consider citing:
```
@inproceedings{wu2019model,
title={Model Primitive Hierarchical Lifelong Reinforcement Learning},
author={Wu, Bohan and Gupta, Jayesh K and Kochenderfer, Mykel J},
booktitle={Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems},
pages={34--42},
year={2019},
organization={International Foundation for Autonomous Agents and Multiagent Systems}
}
```