{"id":31021521,"url":"https://github.com/sisl/mphrl","last_synced_at":"2025-09-13T11:23:25.806Z","repository":{"id":37597970,"uuid":"169484494","full_name":"sisl/MPHRL","owner":"sisl","description":"Model Primitive Hierarchical Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2022-12-08T02:30:06.000Z","size":98,"stargazers_count":13,"open_issues_count":20,"forks_count":4,"subscribers_count":16,"default_branch":"master","last_synced_at":"2024-03-24T17:10:24.106Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sisl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-06T22:09:20.000Z","updated_at":"2024-03-24T17:10:24.107Z","dependencies_parsed_at":"2023-01-24T07:15:42.161Z","dependency_job_id":null,"html_url":"https://github.com/sisl/MPHRL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sisl/MPHRL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPHRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPHRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPHRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPHRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sisl","download_url":"https://codeload.github.com/sisl/MPHRL/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPHRL/sbom","scorecard":null,"host":{"name":"GitHub","
url":"https://github.com","kind":"github","repositories_count":274955950,"owners_count":25380669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-13T02:00:10.085Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-13T11:23:24.363Z","updated_at":"2025-09-13T11:23:25.796Z","avatar_url":"https://github.com/sisl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Model Primitive Hierarchical Lifelong Reinforcement Learning\n\nCode to reproduce experiments from https://arxiv.org/abs/1903.01567\n\n## Usage\n\nSee [`Makefile`](./Makefile). Run the appropriate command as argument to `make` to run the particular experiment.\n\n## Description of Config Options\n`ckpt_path`: the checkpoint to load if `restore_model` is `True`.\n\n`math`: the coupling option, i.e. 
inclusion of `Pi_k` in posterior target\n\n`stable_old`: given that `math` is `True`, should `Pi_k` be computed using the current policy or an old policy\n\n`l2`: whether the individual posterior probabilities are reweighted using an `l2` norm or an `l1` sum\n\n### 8-P\u0026P\n\n`obstacle`: whether there are two walls in the 8-P\u0026P taskset\n\n`obstacle_height`: given `obstacle` is true, the height of the walls\n\n`repeat`: the one-hot vector has to be repeated a couple times to allow the baseline PPO to learn more quickly; \nthis is the number of repeats\n\n`redundant`: whether the observation contains stage information of both boxes in 8-P\u0026P, \nor just the stage information of the current box\n\n`bring_close`: the maximum allowable distance for successful `reach above` actions\n\n`drop_close`: the maximum allowable distance for successful `dropping` actions\n\n`drop_width`: the maximum allowable distance for successful `carry` actions\n\n`split`: whether to normalize one-hot vectors in 8-P\u0026P\n\n`bounded`: whether to use bounded distance\n\n`dist_bound`: if so, what is the bounded distance\n\n`dist_obv`: whether to include relative distance between objects in 8-P\u0026P observation\n\n`above_target`: the multiplier of box size in 8-P\u0026P for distance above the box during `reach above` actions\n\n`stage_obv`: whether stage is observable in 8-P\u0026P\n\n`manhattan`: whether distance is calculated using Manhattan or l2 distance\n\n### 10-Maze and 8-P\u0026P\n\n`soft`: whether the gating controller outputs softmax or hardmax selection\n\n`weighted`: whether MPHRL reweights the cross entropy based on ground truth label\n\n`restore_model`: whether to restore a checkpoint\n\n`always_restore`: whether to restore the checkpoint for every new task in lifelong learning\n\n`oracle_master`: whether to use oracle gating controller\n\n`old_policy`: whether to use old policy in posterior calculation\n\n`enforced`: whether to minibatch the optimization of the gating 
controller in target tasks\n\n`reset`: whether to reset gating controller for new tasks\n\n`transfer`: whether to transfer subpolicies for new tasks\n\n`paths`: the lifelong learning taskset configuration\n\n`survival_reward`: reward for the ant to be alive per timestep\n\n`record`: record videos during training\n\n`prior`: has no effect on MPHRL\n\n`num_cpus`: number of actors\n\n`num_cores`: same as `num_cpus`\n\n`bs_per_cpu`: train batch size\n\n`max_num_timesteps`: horizon\n\n`bs_per_core`: batch size per core during parallel operations\n\n`total_ts`: total timesteps per lifelong learning task\n\n`prev_timestep`: skipped timesteps during checkpoint restore for faster convergence\n\n`num_batches`: number of training batches per task\n\n`mov_avg`: moving average calculation of accuracies, etc. \n\n## Citing\n\nIf you found this useful, consider citing:\n\n```\n@inproceedings{wu2019model,\n  title={Model Primitive Hierarchical Lifelong Reinforcement Learning},\n  author={Wu, Bohan and Gupta, Jayesh K and Kochenderfer, Mykel J},\n  booktitle={Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems},\n  pages={34--42},\n  year={2019},\n  organization={International Foundation for Autonomous Agents and Multiagent Systems}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsisl%2Fmphrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsisl%2Fmphrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsisl%2Fmphrl/lists"}