{"id":50510416,"url":"https://github.com/Refined-Policy-Distillation/RPD","last_synced_at":"2026-06-19T14:00:36.498Z","repository":{"id":304839692,"uuid":"1020202521","full_name":"Refined-Policy-Distillation/RPD","owner":"Refined-Policy-Distillation","description":"Source code for the Refined Policy Distillation paper.","archived":false,"fork":false,"pushed_at":"2025-07-15T14:02:54.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-16T07:17:48.135Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://refined-policy-distillation.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Refined-Policy-Distillation.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-15T13:53:00.000Z","updated_at":"2025-07-15T14:02:57.000Z","dependencies_parsed_at":"2025-07-16T11:12:08.602Z","dependency_job_id":"88c3eb04-7e4f-46a4-9dbb-bd4d634ed4f2","html_url":"https://github.com/Refined-Policy-Distillation/RPD","commit_stats":null,"previous_names":["refined-policy-distillation/rpd"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Refined-Policy-Distillation/RPD","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Refined-Policy-Distillation%2FRPD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Refined-Policy-Distillation%2FRPD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Refine
d-Policy-Distillation%2FRPD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Refined-Policy-Distillation%2FRPD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Refined-Policy-Distillation","download_url":"https://codeload.github.com/Refined-Policy-Distillation/RPD/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Refined-Policy-Distillation%2FRPD/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34534278,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-19T02:00:06.005Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-02T20:00:26.252Z","updated_at":"2026-06-19T14:00:36.492Z","avatar_url":"https://github.com/Refined-Policy-Distillation.png","language":"Python","funding_links":[],"categories":["🤖 Agent \u0026 Embodied OPD (by application)"],"sub_categories":["🔁 Iterative Self-Bootstrapping"],"readme":"# Refined Policy Distillation (RPD)\nThis repo contains the code used in the [RPD Paper](https://refined-policy-distillation.github.io/) to distill and refine a VLA ([Octo](https://octo-models.github.io/) or [OpenVLA](https://openvla.github.io/)) using PPO on the [ManiSkill3](https://github.com/haosulab/ManiSkill) manipulation tasks.\n\nAlso check out our [paper on 
arXiv](https://arxiv.org/abs/2503.05833), the [OpenVLA weights](https://huggingface.co/Juelg/openvla-7b-finetuned-maniskill) and [Octo weights](https://huggingface.co/Juelg/octo-base-1.5-finetuned-maniskill) on Hugging Face, and the [ManiSkill dataset](https://huggingface.co/datasets/Juelg/RPD-maniskill) in RLDS format that we used to train them.\n\n## Installation\nIf you clone the repositories into folders with different names, adapt the paths in the following guide accordingly.\n\nCreate a fresh virtual/conda environment and install RPD:\n```shell\nconda create -n rpd python=3.11 # later Python versions should also work\nconda activate rpd\ngit clone https://github.com/Refined-Policy-Distillation/RPD.git\ncd RPD\npip install -ve .\n```\nThis should already install all required dependencies.\nIf you need GPU support for the simulation, install a GPU-enabled PyTorch build and follow the ManiSkill [installation guidelines](https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html).\n\nCheck out the [agents repo](https://github.com/juelg/agents) for more details on installing the specific teacher VLAs (Octo and OpenVLA).\n\n## Training\nPlease note that we use the \"human\" camera perspective in ManiSkill, which is not available out of the box and requires our custom `HumanCameraWrapper` from [wrappers.py](https://github.com/juelg/agents/blob/master/src/agents/wrappers.py) in the agents repo.\n\n### Dataset\nFirst, download the ManiSkill demonstration dataset from [Hugging Face](https://huggingface.co/datasets/haosulab/ManiSkill_Demonstrations).\nAfterwards, generate the camera data by replaying the recorded trajectories in simulation.\nNote that the [`HumanCameraWrapper`](https://github.com/juelg/agents/blob/master/src/agents/wrappers.py) needs to be added to the replay environment to obtain the correct RPD views.\nMore information on how to replay the data can be found on the [ManiSkill documentation 
page](https://maniskill.readthedocs.io/en/latest/user_guide/datasets/replay.html).\nWe used the following command:\n```shell\npython -m mani_skill.trajectory.replay_trajectory --traj-path {path} --save-traj --target-control-mode pd_ee_delta_pose --obs-mode rgb+depth --num-procs 1 --reward-mode normalized_dense --record-rewards --shader default --use-env-states --max-retry 3\n```\nwhere `{path}` is each file matching `demos/*/rl/trajectory.none.pd_ee_delta_pose.cuda.h5`.\n\nThe output is HDF5 data in the format described in the [ManiSkill documentation](https://maniskill.readthedocs.io/en/latest/user_guide/datasets/demos.html).\nTo fine-tune Octo and OpenVLA, the data needs to be converted to RLDS, for which you can use [this tool](https://github.com/kpertsch/rlds_dataset_builder) by Karl Pertsch.\nWe provide the already converted RLDS dataset [here on Hugging Face](https://huggingface.co/datasets/Juelg/RPD-maniskill).\n\nYou can [download it](https://huggingface.co/docs/hub/datasets-downloading) with git (or the Hugging Face CLI):\n```shell\ngit lfs install\ngit clone https://huggingface.co/datasets/Juelg/RPD-maniskill\n```\nand use a tool such as [dlimp](https://github.com/kvablack/dlimp) to load and visualize it.\n\n### Fine-tuning VLAs\nTo fine-tune Octo and OpenVLA on this dataset, you need to add a new dataset mix that contains only this dataset.\n\nWe release the fine-tuned checkpoints of [Octo](https://huggingface.co/Juelg/octo-base-1.5-finetuned-maniskill) and [OpenVLA](https://huggingface.co/Juelg/openvla-7b-finetuned-maniskill) on Hugging Face.\n\n### Train RPD from fine-tuned VLAs\nAt this stage, you should have a conda environment for RPD and one for each VLA that you want to distill (check out [the agents repo](https://github.com/juelg/agents) to install Octo or OpenVLA if you haven't already).\n\nCheck out the [train.py](train.py) Python script. It configures all hyperparameters for the RPD PPO training, including which foundation model to use. 
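As a rough illustration of how a teacher VLA can guide PPO, here is a minimal sketch in plain Python (no tensors, for readability). It assumes that the teacher's actions enter training as an additional squared-error term pulling the student actor's mean action toward the VLA's action. The function `ppo_rpd_loss` and its parameters (`clip_eps`, `rpd_coef`, and a local `use_rpd` flag that only mirrors the train.py switch conceptually) are hypothetical. The actual loss, weighting and schedule in [ppo_rgb_rpd.py](src/rpd/ppo_rgb_rpd.py) may differ.

```python
import math
from typing import Sequence


# Hypothetical sketch (not the repo's API): PPO clipped surrogate plus an
# assumed teacher-action regression term in the spirit of RPD.
def ppo_rpd_loss(
    new_logp: Sequence[float],  # log-prob of each action under the current policy
    old_logp: Sequence[float],  # log-prob recorded when the rollout was collected
    advantages: Sequence[float],  # advantage estimate per step (e.g. from GAE)
    student_mean: Sequence[Sequence[float]],  # student actor's mean action per step
    teacher_action: Sequence[Sequence[float]],  # VLA teacher's action per step
    clip_eps: float = 0.2,
    rpd_coef: float = 1.0,
    use_rpd: bool = True,
) -> float:
    # Clipped surrogate objective, negated so that minimizing it improves the policy.
    policy_loss = 0.0
    for lp, old_lp, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        policy_loss -= min(ratio * adv, clipped_ratio * adv)
    policy_loss /= len(advantages)

    if not use_rpd:
        return policy_loss  # plain PPO: no teacher guidance

    # Assumed RPD-style term: mean squared error between student and teacher actions.
    sq_err, count = 0.0, 0
    for s_vec, t_vec in zip(student_mean, teacher_action):
        for s, t in zip(s_vec, t_vec):
            sq_err += (s - t) ** 2
            count += 1
    return policy_loss + rpd_coef * sq_err / count


if __name__ == '__main__':
    loss = ppo_rpd_loss(
        new_logp=[0.0, 0.0],
        old_logp=[0.0, 0.0],
        advantages=[1.0, -1.0],
        student_mean=[[0.0, 0.0], [1.0, 1.0]],
        teacher_action=[[0.0, 0.0], [0.0, 0.0]],
    )
    print(loss)  # 0.5: surrogate terms cancel, teacher MSE is 2/4
```

In this sketch, `rpd_coef` trades off imitating the teacher against maximizing the PPO objective. Dropping the term leaves only the clipped surrogate.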
You can also train the PPO baseline by setting `use_rpd=False`.\n```shell\npython train.py\n```\nThe main code is located in [ppo_rgb_rpd.py](src/rpd/ppo_rgb_rpd.py).\n\nHint: If you train with OpenVLA, consider checking its preprocessor.\nBy default, it runs on the CPU, but it can be moved to the GPU, which speeds up training, especially if you spawn multiple training instances.\n\n## Citation\nIf you find RPD useful for your work, please consider citing it:\n```\n@inproceedings{juelg2025refinedpolicydistillationvla,\n    title={{Refined Policy Distillation}: {F}rom {VLA} Generalists to {RL} Experts}, \n    author={Tobias Jülg and Wolfram Burgard and Florian Walter},\n    year={2025},\n    booktitle={Proc.~of the IEEE/RSJ Int.~Conf.~on Intelligent Robots and Systems (IROS)},\n    note={Accepted for publication.}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRefined-Policy-Distillation%2FRPD","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRefined-Policy-Distillation%2FRPD","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRefined-Policy-Distillation%2FRPD/lists"}