https://github.com/Refined-Policy-Distillation/RPD
Source code for the Refined Policy Distillation paper.
- Host: GitHub
- URL: https://github.com/Refined-Policy-Distillation/RPD
- Owner: Refined-Policy-Distillation
- License: apache-2.0
- Created: 2025-07-15T13:53:00.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2025-07-15T14:02:54.000Z (12 months ago)
- Last Synced: 2025-07-16T07:17:48.135Z (12 months ago)
- Language: Python
- Homepage: https://refined-policy-distillation.github.io/
- Size: 17.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesomeopd - RPD | 2025.03 | TUM / Freiburg | [arXiv 2503.05833](https://arxiv.org/abs/2503.05833) · [project](https://refined-policy-distillation.github.io/) | Refined Policy Distillation, VLA (IROS 2026) | (🤖 Agent & Embodied OPD (by application) / 🔁 Iterative Self-Bootstrapping)
README
# Refined Policy Distillation (RPD)
This repo contains the code used in the [RPD Paper](https://refined-policy-distillation.github.io/) to distill and refine a VLA ([Octo](https://octo-models.github.io/) or [OpenVLA](https://openvla.github.io/)) using PPO on the [maniskill3](https://github.com/haosulab/ManiSkill) manipulation tasks.
Also check out our [paper on arXiv](https://arxiv.org/abs/2503.05833), the [openvla weights](https://huggingface.co/Juelg/openvla-7b-finetuned-maniskill) and [octo weights](https://huggingface.co/Juelg/octo-base-1.5-finetuned-maniskill) on Hugging Face, and the [maniskill dataset](https://huggingface.co/datasets/Juelg/RPD-maniskill) in RLDS format that we used to train them.
## Installation
If you clone the repos into folders with different names, adapt the paths in the following guide accordingly.
Create a fresh virtual/conda environment and install the package:
```shell
conda create -n rpd python=3.11 # should also work with later python versions
conda activate rpd
git clone https://github.com/Refined-Policy-Distillation/RPD.git
cd RPD
pip install -ve .
```
This should already install all required dependencies.
If you need GPU support for the simulation, install a GPU-enabled torch version and follow the [installation guidelines](https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html) from maniskill.
Check out the [agents repo](https://github.com/juelg/agents) for more details on the installation of specific teacher VLAs (Octo and OpenVLA).
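To confirm that the installation picked up the main dependencies, a quick check like the following can help. This is a minimal sketch, not part of the repo: the helper `missing_modules` is hypothetical, and the module name `rpd` is an assumption based on the `src/rpd` folder, while `mani_skill` and `torch` are the import names of the packages mentioned above.

```python
from importlib.util import find_spec

# Top-level modules expected after `pip install -ve .`.
# "rpd" is an assumed import name derived from the src/rpd folder.
REQUIRED_MODULES = ("rpd", "mani_skill", "torch")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

Run it inside the activated `rpd` environment. An empty result only shows that the modules are importable, not that GPU support is configured.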
## Training
Please note that we use the "human" camera perspective in maniskill, which is not available out of the box and requires our custom `HumanCameraWrapper` from [wrappers.py](https://github.com/juelg/agents/blob/master/src/agents/wrappers.py) in the agents repo.
### Dataset
First, the maniskill dataset needs to be downloaded from [Hugging Face](https://huggingface.co/datasets/haosulab/ManiSkill_Demonstrations).
Afterwards, you need to generate the camera data by replaying the recorded data in the simulation.
Note that the [`HumanCameraWrapper`](https://github.com/juelg/agents/blob/master/src/agents/wrappers.py) needs to be added to the replay environment in order to obtain the correct RPD views.
More information on how to replay the data can be found on the [maniskill documentation page](https://maniskill.readthedocs.io/en/latest/user_guide/datasets/replay.html).
We used the following command:
```shell
python -m mani_skill.trajectory.replay_trajectory --traj-path {path} --save-traj --target-control-mode pd_ee_delta_pose --obs-mode rgb+depth --num-procs 1 --reward-mode normalized_dense --record-rewards --shader default --use-env-states --max-retry 3
```
where `{path}` is `demos/*/rl/trajectory.none.pd_ee_delta_pose.cuda.h5`.
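Because the path pattern matches one trajectory file per task, the replay command has to be run once for each match. The following is a minimal sketch of such a loop, assuming the demos were downloaded into a local `demos` folder. The helper names `build_replay_command` and `replay_all` are hypothetical, and the flags are copied from the command above. The sketch does not add the `HumanCameraWrapper`, which still has to be added to the replay environment separately.

```python
import glob
import subprocess
import sys

# Flags copied from the replay command used for RPD.
REPLAY_FLAGS = [
    "--save-traj",
    "--target-control-mode", "pd_ee_delta_pose",
    "--obs-mode", "rgb+depth",
    "--num-procs", "1",
    "--reward-mode", "normalized_dense",
    "--record-rewards",
    "--shader", "default",
    "--use-env-states",
    "--max-retry", "3",
]

# Assumes the demonstrations live in ./demos relative to the working directory.
TRAJ_PATTERN = "demos/*/rl/trajectory.none.pd_ee_delta_pose.cuda.h5"


def build_replay_command(traj_path):
    """Build the argument list for replaying a single trajectory file."""
    return [
        sys.executable, "-m", "mani_skill.trajectory.replay_trajectory",
        "--traj-path", traj_path,
        *REPLAY_FLAGS,
    ]


def replay_all(pattern=TRAJ_PATTERN, dry_run=True):
    """Print (dry run) or execute the replay command for every matching file."""
    commands = [build_replay_command(path) for path in sorted(glob.glob(pattern))]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

Calling `replay_all()` only prints the commands. Pass `dry_run=False` to actually run them, one task after another.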
The output is stored in HDF5 format, as described in the [maniskill documentation](https://maniskill.readthedocs.io/en/latest/user_guide/datasets/demos.html).
In order to fine-tune Octo and OpenVLA, you need to convert the data to RLDS, for which you can use [this tool](https://github.com/kpertsch/rlds_dataset_builder) from Karl Pertsch.
We provide the already converted RLDS dataset [here on Hugging Face](https://huggingface.co/datasets/Juelg/RPD-maniskill).
You can [download it](https://huggingface.co/docs/hub/datasets-downloading) with git (or the Hugging Face CLI):
```shell
git lfs install
git clone https://huggingface.co/datasets/Juelg/RPD-maniskill
```
and use a tool such as [dlimp](https://github.com/kvablack/dlimp) to load and visualize it.
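As an alternative to git, the dataset can likely also be fetched from Python with the `huggingface_hub` package. This is a minimal sketch under the assumption that `huggingface_hub` is installed. The function name `download_rpd_dataset` and the default target folder are hypothetical.

```python
REPO_ID = "Juelg/RPD-maniskill"  # dataset repository linked above


def download_rpd_dataset(local_dir="RPD-maniskill"):
    """Download the RLDS dataset snapshot and return the local path."""
    # Imported lazily so the module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repository is a dataset, not a model
        local_dir=local_dir,
    )
```

Calling `download_rpd_dataset()` needs network access and enough disk space for the full dataset.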
### Fine-tuning VLAs
To fine-tune Octo and OpenVLA with this dataset, you need to add a new dataset mix containing only that dataset.
We release the fine-tuned checkpoints of [Octo](https://huggingface.co/Juelg/octo-base-1.5-finetuned-maniskill) and [OpenVLA](https://huggingface.co/Juelg/openvla-7b-finetuned-maniskill) on Hugging Face.
### Train RPD from fine-tuned VLAs
At this stage you should have a conda environment for RPD and one for each VLA that you want to distill (check out [the agents repo](https://github.com/juelg/agents) to install Octo or OpenVLA if you haven't already).
Check out the [train.py](train.py) Python script. It configures all hyperparameters for the RPD PPO training, including which foundation model to use. You can also train the baseline PPO by setting `use_rpd=False`.
```shell
python train.py
```
The main code is located in [ppo_rgb_rpd.py](src/rpd/ppo_rgb_rpd.py).
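To give an intuition for what the `use_rpd` switch changes, the following sketch combines a PPO clipped surrogate loss with an extra distillation term that pulls the student's actions toward the teacher VLA's actions. It is an illustration under assumptions and is not taken from `ppo_rgb_rpd.py`: the MSE form of the distillation term, the weight `distill_weight`, and all function names are hypothetical, and the value and entropy terms of PPO are omitted for brevity.

```python
def clipped_surrogate_loss(ratios, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss, averaged over the batch (lower is better)."""
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return -sum(terms) / len(terms)


def action_mse(student_actions, teacher_actions):
    """Mean squared error over all action dimensions in the batch."""
    squared_errors = [
        (s - t) ** 2
        for student, teacher in zip(student_actions, teacher_actions)
        for s, t in zip(student, teacher)
    ]
    return sum(squared_errors) / len(squared_errors)


def policy_loss(ratios, advantages, student_actions, teacher_actions,
                use_rpd=True, distill_weight=1.0, clip_eps=0.2):
    """Policy loss; with use_rpd=False this reduces to the plain PPO term."""
    loss = clipped_surrogate_loss(ratios, advantages, clip_eps)
    if use_rpd:
        # Assumed distillation term: keep the student close to the teacher.
        loss += distill_weight * action_mse(student_actions, teacher_actions)
    return loss
```

With `use_rpd=False` the teacher actions are ignored, which corresponds to the baseline PPO setting mentioned above.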
Hint: If you train with OpenVLA, consider checking its preprocessor.
By default it runs on the CPU, but it can be ported to the GPU, which speeds up training, especially if you spawn multiple training instances.
## Citation
If you find RPD useful for your work, please consider citing it:
```
@inproceedings{juelg2025refinedpolicydistillationvla,
title={{Refined Policy Distillation}: {F}rom {VLA} Generalists to {RL} Experts},
author={Tobias Jülg and Wolfram Burgard and Florian Walter},
year={2025},
booktitle={Proc.~of the IEEE/RSJ Int.~Conf.~on Intelligent Robots and Systems (IROS)},
note={Accepted for publication.}
}
```