https://github.com/Open-X-Humanoid/XR-1

Host: GitHub
URL: https://github.com/Open-X-Humanoid/XR-1
Owner: Open-X-Humanoid
License: other
Created: 2025-12-18T13:27:22.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-12-31T10:40:30.000Z (7 months ago)
Last Synced: 2026-01-02T12:46:55.704Z (7 months ago)
Language: Python
Size: 5.2 MB
Stars: 31
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations

A versatile and scalable vision-language-action framework: XR-1 supports robust multi-task learning across diverse robot embodiments and environments.

> Shichao Fan¹,\*, Kun Wu¹,\*, Zhengping Che¹,\*,†, Xinhua Wang¹, Di Wu¹,⁴, Fei Liao¹, Ning Liu¹, Yixue Zhang¹, Zhen Zhao¹, Zhiyuan Xu¹, Meng Li¹, Qingjie Liu³, Shanghang Zhang⁴, Min Wan², Jian Tang¹,✉

¹Beijing Innovation Center of Humanoid Robotics,
²School of Mechanical Engineering and Automation, Beihang University,
³State Key Laboratory of Virtual Reality Technology and Systems, SCSE, Beihang University,
⁴State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University

\*Co-first authors,
†Project leader,
✉Corresponding author

[![arXiv](https://img.shields.io/badge/arXiv-2511.02776-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2511.02776) [![Project Page](https://img.shields.io/badge/Project-Page-green.svg?logo=github&logoColor=white)](https://xr-1-vla.github.io/)

[\[📖 Documents\]](#documents) [\[🚀 Quick Start\]](#-quick-start) [\[🤗 Models\]](#-models) [\[🤖 Deployment\]](#-xr-1-deployment) [\[✅ Performance\]](#-performance-in-real-world) [\[🙋 FAQs\]](#-faqs)

![perform](assets/images/xr1_teaser.png)

## TODO List

- [x] Release pre-training / fine-tuning code for the XR-1 series.
- [x] Release the pre-trained models and a heterogeneous dataset sample of XR-1 on both HuggingFace and ModelScope.
- [ ] Release a real-world deployment sample of XR-1.

## Model Download
- [HuggingFace](https://huggingface.co/collections/X-Humanoid/xr-1)
- [ModelScope](https://modelscope.cn/collections/X-Humanoid/XR-1)

## Documents
This repository is built upon a fork of [LeRobot](https://github.com/huggingface/lerobot). Because LeRobot changes rapidly, our implementation is aligned specifically with **LeRobot dataset v2.1**. We have preserved the original directory structure to ease further development and integration by the community.
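
Before training, it can help to confirm that a local dataset declares the expected format version. The sketch below is a minimal check, not part of this repository: it assumes the dataset follows the usual LeRobot layout with a `meta/info.json` file that carries a `codebase_version` field, and the helper names `read_codebase_version` and `is_compatible` are hypothetical.

```python
import json
from pathlib import Path

EXPECTED_VERSION = "v2.1"  # dataset format this repository is aligned with


def read_codebase_version(dataset_root):
    """Return the `codebase_version` stored in meta/info.json, or None if missing."""
    # Assumption: LeRobot-style layout, <dataset_root>/meta/info.json
    info_path = Path(dataset_root) / "meta" / "info.json"
    if not info_path.is_file():
        return None
    with info_path.open(encoding="utf-8") as f:
        return json.load(f).get("codebase_version")


def is_compatible(dataset_root):
    """True only when the dataset declares the expected format version."""
    return read_codebase_version(dataset_root) == EXPECTED_VERSION
```

Calling `is_compatible("/path/to/dataset")` on a dataset that reports another version (or has no `meta/info.json`) returns `False`, which is a cue to convert the data first.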

### 🚀 Quick Start
### 📑 Installation

Download our source code:
```bash
git clone https://github.com/Open-X-Humanoid/XR-1.git
cd XR-1
```
Create a virtual environment with Python 3.10 and activate it, e.g. with [`miniconda`](https://docs.anaconda.com/free/miniconda/index.html), then install the dependencies:
```bash
conda create -y -n xr1 python=3.10
conda activate xr1
pip install -e ".[xr1]"
```
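
After installation, a quick sanity check can catch a wrong interpreter or a failed install early. This is a minimal sketch rather than a script shipped with the repository: the function name `check_environment` is hypothetical, and it assumes the editable install exposes a top-level package named `lerobot` (plausible because the fork keeps LeRobot's directory structure, but not confirmed here).

```python
import importlib.util
import sys


def check_environment(required=(3, 10), package="lerobot"):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, found {found[0]}.{found[1]}"
        )
    # find_spec returns None when a top-level package cannot be located
    if importlib.util.find_spec(package) is None:
        problems.append(f"package '{package}' is not importable")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```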

### 📑 Dataset Preparation

1. **Format Compatibility**: Since our environment relies on **LeRobot Dataset v2.1**, we recommend using [any4lerobot](https://github.com/Tavish9/any4lerobot/) to convert your data to this standard.
2. **Sample Data**: We provide a heterogeneous dataset sample (including Ego4D and robot data such as TienKung2/UR/Franka) at [X-Humanoid/XR-1-Dataset-Sample](https://huggingface.co/datasets/X-Humanoid/XR-1-Dataset-Sample). You can download it with either of the provided scripts: `scripts/hf_xr1_dataset_sample_download.sh` or `scripts/modelscope_xr1_dataset_sample_download.sh`.
3. **Unified Dataloader**: We have designed a dataloader that unifies heterogeneous data sources and embodiments, which simplifies **pre-training**. The implementation is in `examples/xr1_cross_dataset_and_embodiment_dataloader.py`.

**Key enhancements over the original LeRobot dataloader:**
- **Unified Data Loading**: Seamlessly reads data from diverse sources and embodiments.
- **Multi-Task Support**: Compatible with heterogeneous multi-task learning.
- **Few-Shot Capabilities**: Supports training with small sample sizes.
- **Extensibility**: Easily adaptable to new formats (e.g., non-LeRobot formats like Ego4D) with minimal development.
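
To illustrate what unifying heterogeneous sources can involve, the sketch below shows one common approach: pad each action vector to a shared width, carry a mask that marks the real entries, and draw samples across sources by weight. It is a toy example under stated assumptions and is not the implementation in `examples/xr1_cross_dataset_and_embodiment_dataloader.py`; the names `Source`, `pad_action`, and `MixedSampler`, as well as the sample layout, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Source:
    name: str            # illustrative label, e.g. "franka" or "ego4d"
    samples: list        # each sample is a dict; "action" may be absent for video-only data
    weight: float = 1.0  # relative sampling weight of this source


def pad_action(action, dim):
    """Pad an action vector to `dim` entries and return (padded_action, mask)."""
    action = list(action or [])
    if len(action) > dim:
        raise ValueError(f"action has {len(action)} dims, more than the shared {dim}")
    pad = dim - len(action)
    # mask is 1 for real entries and 0 for padding
    return action + [0.0] * pad, [1] * len(action) + [0] * pad


class MixedSampler:
    """Draws samples across sources so every item shares one action width."""

    def __init__(self, sources, action_dim, seed=0):
        self.sources = [s for s in sources if s.samples]
        self.action_dim = action_dim
        self.rng = random.Random(seed)

    def sample(self):
        weights = [s.weight for s in self.sources]
        src = self.rng.choices(self.sources, weights=weights)[0]
        item = self.rng.choice(src.samples)
        action, mask = pad_action(item.get("action"), self.action_dim)
        return {"source": src.name, "action": action, "action_mask": mask}


if __name__ == "__main__":
    sampler = MixedSampler(
        [
            Source("single_arm", [{"action": [0.1] * 7}]),
            Source("dual_arm", [{"action": [0.2] * 14}], weight=2.0),
            Source("video_only", [{}]),  # no actions: mask is all zeros
        ],
        action_dim=14,
    )
    print(sampler.sample())
```

The mask lets a loss function ignore padded dimensions, so action-free video and robots with different degrees of freedom can share a batch.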

## 🤗 Models

To set up the model environment, first download the foundation models (e.g., SigLIP, PaliGemma) by running:
```bash
# Huggingface
bash scripts/hf_download.sh
```
Then, to obtain the **XR-1-Stage1-UVMC** and **XR-1-Stage2-Pretrain** models for fine-tuning, run:
```bash
# Huggingface
bash scripts/hf_xr1_pretrain_model_download.sh
# Or ModelScope
bash scripts/modelscope_xr1_pretrain_model_download.sh
```

## 📖 Training Recipe

We provide three training paths depending on your data and performance requirements:

### 📑 Fast Fine-tuning (For Quick Deployment)
If you need to quickly adapt the model to a new task or robot, you can fine-tune only **Stage 3**.
This is the fastest way to get a deployable model:

```bash
# Debug mode (for testing configurations):
bash scripts/xr1_stage3_finetune.sh --debug
# Standard training (default):
bash scripts/xr1_stage3_finetune.sh
```

### 📑 Full Fine-tuning (Recommended for Best Performance)
For custom datasets where you aim for optimal performance, we strongly recommend fine-tuning **all three stages** (Stage 1, 2, and 3) sequentially to better align the representations with your specific data:
```bash
# Full fine-tuning: Stage 1, Stage 2, then Stage 3
bash scripts/xr1_stage1_finetune.sh
bash scripts/xr1_stage2_finetune.sh
# Stage 3 (optional)
bash scripts/xr1_stage3_finetune.sh
```
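
Because the stages must run in order, a small wrapper that stops at the first failure avoids starting a later stage on top of a broken checkpoint. The following is a minimal sketch, not a tool provided by this repository: the script paths are the ones listed above, while `build_commands` and `run_all` are hypothetical helper names.

```python
import subprocess

# Script paths as listed in the fine-tuning commands above
STAGE_SCRIPTS = [
    "scripts/xr1_stage1_finetune.sh",
    "scripts/xr1_stage2_finetune.sh",
    "scripts/xr1_stage3_finetune.sh",
]


def build_commands(include_stage3=True):
    """Return the ordered list of commands, optionally skipping Stage 3."""
    scripts = STAGE_SCRIPTS if include_stage3 else STAGE_SCRIPTS[:2]
    return [["bash", script] for script in scripts]


def run_all(commands, runner=subprocess.run):
    """Run commands in order; check=True raises CalledProcessError on a non-zero exit."""
    for cmd in commands:
        runner(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands())
```

Passing a different `runner` makes it possible to dry-run the sequence, for example by printing each command instead of executing it.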

### 📑 Pre-training from Scratch
Our framework fully supports pre-training if you have access to large-scale, heterogeneous datasets across diverse embodiments and environments:
```bash
# Pre-training Stage1 & Stage2
bash scripts/xr1_stage1_pretrain.sh
bash scripts/xr1_stage2_pretrain.sh
```

## 🤖 XR-1 Deployment

We provide a streamlined workflow to deploy and verify XR-1 on various robotic platforms, including Franka, UR, and AgileX. The following example demonstrates the process using a dual-arm Franka robot, with TienKung2 shown as an alternative dataset:
```bash
# 1. Perform fast fine-tuning to train a task-specific Stage 3 model
# Franka
bash scripts/xr1_stage3_finetune.sh --debug --dataset XR_1_DATASET_DUAL_ARM_FRANKA
# Or TienKung2
bash scripts/xr1_stage3_finetune.sh --debug --dataset XR_1_DATASET_DUAL_ARM_TIEN_KUNG2
# 2. Execute the deployment script
python deploy/real_robot/xr1_deploy.py
```

For deployment on **TienKung 2.0**, we recommend referring to the [x-humanoid-training-toolchain](https://github.com/Open-X-Humanoid/x-humanoid-training-toolchain/tree/main/deployment) for specialized instructions.

## ✅ Performance in Real-world

- Dual-Arm UR-5e
- Tien Kung 2.0
- Tien Kung 1.0
- Dual-Arm Franka
- AgileX Cobot Magic V2.0
- Single-Arm UR-5e

## 🙋 FAQs
If you encounter any issues, feel free to open an issue on GitHub or reach out through discussions. We appreciate your feedback and contributions! 🚀

## License

This project is released under the [Apache License](LICENSE). Parts of this project contain code and models from other sources, which are subject to their respective licenses.

## Citation

If you find this project useful in your research, please consider citing it:

```BibTeX
@article{fan2025xr,
title={XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations},
author={Fan, Shichao and Wu, Kun and Che, Zhengping and Wang, Xinhua and Wu, Di and Liao, Fei and Liu, Ning and Zhang, Yixue and Zhao, Zhen and Xu, Zhiyuan and others},
journal={arXiv preprint arXiv:2511.02776},
year={2025}
}
```

## Acknowledgement
XR-1 is built with reference to the code of the following projects: [LeRobot](https://github.com/huggingface/lerobot), [Moto](https://github.com/TencentARC/Moto), [QueST](https://github.com/pairlab/QueST), and [Pi0](https://github.com/Physical-Intelligence/openpi). Thanks for their awesome work!

## Discussions
If you're interested in XR-1, you are welcome to join our WeChat group for discussions.
