{"id":28573368,"url":"https://github.com/opendrivelab/univla","last_synced_at":"2025-06-10T21:17:34.305Z","repository":{"id":292219786,"uuid":"971194396","full_name":"OpenDriveLab/UniVLA","owner":"OpenDriveLab","description":"[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions","archived":false,"fork":false,"pushed_at":"2025-05-31T14:46:53.000Z","size":3083,"stargazers_count":381,"open_issues_count":6,"forks_count":16,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-06-06T20:15:50.501Z","etag":null,"topics":["robot-learning","vla"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2505.06111","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenDriveLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-23T06:48:48.000Z","updated_at":"2025-06-06T13:43:00.000Z","dependencies_parsed_at":"2025-05-08T18:48:44.278Z","dependency_job_id":"6c6b119a-284c-4249-a15e-62f1ece80ece","html_url":"https://github.com/OpenDriveLab/UniVLA","commit_stats":null,"previous_names":["opendrivelab/univla"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FUniVLA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FUniVLA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FUniVLA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FUniVLA/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenDriveLab","download_url":"https://codeload.github.com/OpenDriveLab/UniVLA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FUniVLA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259152773,"owners_count":22813223,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["robot-learning","vla"],"created_at":"2025-06-10T21:17:33.521Z","updated_at":"2025-06-10T21:17:34.296Z","avatar_url":"https://github.com/OpenDriveLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# :earth_asia: UniVLA\n\n\n\u003cdiv id=\"top\" align=\"center\"\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/teaser_univla.png\" width=\"1000px\" \u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n\u003e #### :page_facing_up: [Paper](https://arxiv.org/pdf/2505.06111) | :rocket: Demo Page (Coming Soon)\n\u003e :black_nib: Qingwen Bu, Y. Yang, J. Cai, S. Gao, G. Ren, M. Yao, P. Luo, H. 
Li \\\n\u003e :e-mail: Primary Contact: Qingwen Bu (buqingwen@opendrivelab.com)\n\n### :fire: Highlights\n- A recipe towards generalist policy by planning in a unified, embodiment-agnostic action space.\n- A novel approach for extracting task-centric latent actions from cross-embodiment videos.\n- A VLA that achieves state-of-the-art results on multiple benchmarks with compute-efficient training.\n\n## Table of Contents\n- [:movie_camera: Demo](#movie_camera-demo)\n- [:loudspeaker: News](#loudspeaker-news)\n- [:pushpin: TODO List](#pushpin-todo-list)\n- [🤗 Model Zoo](#ckpts)\n- [:video_game: Getting Started](#installation)\n- [:fire: Training Recipe](#fire-training-recipe)\n  - [Task-centric Latent Action Learning](#one-task-centric-latent-action-learning)\n  - [Pretraining of Generalist Policy](#two-pretraining-of-generalist-policy)\n  - [Post-training for Deployment \u0026 Evaluations](#three-post-training-for-deployment--evaluations)\n    - [Real-world Experiment](#mechanical_arm-real-world-experiment)\n    - [LIBERO Benchmark](#1-libero)\n- [:rocket: UniVLA's Performance](#rocket-univlas-performance)\n- [:pencil: Citation](#pencil-citation)\n\n\n  \n## :movie_camera: Demo\nReal-world robot experiments.\n\n\n\u003ctable style=\"width:100%;border-collapse:collapse;\"\u003e\n\u003ctr\u003e\n  \u003ctd style=\"text-align:center;\"\u003e\u003cb\u003eStore the screwdriver (1x speed)\u003c/b\u003e\u003c/td\u003e\n  \u003ctd style=\"text-align:center;\"\u003e\u003cb\u003eClean the cutting board (1x speed)\u003c/b\u003e\u003c/td\u003e\n  \u003ctd style=\"text-align:center;\"\u003e\u003cb\u003eFold towel twice (1x speed)\u003c/b\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n  \u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/b11b4e83-24d8-4b55-b50e-f8271249422c\" style=\"object-fit:cover;\" autoplay loop muted\u003e\u003c/video\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003cvideo 
src=\"https://github.com/user-attachments/assets/bafb5bac-8c8e-41d4-89d0-ec774b9b6e1c\" style=\"object-fit:cover;\" autoplay loop muted\u003e\u003c/video\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/6779e0e4-aa6e-4c16-adb9-30dedfd4db85\" style=\"object-fit:cover;\" autoplay loop muted\u003e\u003c/video\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\n  \u003ctd style=\"text-align:center;\"\u003e\u003cb\u003eStack the Tower of Hanoi (1x speed)\u003c/b\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n  \u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/61f663da-18df-4892-ae8f-5e03aac7469e\" style=\"object-fit:cover;\" autoplay loop muted\u003e\u003c/video\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/da7d7d4e-0634-42d7-8e88-8bb269965b1a\" style=\"object-fit:cover;\" autoplay loop muted\u003e\u003c/video\u003e\u003c/td\u003e\n  \u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/cb3afa9a-ffeb-4879-b915-1803d7ff8262\" style=\"object-fit:cover;\" autoplay loop muted\u003e\u003c/video\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n\n\n## :loudspeaker: News\n\n- **[2025/05]** The code of UniVLA v1.0 is released. Please check it out!\n\n\n\n## :pushpin: TODO list\n\n\n#### 1. 🤗 Checkpoints Release\n  -  [x] 1) Latent action model\n  -  [x] 2) Pre-trained Models\n      - [x] *Full (Manip. + Navi. + Human)*\n      - [x] *BridgeV2-Only*\n      - [x] *Human-Only*\n  -  [x] 3) Downstream Fine-tuned Models\n      - [x] *LIBERO*\n      - [ ] *Room2Room*\n      - [ ] *CALVIN*\n      - [ ] *SimplerEnv*\n#### 2. 💪 Training and Evaluation Codes on Simulation Benchmarks\n  -  [x] **1) LIBERO**\n  -  [ ] **2) Room2Room**\n  -  [ ] **3) CALVIN**\n  -  [ ] **4) SimplerEnv**\n#### 3. :dizzy: Codes and Guidelines for Real-world Deployment\n  -  [x] Codes and Docs\n#### 4. 
:information_desk_person: Scripts for Pre-processing Human Dataset\n  -  [ ] Codes for converting Ego4D into RLDS format\n\n\n## 🤗 Model Zoo \u003ca name=\"ckpts\"\u003e\u003c/a\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eModel Name\u003c/th\u003e\n    \u003cth\u003eBackbone\u003c/th\u003e\n    \u003cth\u003eHF Path\u003c/th\u003e\n    \u003cth\u003eNote\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003elam-stage-1\u003c/td\u003e\n    \u003ctd\u003e - \u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/qwbu/univla-latent-action-model\"\u003eunivla-latent-action-model\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e The stage-1 latent action model trained on OpenX and Ego4D. \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003elam-stage-2\u003c/td\u003e\n    \u003ctd\u003e - \u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/qwbu/univla-latent-action-model\"\u003eunivla-latent-action-model\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e The stage-2 latent action model trained on OpenX and Ego4D. (Generate task-centric latent actions.)\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eunivla-7b\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/TRI-ML/prismatic-vlms/tree/main/prism-dinosiglip-224px%2B7b\"\u003eTRI-ML/prismatic-vlms/prism-dinosiglip-224px+7b\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/qwbu/univla-7b\"\u003eunivla-7b\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003eUniVLA pretrained on our full data collection (Manip. + Navi. + Human). 
\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eunivla-7b-bridge-pt\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/TRI-ML/prismatic-vlms/tree/main/prism-dinosiglip-224px%2B7b\"\u003eTRI-ML/prismatic-vlms/prism-dinosiglip-224px+7b\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/qwbu/univla-7b-bridge-pt\"\u003eunivla-7b-bridge-pt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003eUniVLA pretrained only on BridgeV2 data.\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eunivla-7b-human-pt\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/TRI-ML/prismatic-vlms/tree/main/prism-dinosiglip-224px%2B7b\"\u003eTRI-ML/prismatic-vlms/prism-dinosiglip-224px+7b\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/qwbu/univla-7b-human-pt\"\u003eunivla-7b-human-pt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003eUniVLA pretrained only on Ego4D human videos. \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eunivla-7b-224-sft-libero\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/qwbu/univla-7b\"\u003eunivla-7b\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/qwbu/univla-7b-224-sft-libero\"\u003eunivla-7b-224-sft-libero\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003eFinetuned on the LIBERO dataset\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n## :video_game: Getting Started \u003ca name=\"installation\"\u003e\u003c/a\u003e\n\n1. (Optional) We use conda to manage the environment.\n\n```bash\nconda create -n univla python=3.10 -y\nconda activate univla\n```\n\n2. 
Install dependencies.\n\n```bash\n# Install PyTorch\n# Look up https://pytorch.org/get-started/previous-versions/ for the install command matching your CUDA version\n# Our experiments are conducted with 'torch 2.2.0 + cuda 12.1'\npip install torch torchvision\n\n# Clone our repo and pip install to download dependencies\ngit clone git@github.com:OpenDriveLab/UniVLA.git\ncd UniVLA\npip install -e .\n\n# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)\npip install packaging ninja\nninja --version; echo $?  # Verify Ninja --\u003e should return exit code \"0\"\npip install \"flash-attn==2.5.5\" --no-build-isolation\n```\n\n## :fire: Training Recipe\n\n### :one: Task-centric Latent Action Learning\n\u003e We highly recommend directly using our pre-trained latent action model checkpoints to save time and compute.\n\n\u003e [!NOTE]\n\u003e Our latent action model is trained on a comprehensive data collection, encompassing multiple robotic manipulation and navigation datasets from Open X-Embodiment, along with a curated subset of the Ego4D dataset (detailed data construction procedures are provided in the appendix of our [paper](https://www.roboticsproceedings.org/rss21/p014.pdf)).\n\u003e\n\u003e To adapt the model to additional datasets or custom data sources, users may refer to ```./prismatic/vla/datasets/rlds/oxe/mixtures.py``` to either utilize predefined data mixtures or define new ones. Subsequently, the ```data_mix``` parameter in the [configuration file](https://github.com/OpenDriveLab/UniVLA/blob/aab94fdf98221a19c0c9a114c921f069ed449265/latent_action_model/config/lam-stage-1.yaml#L27) should be updated accordingly.\n\n\n\nThe latent action model is implemented based on [VQ-VAE](https://arxiv.org/abs/1711.00937).\nWe train the latent action model on a collection of datasets comprising robot manipulation, navigation and human videos. 
In stage-1 training, we use an overall batch size of 512 and 100k optimization steps to construct the task-irrelevant latent actions:\n\n```bash\ntorchrun --standalone --nnodes 1 --nproc-per-node 8 main.py fit \\\n    --config config/lam-stage-1.yaml \\\n    2\u003e\u00261 | tee lam-stage-1.log\n```\n\nThe following stage-2 then focuses on learning task-centric latent actions on the basis of stage-1 results. Please modify the ```stage_one_ckpt``` in ```latent_action_model/config/lam-stage-2.yaml``` to your local path of stage-1 checkpoint, then run training with:\n\n```bash\ntorchrun --standalone --nnodes 1 --nproc-per-node 8 main.py fit \\\n    --config config/lam-stage-2.yaml \\\n    2\u003e\u00261 | tee lam-stage-2.log\n```\n\n\n### :two: Pretraining of Generalist Policy\n\n- **Latent Action Pseudo-Labeling for Policy Optimization:** The trained latent action model is employed to generate pseudo-labels for policy optimization via a next-token prediction objective. Specifically, the indices of inferred latent actions in the VQ-VAE codebook are mapped to dedicated tokens in the LLaMA tokenizer, denoted as ```{ACT_0, ACT_1, ..., ACT_C}```.\n\n- **Cost-effective Pre-Training:** The full-scale pre-training procedure, incorporating both OpenX and Ego4D datasets, was performed using a 32-GPU A100 cluster over 20,000 optimization steps. This training regimen required approximately 960 A100 GPU-hours, representing just 5% of the computational resources utilized by OpenVLA. 
Furthermore, experiments conducted on the 'Bridge' and 'Human' subsets demanded only 200 GPU-hours, demonstrating substantially reduced computational requirements compared to previous vision-language-action models.\n\n\n- To initiate pre-training, please refer to the following script or simply run ```bash ./vla-scripts/train.sh```:\n\n\u003e [!NOTE]\n\u003e For pretraining UniVLA only on BridgeV2 or Human (Ego4D) data, please modify ```vla.type``` to ```prism-dinosiglip-224px+mx-bridge(human)``` correspondingly. Detailed setups can be found in ```./prismatic/conf/vla.py```.\n\n```bash\n### Experiment on a 32-GPU cluster\nGPUS_PER_NODE=8\nNNODES=4\nMASTER_PORT=${MASTER_PORT:-28596}\nMASTER_ADDR=${MASTER_ADDR:-\"127.0.0.1\"}\nRANK=${RANK:-0}\n\n# Run your training script with torchrun\ntorchrun --nproc_per_node ${GPUS_PER_NODE} --nnodes ${NNODES} --node_rank ${RANK} --master_addr ${MASTER_ADDR} --master_port ${MASTER_PORT} train.py \\\n                                 --vla.type prism-dinosiglip-224px+mx-oxe-magic-soup-plus \\\n                                 --run_root_dir \"vla_log\"\n```\n\n\n### :three: Post-training for Deployment \u0026 Evaluations\n\n- With the pretrained generalist policy trained to plan over an embodiment-agnostic action space, we then add embodiment-specific action decoder heads for downstream deployment.\n- Our action decoder is extremely lightweight, with only around 12M parameters. Using parameter-efficient fine-tuning with LoRA rank 32, the total number of trainable parameters is around 123M.\n\n#### :mechanical_arm: Real-world Experiment\n\n\u003e Our guidelines are based on real-device testing conducted on the AgileX platform. 
If you have code deployed on other platforms or in different data formats, we welcome pull requests!\n\nWe provide a simple [guideline](https://github.com/OpenDriveLab/UniVLA/blob/3daa7e9a8f4ca92fdee960f8d6be73508344e81d/docs/real-world-deployment.md) to deploy UniVLA on your customized setups.\n\n#### 1) LIBERO\n\u003e Please first download the [LIBERO datasets](https://huggingface.co/datasets/openvla/modified_libero_rlds/tree/main) that we used in experiments\n\nStart training with ```torchrun```:\n1) You should first set the pretrained UniVLA and latent action model path in ```vla_path``` and ```lam_path``` of the [training config](https://github.com/OpenDriveLab/UniVLA/blob/b502b3eddc05fef9984d34932a41c96e5a9f21a3/vla-scripts/finetune_libero.py#L107).\n2) Set your local LIBERO dataset path in [```data_root_dir```](https://github.com/OpenDriveLab/UniVLA/blob/b502b3eddc05fef9984d34932a41c96e5a9f21a3/vla-scripts/finetune_libero.py#L110).\n3) You can choose ```dataset_name``` from ```libero_spatial_no_noops```, ```libero_object_no_noops```, ```libero_goal_no_noops```, and ```libero_10_no_noops```\n\u003e We trained on *'Spatial'*, *'Object'* and *'Goal'* for 30k steps and *'Long'* for 40k steps. 
To reproduce our results, first set ```max_steps``` in the training config accordingly.\n\n```bash\n# Start training on LIBERO-10 (Long) with 8 GPUs\ntorchrun --standalone --nnodes 1 --nproc-per-node 8 finetune_libero.py \\\n                                 --dataset_name \"libero_10_no_noops\" \\\n                                 --run_root_dir \"libero_log\"\n```\n\nOnce training has finished and you have the action decoder and VLA backbone, you can start evaluation with:\n\n\n```bash\n# Start evaluation on LIBERO-10\n# [Optional] Install LIBERO dependencies\npip install -r experiments/robot/libero/libero_requirements.txt\n\n# By default, we test 50 rollouts per task, totalling 500 independent trials.\n# --task_suite_name: choose from [libero_spatial, libero_object, libero_goal, libero_10]\n# --save_video: whether to save rollout videos\npython experiments/robot/libero/run_libero_eval.py \\\n    --task_suite_name libero_10 \\\n    --action_decoder_path /path/to/your/action_decoder_path.pt \\\n    --pretrained_checkpoint /path/to/your/libero_10_finetuned_univla \\\n    --save_video False \\\n    --num_trials_per_task 50 \\\n    --seed 7\n```\n\n\u003e To be updated.\n\n## :rocket: UniVLA's Performance\n\n\u003e [!NOTE]\n\u003e LIBERO Simulation Benchmark Results.\n\n\u003ctable style=\"width:100%; border:1px solid; border-collapse:collapse;\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: center; border:1px solid;\"\u003e\n      \u003cth rowspan=\"2\" style=\"border:1px solid;\"\u003eModel\u003c/th\u003e\n      \u003cth colspan=\"2\" style=\"border:1px solid;\"\u003eLIBERO-Spatial\u003c/th\u003e\n      \u003cth colspan=\"2\" style=\"border:1px solid;\"\u003eLIBERO-Object\u003c/th\u003e\n      \u003cth colspan=\"2\" style=\"border:1px solid;\"\u003eLIBERO-Goal\u003c/th\u003e\n      \u003cth colspan=\"2\" style=\"border:1px solid;\"\u003eLIBERO-Long\u003c/th\u003e\n      \u003cth colspan=\"2\" style=\"border:1px solid;\"\u003eAverage\u003c/th\u003e\n    
\u003c/tr\u003e\n    \u003ctr style=\"text-align: center; border:1px solid;\"\u003e\n      \u003cth style=\"border:1px solid;\"\u003eSR (↑)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eRank (↓)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eSR (↑)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eRank (↓)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eSR (↑)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eRank (↓)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eSR (↑)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eRank (↓)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eSR (↑)\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003eRank (↓)\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eDiffusion Policy\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e 78.3 ± 1.1%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e 5\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e92.5 ± 0.7%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e2\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e68.3 ± 1.2%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e5\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e50.5 ± 1.3%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e5\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e72.4 ± 0.7%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e5\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd 
style=\"border:1px solid;\"\u003eOcto\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e78.9 ± 1.0%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e4\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e85.7 ± 0.9%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e4\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e84.6 ± 0.9%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e2\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e51.1 ± 1.3%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e4\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e75.1 ± 0.6%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e3\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eOpenVLA\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e84.7 ± 0.9%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e2\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e88.4 ± 0.8%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e3\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e79.2 ± 1.0%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e3\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e53.7 ± 1.3%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e3\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e76.5 ± 0.6%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e2\u003c/td\u003e\n    
\u003c/tr\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eTraceVLA\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e84.6 ± 0.2%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e3\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e85.2 ± 0.4%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e5\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e75.1 ± 0.3%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e4\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e54.1 ± 1.0%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e2\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e74.8 ± 0.5%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e4\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eUniVLA (Ours)\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e96.5 ± 0.5%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e1\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e96.8 ± 0.5%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e1\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e95.6 ± 0.4%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e1\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e92.0 ± 1.0%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; 
text-align:center;\"\u003e1\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e95.2 ± 0.3%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e1\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n\n\u003e [!NOTE]\n\u003e LIBERO Results with Limited Data. (Models are trained with 10%, 20%, 50%, and the full dataset)\n\n\u003ctable style=\"width:100%; border:1px solid; border-collapse:collapse;\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: center; border:1px solid;\"\u003e\n      \u003cth rowspan=\"2\" style=\"border:1px solid;\"\u003eModel\u003c/th\u003e\n      \u003cth colspan=\"4\" style=\"border:1px solid;\"\u003eLIBERO-Goal\u003c/th\u003e\n      \u003cth colspan=\"4\" style=\"border:1px solid;\"\u003eLIBERO-Long\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"text-align: center; border:1px solid;\"\u003e\n      \u003cth style=\"border:1px solid;\"\u003e10%\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003e20%\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003e50%\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003e100%\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003e10%\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003e20%\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003e50%\u003c/th\u003e\n      \u003cth style=\"border:1px solid;\"\u003e100%\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eATM\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e64.3%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e77.1%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e-\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; 
text-align:center;\"\u003e-\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e36.5%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e39.1%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e-\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e-\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eOpenVLA\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e61.4%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e66.0%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e77.0%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e79.2%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e11.6%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e22.4%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e36.6%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e53.7%\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eOpenVLA-OFT\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e76.8%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e88.2%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e91.1%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e96.2%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e43.0%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e62.2%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; 
text-align:center;\"\u003e77.8%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e90.7%\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr style=\"border:1px solid;\"\u003e\n      \u003ctd style=\"border:1px solid;\"\u003eUniVLA (Ours)\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e86.3%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e90.4%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e93.1%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e95.6%\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e62.4%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e71.4%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e87.0%\u003c/b\u003e\u003c/td\u003e\n      \u003ctd style=\"border:1px solid; text-align:center;\"\u003e\u003cb\u003e92.0%\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003e [!NOTE]\n\u003e Real-world Experiments.\n\u003cdiv id=\"top\" align=\"center\"\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/real-world-exp_1.png\" width=\"1000px\" \u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n## :pencil: Citation\nIf you find our code or models useful in your work, please cite [our paper](https://arxiv.org/pdf/2505.06111):\n\n```bibtex\n@article{bu2025univla,\n  title={UniVLA: Learning to Act Anywhere with Task-centric Latent Actions}, \n  author={Qingwen Bu and Yanting Yang and Jisong Cai and Shenyuan Gao and Guanghui Ren and Maoqing Yao and Ping Luo and Hongyang Li},\n  journal={arXiv preprint arXiv:2505.06111},\n  year={2025}\n}\n```\n\n## Acknowledgements\n\nWe thank 
[OpenVLA](https://github.com/openvla/openvla) for their open-sourced work!\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopendrivelab%2Funivla","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopendrivelab%2Funivla","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopendrivelab%2Funivla/lists"}