https://github.com/fuse-model/FuSe
- Host: GitHub
- URL: https://github.com/fuse-model/FuSe
- Owner: fuse-model
- License: MIT
- Created: 2024-12-19T22:15:00.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-01-13T16:26:13.000Z (9 months ago)
- Last Synced: 2025-01-13T16:47:10.962Z (9 months ago)
- Language: Python
- Homepage: https://fuse-model.github.io
- Size: 19 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Video-Robotic-Papers - Code
README
# Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding
[Model Checkpoints](https://huggingface.co/oier-mees/FuSe) | [Dataset](https://huggingface.co/datasets/oier-mees/FuSe) | [Python](https://www.python.org) | [License: MIT](https://opensource.org/licenses/MIT) | [Project Page](https://fuse-model.github.io/)

[Joshua Jones](https://www.linkedin.com/in/joshua-w-jones/), [Oier Mees](https://www.oiermees.com/), [Carmelo Sferrazza](https://sferrazza.cc/), [Kyle Stachowicz](https://kylesta.ch/), [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/), [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/)
This repo contains code to **Fu**se heterogeneous **Se**nsory (FuSe) data, like touch sensing or audio, into generalist robot policies via language grounding. We release both a dataset of 26,866 robot trajectories collected with heterogeneous sensory modalities and checkpoints for our two main models: Octo, a large diffusion-based transformer model, and a 3B VLA based on PaliGemma.
Our code is built on top of the [Octo](https://github.com/octo-models/octo) and [PaliVLA](https://github.com/kylestach/bigvision-palivla) codebases.
## Get Started
Install PaliVLA:
```
cd palivla_digit
uv venv
source .venv/bin/activate
uv sync --extra [gpu or tpu]
uv pip install -e ../octo_digit --no-deps
uv pip install -e ../bridge_with_digit/widowx_envs
uv pip install -e .
```
Install Octo:
```
cd octo_digit
uv venv
source .venv/bin/activate
uv sync --extra [gpu or tpu]
uv pip install -e ../bridge_with_digit/widowx_envs
uv pip install -e .
```
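In both install blocks, `--extra [gpu or tpu]` is a placeholder: run `uv sync --extra gpu` on a CUDA machine or `uv sync --extra tpu` on a TPU VM, matching the optional dependency groups declared in each package's `pyproject.toml`.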
## Dataset Download
We provide a dataset containing 26,866 trajectories collected on a WidowX robot at the RAIL lab @ UC Berkeley, USA. It contains visual, tactile, audio, and action data collected across several environments, annotated with natural language.
You can download the dataset from the following [HuggingFace dataset](https://huggingface.co/datasets/oier-mees/FuSe).
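If you prefer to fetch the data programmatically, the standard `huggingface_hub` client works as well; below is a minimal sketch (the `local_dir` destination is just an example path):

```python
# Minimal sketch: download the FuSe dataset from the HuggingFace Hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="oier-mees/FuSe",
    repo_type="dataset",
    local_dir="data/fuse",  # example destination; any writable directory works
)
print(f"Dataset downloaded to {local_path}")
```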
## Model Training
For Octo:
```bash
python octo_digit/scripts/finetune_fuse.py --config=scripts/configs/fuse_config.py
```
For PaliVLA:
```bash
python palivla_digit/palivla/train_fuse.py --config=palivla_digit/palivla/configs/fuse_config.py
```
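Both scripts take an `ml_collections`-style config file; assuming they follow the usual `config_flags` pattern, individual fields can also be overridden on the command line, e.g. something like `--config.batch_size=64` (the exact field names are whatever the respective `fuse_config.py` defines).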
## Inference with Pretrained Models
Install `bridge_with_digit` on the robot controller, and start the action server. Download the pretrained models from the [HuggingFace model hub](https://huggingface.co/oier-mees/FuSe).
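The same `snapshot_download` pattern shown above for the dataset also works for the checkpoints: use `repo_id="oier-mees/FuSe"` without the `repo_type="dataset"` argument, then point the eval scripts below at the resulting local checkpoint path.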
For Octo:
```bash
python octo_digit/eval/fuse_eval.py --checkpoint_weights_path=ckpt.pth
```
For PaliVLA:
```bash
python palivla_digit/eval_palivla.py --checkpoint_dir=ckpt.pth
```
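For programmatic use outside the eval scripts, the checkpoints should be loadable the way upstream Octo checkpoints are. The sketch below assumes the `octo_digit` fork keeps upstream Octo's `OctoModel` interface; the checkpoint path and the observation dictionary are placeholders, since the actual keys and shapes for the camera, tactile, and audio streams are defined by the repo's configs:

```python
# Hedged sketch: load a downloaded checkpoint and sample actions, assuming the
# octo_digit fork preserves upstream Octo's OctoModel API.
import jax
import numpy as np
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("checkpoints/fuse/octo")  # placeholder local path

# Placeholder observation: the real key names, shapes, and extra modalities
# (tactile, audio) come from the repo's dataset/model configs.
observation = {
    "image_primary": np.zeros((1, 2, 256, 256, 3), dtype=np.uint8),
    "timestep_pad_mask": np.ones((1, 2), dtype=bool),
}
task = model.create_tasks(texts=["pick up the sponge"])
actions = model.sample_actions(observation, task, rng=jax.random.PRNGKey(0))
print(actions.shape)  # (batch, prediction_horizon, action_dim)
```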
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. PaliVLA is licensed under the Apache 2.0 License - see the [LICENSE](palivla_digit/LICENSE) file for details.
## Citation
```bibtex
@article{jones2025fuse,
title={Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding},
author={Jones, Joshua and Mees, Oier and Sferrazza, Carmelo and Stachowicz, Kyle and Abbeel, Pieter and Levine, Sergey},
journal={arXiv preprint arXiv:2501.04693},
year={2025}
}
```