# Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding

[![HF Models](https://img.shields.io/badge/%F0%9F%A4%97-Models-yellow)](https://huggingface.co/oier-mees/FuSe)
[![HF Dataset](https://img.shields.io/badge/%F0%9F%A4%97-Dataset-yellow)](https://huggingface.co/datasets/oier-mees/FuSe)
[![Python](https://img.shields.io/badge/python-3.10-blue)](https://www.python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Static Badge](https://img.shields.io/badge/Project-Page-a)](https://fuse-model.github.io/)

[Joshua Jones](https://www.linkedin.com/in/joshua-w-jones/), [Oier Mees](https://www.oiermees.com/), [Carmelo Sferrazza](https://sferrazza.cc/), [Kyle Stachowicz](https://kylesta.ch/), [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/), [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/)


This repo contains code to **Fu**se heterogeneous **Se**nsory (FuSe) data, such as touch sensing or audio, into generalist robot policies via language grounding. We release both a dataset of 26,866 robot trajectories collected with heterogeneous sensory modalities and checkpoints for our two main models: Octo, a large diffusion-based transformer model, and a 3B VLA based on PaliGemma.
Our code is built on top of the [Octo](https://github.com/octo-models/octo) and [PaliVLA](https://github.com/kylestach/bigvision-palivla) codebases.

![FuSe model](media/teaser.jpg)

## Get Started
Install PaliVLA:
```bash
cd palivla_digit
uv venv
source .venv/bin/activate
uv sync --extra gpu  # or: uv sync --extra tpu
uv pip install -e ../octo_digit --no-deps
uv pip install -e ../bridge_with_digit/widowx_envs
uv pip install -e .
```

Install Octo:
```bash
cd octo_digit
uv venv
source .venv/bin/activate
uv sync --extra gpu  # or: uv sync --extra tpu
uv pip install -e ../bridge_with_digit/widowx_envs
uv pip install -e .
```

## Dataset Download
We provide a dataset containing 26,866 trajectories collected on a WidowX robot at the RAIL lab @ UC Berkeley, USA. It contains visual, tactile, audio, and action data collected across several environments, annotated with natural language.
You can download the dataset from the following [HuggingFace dataset](https://huggingface.co/datasets/oier-mees/FuSe).
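As a minimal sketch, individual files in the dataset repo can be fetched directly over HTTPS via Hugging Face's `resolve` URL scheme (the `README.md` path below is just an illustrative example; browse the repo for the actual file names):

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co/datasets"
REPO_ID = "oier-mees/FuSe"

def hf_dataset_url(path_in_repo: str, revision: str = "main") -> str:
    # Hugging Face serves raw files at <base>/<repo>/resolve/<revision>/<path>.
    return f"{HF_BASE}/{REPO_ID}/resolve/{quote(revision)}/{quote(path_in_repo)}"

print(hf_dataset_url("README.md"))
# https://huggingface.co/datasets/oier-mees/FuSe/resolve/main/README.md
```

For a full local copy, `huggingface_hub.snapshot_download(repo_id="oier-mees/FuSe", repo_type="dataset")` is the usual route.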

## Model Training
For Octo:
```bash
python octo_digit/scripts/finetune_fuse.py --config=octo_digit/scripts/configs/fuse_config.py
```
For PaliVLA:
```bash
python palivla_digit/palivla/train_fuse.py --config=palivla_digit/palivla/configs/fuse_config.py
```

## Inference with Pretrained Models
Install `bridge_with_digit` on the robot controller and start the action server.

Download the pretrained models from the [HuggingFace model hub](https://huggingface.co/oier-mees/FuSe).

For Octo:
```bash
python octo_digit/eval/fuse_eval.py --checkpoint_weights_path=ckpt.pth
```
For PaliVLA:
```bash
python palivla_digit/eval_palivla.py --checkpoint_dir=ckpt.pth
```

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. PaliVLA is licensed under the Apache 2.0 License - see the [LICENSE](palivla_digit/LICENSE) file for details.

## Citation

```bibtex
@article{jones2025fuse,
title={Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding},
author={Jones, Joshua and Mees, Oier and Sferrazza, Carmelo and Stachowicz, Kyle and Abbeel, Pieter and Levine, Sergey},
journal={arXiv preprint arXiv:2501.04693},
year={2025}
}
```