Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bdaiinstitute/theia
- Host: GitHub
- URL: https://github.com/bdaiinstitute/theia
- Owner: bdaiinstitute
- License: other
- Created: 2024-06-07T14:40:48.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-08-05T13:04:46.000Z (5 months ago)
- Last Synced: 2024-08-05T14:58:52.332Z (5 months ago)
- Language: Python
- Size: 135 MB
- Stars: 70
- Watchers: 4
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# Theia: Distilling Diverse Vision Foundation Models for Robot Learning
Jinghuan Shang¹,², Karl Schmeckpeper¹, Brandon B. May¹, Maria Vittoria Minniti¹, Tarik Kelestemur¹, David Watkins¹, Laura Herlant¹

¹The AI Institute
²Stony Brook University
CoRL 2024
Project Page, Paper, Models, Demo
## Quick Start: Use Pre-trained Theia Models
Through Hugging Face:
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("theaiinstitute/theia-base-patch16-224-cdiv", trust_remote_code=True)

fake_input = torch.zeros((1, 224, 224, 3), dtype=torch.uint8)

# Theia / intermediate feature, mainly used for robot learning.
# To use a different feature reduction method, pass the `feature_reduction_method`
# argument to AutoModel.from_pretrained().
theia_feature = model.forward_feature(fake_input)

# predicted_features is a dict[str, torch.Tensor] where each key-value pair is a
# target model name and its predicted feature, which tries to match the
# corresponding teacher model's feature.
predicted_features = model(fake_input)
```

`theia-<size>-patch16-224-cdiv` models are used for the main evaluations in the paper.
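As a quick sanity check, you can inspect the shapes of the intermediate feature and of each predicted teacher feature. This is a minimal sketch building on the snippet above; the exact keys in `predicted_features` depend on which teachers the chosen checkpoint was distilled from.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("theaiinstitute/theia-base-patch16-224-cdiv", trust_remote_code=True)
model.eval()

# A small batch of uint8 RGB images in (B, H, W, C) layout, as in the snippet above.
images = torch.zeros((2, 224, 224, 3), dtype=torch.uint8)

with torch.no_grad():
    theia_feature = model.forward_feature(images)   # Theia / intermediate feature
    predicted_features = model(images)               # dict: teacher model name -> predicted feature

print("theia feature:", tuple(theia_feature.shape))
for teacher_name, feature in predicted_features.items():
    print(f"{teacher_name}: {tuple(feature.shape)}")
```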
## Installation
Make sure you have Python >= 3.10. Create any virtual Python environment you like or use the [Dockerfile](./Dockerfile). Then
```
pip install -e .
```

## Data Preparation
### Datasets
The datasets should be organized in webdataset format.

1. Prepare images from ImageNet
First download and [prepare](https://gist.github.com/antoinebrl/7d00d5cb6c95ef194c737392ef7e476a) ImageNet.
```
cd src/theia/scripts/preprocessing/image_datasets
python organize_imagenet_webdataset.py --dataset <dataset_name> --imagenet-raw-path <path_to_raw_imagenet> --output-path <output_path>
```
For any other image dataset you want to use, you can simply dump all the images in a folder (any subfolder structure also works) and modify how their paths are collected in `organize_imagenet_webdataset.py` (the `image_paths` variable); see the sketch after this list.

2. (Optional) Prepare frames from video datasets
```
cd src/theia/scripts/preprocessing/video_datasets
python subsampling_videos.py --dataset <dataset_name> --dataset-path <path_to_video_dataset> --output-path <output_path> [--subsampling-rate <rate>] [--samples-per-shard <samples_per_shard>]
```
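If you are preparing your own image dataset rather than ImageNet, the overall idea of the preprocessing scripts is to collect image paths and pack the images into webdataset shards. The sketch below illustrates that idea with the `webdataset` library; the shard naming and per-sample keys here are illustrative assumptions, so check [dataset_format](doc/dataset_format.md) and `organize_imagenet_webdataset.py` for the exact layout Theia expects.

```python
from pathlib import Path

import webdataset as wds

# Collect image paths from a folder (subfolders included), mirroring the
# `image_paths` variable you would adapt in organize_imagenet_webdataset.py.
image_root = Path("/data/my_images")  # hypothetical location
image_paths = sorted(p for p in image_root.rglob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"})

# Pack the images into tar shards; maxcount controls samples per shard.
output_dir = Path("/data/my_images_wds")
output_dir.mkdir(parents=True, exist_ok=True)
with wds.ShardWriter(str(output_dir / "shard-%06d.tar"), maxcount=10000) as sink:
    for i, path in enumerate(image_paths):
        sink.write({
            "__key__": f"{i:09d}",     # unique key per sample
            "jpg": path.read_bytes(),   # raw image bytes stored under the "jpg" extension
        })
```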
### Feature Extraction

```
cd src/theia/scripts/preprocessing
python feature_extraction.py --dataset <dataset_name> --output-path <output_path> --model <model_name> --split <split> [--num-gpus <num_gpus>]
```

You can also refer to the integrated script `src/theia/scripts/preprocessing/iv_feature_extraction.py`, which launches feature extraction for multiple models at the same time.
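The repository's scripts handle this step at scale over webdataset shards, but the core operation per teacher is just a forward pass whose outputs are cached to disk. A minimal, stand-alone sketch of that idea, using DINOv2 through `transformers` and `safetensors` for storage (both are assumptions for illustration, not the repo's exact pipeline):

```python
import torch
from PIL import Image
from safetensors.torch import save_file
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
teacher = AutoModel.from_pretrained("facebook/dinov2-base").eval()

# A tiny batch of blank images; in practice these come from the webdataset shards.
images = [Image.new("RGB", (224, 224)) for _ in range(4)]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    # Token-level features from the teacher, shape (B, num_tokens, hidden_dim).
    features = teacher(**inputs).last_hidden_state

# Cache the extracted teacher features to disk for later distillation training.
save_file({"dinov2_features": features.contiguous()}, "dinov2_features.safetensors")
```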
During training we need the mean and variance of each teacher model's features to normalize them. You can extract these statistics using `src/theia/scripts/preprocessing/calc_feature_mean.py`, or use the stats we provide in `feature_stats`.
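`calc_feature_mean.py` is the supported way to obtain these statistics; the sketch below only illustrates the underlying computation, i.e. accumulating a per-channel mean and variance over batches of extracted teacher features (the shapes used here are assumptions).

```python
import torch

def feature_mean_and_var(batches):
    """Accumulate per-channel mean and variance over an iterable of (..., C) feature batches."""
    count = 0
    total = None
    total_sq = None
    for features in batches:
        features = features.reshape(-1, features.shape[-1]).to(torch.float64)
        count += features.shape[0]
        total = features.sum(dim=0) if total is None else total + features.sum(dim=0)
        total_sq = (features ** 2).sum(dim=0) if total_sq is None else total_sq + (features ** 2).sum(dim=0)
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var

# Example with random stand-in features; in practice, iterate over the cached teacher features.
mean, var = feature_mean_and_var(torch.randn(16, 256, 768) for _ in range(10))
print(mean.shape, var.shape)  # torch.Size([768]) torch.Size([768])
```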
### Expected Dataset Format
More details about the dataset format are available in [dataset_format](doc/dataset_format.md). Please use it to verify or troubleshoot your data.
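For a quick local check before training, you can iterate over one shard and print each sample's keys. This is a generic webdataset snippet, so compare the printed keys against [dataset_format](doc/dataset_format.md) rather than treating the path or keys below as the required ones.

```python
import webdataset as wds

# Point this at one of your generated shards (hypothetical path).
dataset = wds.WebDataset("/data/my_images_wds/shard-000000.tar")

for i, sample in enumerate(dataset):
    # Each sample is a dict mapping extensions (e.g. "jpg") to raw bytes, plus "__key__".
    print(sample["__key__"], sorted(k for k in sample if not k.startswith("__")))
    if i >= 4:
        break
```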
## Training

```
cd src/theia/scripts

# train Theia-Tiny using the training configuration train_rvfm_imagenet
# with teacher models CLIP, DINOv2, and ViT
torchrun --nproc_per_node=8 --nnodes 1 --rdzv_backend c10d --rdzv_endpoint localhost:11111 train_rvfm.py --config-name=train_rvfm_imagenet logging.notes=imagenet_cdiv training/target_models=cdiv dataset.dataset_ratio=1.0 model.backbone.backbone=facebook/deit-tiny-patch16-224 logging.save_ckpt_interval=50000 dataset.dataset_root=<dataset_root>
```

To change output paths and wandb logging configs, override or modify `src/theia/configs/logging/default.yaml`.
To use different teacher models, override `training/target_models=`. Available configs are under `src/theia/configs/training/target_models`.
To use a different dataset, override `dataset=`. Available configs are under `src/theia/configs/dataset`.
## Decode Theia-representation to VFM outputs
You can decode Theia-predicted VFM representations into those VFMs' outputs. For DINOv2 we apply PCA visualization, for SAM we use its decoder to generate segmentation masks (following SAM's prompting pipeline), and for Depth-Anything we use its decoder head to predict depth. Example outputs are shown below. The Theia model must have been distilled from those teachers. Among the models available online, those with `cddsv` in the name are trained on all teachers.
![](doc/more_decoding_visualization.png)
Try out our [online demo](https://huggingface.co/spaces/theaiinstitute/theia) or [notebook example](src/theia/example/decode_to_vfms.ipynb), or you can get outputs from local checkpoints by
```
cd src/theia/scripts/decoding
python decoding_example.py --backbone <backbone> --checkpoint-path <checkpoint_path> --feature-stat-dir <feature_stat_dir> --media-to-vis-path <media_to_vis_path>
```
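For intuition about the DINOv2 branch of the figure above, the PCA visualization step is conceptually simple: project per-patch features onto their top three principal components and display them as RGB. A minimal sketch of that idea (not the repo's decoding code; it assumes you already have spatial patch features of shape `(num_patches, feature_dim)`):

```python
import torch

def pca_rgb(patch_features: torch.Tensor, grid_size: int) -> torch.Tensor:
    """Project (num_patches, feature_dim) features to 3 PCA components and map them to an RGB grid."""
    centered = patch_features - patch_features.mean(dim=0, keepdim=True)
    # Top-3 principal directions via low-rank PCA.
    _, _, v = torch.pca_lowrank(centered, q=3)
    projected = centered @ v[:, :3]                      # (num_patches, 3)
    # Normalize each component to [0, 1] so it can be shown as a color channel.
    projected = (projected - projected.min(dim=0).values) / (
        projected.max(dim=0).values - projected.min(dim=0).values + 1e-8
    )
    return projected.reshape(grid_size, grid_size, 3)    # (H_patches, W_patches, 3) RGB image

# Example: 14x14 patch grid (224 / 16) with random stand-in features.
rgb = pca_rgb(torch.randn(14 * 14, 768), grid_size=14)
print(rgb.shape)  # torch.Size([14, 14, 3])
```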
## References

[Webdataset](https://github.com/webdataset/webdataset), [transformers](https://github.com/huggingface/transformers), [safetensors](https://huggingface.co/docs/safetensors/en/index), [DINOv2](https://github.com/facebookresearch/dinov2), [CLIP](https://github.com/openai/CLIP), [ViT](https://github.com/google-research/vision_transformer), [SAM](https://github.com/facebookresearch/segment-anything), [RADIO](https://github.com/NVlabs/RADIO), [DepthAnything](https://github.com/LiheYoung/Depth-Anything)

## Citation
If you use Theia in your research, please use the following BibTeX entry:
```bibtex
@inproceedings{
shang2024theia,
title={Theia: Distilling Diverse Vision Foundation Models for Robot Learning},
author={Jinghuan Shang and Karl Schmeckpeper and Brandon B. May and Maria Vittoria Minniti and Tarik Kelestemur and David Watkins and Laura Herlant},
booktitle={8th Annual Conference on Robot Learning},
year={2024},
url={https://openreview.net/forum?id=ylZHvlwUcI}
}
```