Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bdaiinstitute/theia
- Host: GitHub
- URL: https://github.com/bdaiinstitute/theia
- Owner: bdaiinstitute
- License: other
- Created: 2024-06-07T14:40:48.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-08-05T13:04:46.000Z (5 months ago)
- Last Synced: 2024-08-05T14:58:52.332Z (5 months ago)
- Language: Python
- Size: 135 MB
- Stars: 70
- Watchers: 4
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# Theia: Distilling Diverse Vision Foundation Models for Robot Learning
Jinghuan Shang¹,², Karl Schmeckpeper¹, Brandon B. May¹, Maria Vittoria Minniti¹, Tarik Kelestemur¹, David Watkins¹, Laura Herlant¹

¹The AI Institute
²Stony Brook University
CoRL 2024
Project Page, Paper, Models, Demo
## Quick Start: Use Pre-trained Theia Models
Through Hugging Face:
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("theaiinstitute/theia-base-patch16-224-cdiv", trust_remote_code=True)

fake_input = torch.zeros((1, 224, 224, 3), dtype=torch.uint8)

# Theia / intermediate feature, mainly used for robot learning.
# To use a different feature reduction method, pass the `feature_reduction_method`
# argument to AutoModel.from_pretrained().
theia_feature = model.forward_feature(fake_input)

# predicted_features is a dict[str, torch.Tensor] where each key-value pair is a
# target model name and its predicted feature, which tries to match the
# corresponding teacher model's feature.
predicted_features = model(fake_input)
```

`theia-<size>-patch16-224-cdiv` models are used for the main evaluations in the paper.
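As a quick sanity check, you can inspect the shapes of the intermediate feature and of each predicted teacher feature. This is a minimal sketch building on the snippet above; the exact keys in `predicted_features` depend on which teachers the chosen checkpoint was distilled from.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("theaiinstitute/theia-base-patch16-224-cdiv", trust_remote_code=True)
model.eval()

# A small batch of uint8 RGB images in (B, H, W, C) layout, as in the snippet above.
images = torch.zeros((2, 224, 224, 3), dtype=torch.uint8)

with torch.no_grad():
    theia_feature = model.forward_feature(images)   # Theia / intermediate feature
    predicted_features = model(images)               # dict: teacher model name -> predicted feature

print("theia feature:", tuple(theia_feature.shape))
for teacher_name, feature in predicted_features.items():
    print(f"{teacher_name}: {tuple(feature.shape)}")
```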
## Installation
Make sure you have Python >= 3.10. Create any virtual Python environment you like or use the [Dockerfile](./Dockerfile). Then
```
pip install -e .
```

## Data Preparation
### Datasets
The datasets should be organized in webdataset format.

1. Prepare images from ImageNet
First download and [prepare](https://gist.github.com/antoinebrl/7d00d5cb6c95ef194c737392ef7e476a) ImageNet.
```
cd src/theia/scripts/preprocessing/image_datasets
python organize_imagenet_webdataset.py --dataset <dataset_name> --imagenet-raw-path <path_to_raw_imagenet> --output-path <output_path>
```
For any other image dataset you want to use, you can simply dump all the images in a folder (any subfolder structure also works) and modify how their paths are collected in `organize_imagenet_webdataset.py` (the `image_paths` variable); see the sketch after this list.

2. (Optional) Prepare frames from video datasets
```
cd src/theia/scripts/preprocessing/video_datasets
python subsampling_videos.py --dataset <dataset_name> --dataset-path <path_to_video_dataset> --output-path <output_path> [--subsampling-rate <rate>] [--samples-per-shard <samples_per_shard>]
```
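If you are preparing your own image dataset rather than ImageNet, the overall idea of the preprocessing scripts is to collect image paths and pack the images into webdataset shards. The sketch below illustrates that idea with the `webdataset` library; the shard naming and per-sample keys here are illustrative assumptions, so check [dataset_format](doc/dataset_format.md) and `organize_imagenet_webdataset.py` for the exact layout Theia expects.

```python
from pathlib import Path

import webdataset as wds

# Collect image paths from a folder (subfolders included), mirroring the
# `image_paths` variable you would adapt in organize_imagenet_webdataset.py.
image_root = Path("/data/my_images")  # hypothetical location
image_paths = sorted(p for p in image_root.rglob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"})

# Pack the images into tar shards; maxcount controls samples per shard.
output_dir = Path("/data/my_images_wds")
output_dir.mkdir(parents=True, exist_ok=True)
with wds.ShardWriter(str(output_dir / "shard-%06d.tar"), maxcount=10000) as sink:
    for i, path in enumerate(image_paths):
        sink.write({
            "__key__": f"{i:09d}",     # unique key per sample
            "jpg": path.read_bytes(),   # raw image bytes stored under the "jpg" extension
        })
```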
### Feature Extraction

```
cd src/theia/scripts/preprocessing
python feature_extraction.py --dataset <dataset_name> --output-path <output_path> --model <model_name> --split <split> [--num-gpus <num_gpus>]
```

You can also refer to the integrated script `src/theia/scripts/preprocessing/iv_feature_extraction.py`, which launches feature extraction for multiple models at the same time.
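The repository's scripts handle this step at scale over webdataset shards, but the core operation per teacher is just a forward pass whose outputs are cached to disk. A minimal, stand-alone sketch of that idea, using DINOv2 through `transformers` and `safetensors` for storage (both are assumptions for illustration, not the repo's exact pipeline):

```python
import torch
from PIL import Image
from safetensors.torch import save_file
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
teacher = AutoModel.from_pretrained("facebook/dinov2-base").eval()

# A tiny batch of blank images; in practice these come from the webdataset shards.
images = [Image.new("RGB", (224, 224)) for _ in range(4)]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    # Token-level features from the teacher, shape (B, num_tokens, hidden_dim).
    features = teacher(**inputs).last_hidden_state

# Cache the extracted teacher features to disk for later distillation training.
save_file({"dinov2_features": features.contiguous()}, "dinov2_features.safetensors")
```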
During training we need the mean and variance of each teacher model's features to normalize them. You can extract these statistics using `src/theia/scripts/preprocessing/calc_feature_mean.py`, or use the stats we provide in `feature_stats`.
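`calc_feature_mean.py` is the supported way to obtain these statistics; the sketch below only illustrates the underlying computation, i.e. accumulating a per-channel mean and variance over batches of extracted teacher features (the shapes used here are assumptions).

```python
import torch

def feature_mean_and_var(batches):
    """Accumulate per-channel mean and variance over an iterable of (..., C) feature batches."""
    count = 0
    total = None
    total_sq = None
    for features in batches:
        features = features.reshape(-1, features.shape[-1]).to(torch.float64)
        count += features.shape[0]
        total = features.sum(dim=0) if total is None else total + features.sum(dim=0)
        total_sq = (features ** 2).sum(dim=0) if total_sq is None else total_sq + (features ** 2).sum(dim=0)
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var

# Example with random stand-in features; in practice, iterate over the cached teacher features.
mean, var = feature_mean_and_var(torch.randn(16, 256, 768) for _ in range(10))
print(mean.shape, var.shape)  # torch.Size([768]) torch.Size([768])
```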
### Expected Dataset Format
More details about the dataset format are available in [dataset_format](doc/dataset_format.md). Please use it to verify or troubleshoot your data.
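For a quick local check before training, you can iterate over one shard and print each sample's keys. This is a generic webdataset snippet, so compare the printed keys against [dataset_format](doc/dataset_format.md) rather than treating the path or keys below as the required ones.

```python
import webdataset as wds

# Point this at one of your generated shards (hypothetical path).
dataset = wds.WebDataset("/data/my_images_wds/shard-000000.tar")

for i, sample in enumerate(dataset):
    # Each sample is a dict mapping extensions (e.g. "jpg") to raw bytes, plus "__key__".
    print(sample["__key__"], sorted(k for k in sample if not k.startswith("__")))
    if i >= 4:
        break
```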
## Training

```
cd src/theia/scripts

# train Theia-Tiny using the training configuration train_rvfm_imagenet
# with teacher models CLIP, DINOv2, and ViT
torchrun --nproc_per_node=8 --nnodes 1 --rdzv_backend c10d --rdzv_endpoint localhost:11111 train_rvfm.py --config-name=train_rvfm_imagenet logging.notes=imagenet_cdiv training/target_models=cdiv dataset.dataset_ratio=1.0 model.backbone.backbone=facebook/deit-tiny-patch16-224 logging.save_ckpt_interval=50000 dataset.dataset_root=<dataset_root>
```

To change output paths and wandb logging configs, override or modify `src/theia/configs/logging/default.yaml`.
To use different teacher models, override `training/target_models=`. Available configs are under `src/theia/configs/training/target_models`.
To use a different dataset, override `dataset=`. Available configs are under `src/theia/configs/dataset`.
## Decode Theia-representation to VFM outputs
You can decode Theia-predicted VFM representations into those VFMs' outputs. For DINOv2 we apply PCA visualization, for SAM we use its decoder to generate segmentation masks (following SAM's prompting pipeline), and for Depth-Anything we use its decoder head to predict depth. Example outputs are shown below. The Theia model must have been distilled from those teachers. Among the models available online, those with `cddsv` in the name are trained on all teachers.
![](doc/more_decoding_visualization.png)
Try out our [online demo](https://huggingface.co/spaces/theaiinstitute/theia) or [notebook example](src/theia/example/decode_to_vfms.ipynb), or you can get outputs from local checkpoints by
```
cd src/theia/scripts/decoding
python decoding_example.py --backbone <backbone> --checkpoint-path <checkpoint_path> --feature-stat-dir <feature_stat_dir> --media-to-vis-path <media_to_vis_path>
```
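For intuition about the DINOv2 branch of the figure above, the PCA visualization step is conceptually simple: project per-patch features onto their top three principal components and display them as RGB. A minimal sketch of that idea (not the repo's decoding code; it assumes you already have spatial patch features of shape `(num_patches, feature_dim)`):

```python
import torch

def pca_rgb(patch_features: torch.Tensor, grid_size: int) -> torch.Tensor:
    """Project (num_patches, feature_dim) features to 3 PCA components and map them to an RGB grid."""
    centered = patch_features - patch_features.mean(dim=0, keepdim=True)
    # Top-3 principal directions via low-rank PCA.
    _, _, v = torch.pca_lowrank(centered, q=3)
    projected = centered @ v[:, :3]                      # (num_patches, 3)
    # Normalize each component to [0, 1] so it can be shown as a color channel.
    projected = (projected - projected.min(dim=0).values) / (
        projected.max(dim=0).values - projected.min(dim=0).values + 1e-8
    )
    return projected.reshape(grid_size, grid_size, 3)    # (H_patches, W_patches, 3) RGB image

# Example: 14x14 patch grid (224 / 16) with random stand-in features.
rgb = pca_rgb(torch.randn(14 * 14, 768), grid_size=14)
print(rgb.shape)  # torch.Size([14, 14, 3])
```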
## References

[Webdataset](https://github.com/webdataset/webdataset), [transformers](https://github.com/huggingface/transformers), [safetensors](https://huggingface.co/docs/safetensors/en/index), [DINOv2](https://github.com/facebookresearch/dinov2), [CLIP](https://github.com/openai/CLIP), [ViT](https://github.com/google-research/vision_transformer), [SAM](https://github.com/facebookresearch/segment-anything), [RADIO](https://github.com/NVlabs/RADIO), [DepthAnything](https://github.com/LiheYoung/Depth-Anything)

## Citation
If you use Theia in your research, please use the following BibTeX entry:
```bibtex
@inproceedings{
shang2024theia,
title={Theia: Distilling Diverse Vision Foundation Models for Robot Learning},
author={Jinghuan Shang and Karl Schmeckpeper and Brandon B. May and Maria Vittoria Minniti and Tarik Kelestemur and David Watkins and Laura Herlant},
booktitle={8th Annual Conference on Robot Learning},
year={2024},
url={https://openreview.net/forum?id=ylZHvlwUcI}
}
```