https://github.com/showlab/afformer
Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)
- Host: GitHub
- URL: https://github.com/showlab/afformer
- Owner: showlab
- Created: 2023-03-26T19:05:59.000Z
- Default Branch: main
- Last Pushed: 2024-07-26T16:23:25.000Z
- Last Synced: 2024-07-26T18:15:29.281Z
- Topics: deep-learning, pytorch
- Language: Python
- Homepage:
- Size: 22.5 MB
- Stars: 36
- Watchers: 3
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/affordance-grounding-from-demonstration-video-1/video-to-image-affordance-grounding-on-opra)](https://paperswithcode.com/sota/video-to-image-affordance-grounding-on-opra?p=affordance-grounding-from-demonstration-video-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/affordance-grounding-from-demonstration-video-1/video-to-image-affordance-grounding-on-opra-1)](https://paperswithcode.com/sota/video-to-image-affordance-grounding-on-opra-1?p=affordance-grounding-from-demonstration-video-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/affordance-grounding-from-demonstration-video-1/video-to-image-affordance-grounding-on-epic)](https://paperswithcode.com/sota/video-to-image-affordance-grounding-on-epic?p=affordance-grounding-from-demonstration-video-1)

# Affordance Grounding from Demonstration Video to Target Image
This repository is the official implementation of [Affordance Grounding from Demonstration Video to Target Image](https://arxiv.org/abs/2303.14644):
![](readme.png)
![](demo/output1.gif)
```
@inproceedings{afformer,
author = {Joya Chen and Difei Gao and Kevin Qinghong Lin and Mike Zheng Shou},
title = {Affordance Grounding from Demonstration Video to Target Image},
booktitle = {CVPR},
year = {2023},
}
```

## Install
### 1. PyTorch
We now support PyTorch 2.0. Other versions should also work.
```shell
conda install -y pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
```
*NOTE: To use PyTorch 2.0, you need CUDA >= 11.7. See https://pytorch.org/.*
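To confirm that your installed build satisfies the note above, you can run a quick check like the sketch below. `parse_version` and `meets_cuda_requirement` are hypothetical helpers (not part of this repo); they only encode the "PyTorch >= 2.0 needs CUDA >= 11.7" rule and read the standard `torch.__version__` and `torch.version.cuda` attributes.

```python
import re
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn strings like '2.0.1+cu118' or '11.8' into a tuple of ints, e.g. (2, 0, 1)."""
    core = version.split("+")[0]  # drop local build tags such as '+cu118'
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)  # keep the leading digits of e.g. '0a0'
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_cuda_requirement(torch_version: str, cuda_version: Optional[str]) -> bool:
    """Hypothetical check: PyTorch >= 2.0 should be paired with CUDA >= 11.7."""
    if parse_version(torch_version) < (2, 0):
        return True  # the note above only constrains PyTorch 2.x
    if cuda_version is None:
        return False  # CPU-only build: no CUDA runtime at all
    return parse_version(cuda_version) >= (11, 7)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = meets_cuda_requirement(torch.__version__, torch.version.cuda)
        status = "OK" if ok else "check your CUDA version"
        print(f"torch {torch.__version__}, CUDA {torch.version.cuda}: {status}")
```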
### 2. PyTorch Lightning
We use PyTorch Lightning 2.0 as the training and inference engine.
```shell
pip install lightning 'jsonargparse[signatures]' --upgrade
```
### 3. xFormers
We use the memory-efficient attention from [xformers](https://github.com/facebookresearch/xformers). Currently, PyTorch 2.0's memory-efficient attention does not support relative positional encoding (see [pytorch/issues/96099](https://github.com/pytorch/pytorch/issues/96099)). We will update this repo once PyTorch supports it.
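For context on this limitation: relative positional encodings in ViT-style backbones typically enter attention as an additive bias on the logits, softmax(Q K^T / sqrt(d) + B) V, where B[i][j] depends only on the offset j - i. The naive pure-Python sketch below computes that for a simplified 1-D case; `attention_with_rel_bias` and the `rel_bias` table are hypothetical, and real backbones such as ViTDet use 2-D (decomposed) relative positions. A memory-efficient kernel must accept the bias term B to reproduce this result without materializing the full score matrix.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def attention_with_rel_bias(q: Matrix, k: Matrix, v: Matrix, rel_bias: Dict[int, float]) -> Matrix:
    """Naive softmax(q k^T / sqrt(d) + B) v, with B[i][j] = rel_bias.get(j - i, 0.0)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, q_i in enumerate(q):
        # Scaled dot-product logits for query i, plus the relative-position bias.
        logits = [
            scale * sum(a * b for a, b in zip(q_i, k_j)) + rel_bias.get(j - i, 0.0)
            for j, k_j in enumerate(k)
        ]
        # Numerically stable softmax over the keys.
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors.
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

With all-zero queries and keys and no bias, every output row is the mean of the value rows; a large bias on offset 0 makes each query attend almost only to its own position.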
```shell
pip install triton --upgrade
pip install --pre xformers
```
### 4. Timm, Detectron2, Others
We borrow some implementations from [timm](https://github.com/huggingface/pytorch-image-models) and [detectron2](https://github.com/facebookresearch/detectron2).
```shell
pip install timm opencv-python av imageio --upgrade
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
```
## Dataset
* Download the OPRA dataset from https://github.com/kuanfang/opra. Due to copyright restrictions, you may need to download the original videos from YouTube yourself.
* We have uploaded our organized annotation JSON files to [datasets/opra/annotations](datasets/opra/annotations). Your dataset file tree should then look like this:
```
datasets
└── opra
    ├── annotations
    │   ├── test.json
    │   └── train.json
    ├── clips
    │   ├── aocom
    │   ├── appliances
    │   ├── bestkitchenreview
    │   ├── cooking
    │   ├── eguru
    │   └── seattle
    └── images
        ├── aocom
        ├── appliances
        ├── bestkitchenreview
        ├── cooking
        ├── eguru
        └── seattle
```
* *We are working on organizing EPIC-Hotspot and AssistQ Buttons. They will be released as soon as possible.*
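To double-check the layout before training, a small stdlib-only sketch like the one below compares your folder against the OPRA tree shown above. It only checks that the listed paths exist (it does not validate video or annotation contents); `missing_opra_paths` is a hypothetical helper, and the default root assumes you run it from the repository root.

```python
from pathlib import Path
from typing import List

# Expected layout, taken from the OPRA tree above.
SOURCES = ["aocom", "appliances", "bestkitchenreview", "cooking", "eguru", "seattle"]
EXPECTED = (
    ["annotations/train.json", "annotations/test.json"]
    + [f"clips/{s}" for s in SOURCES]
    + [f"images/{s}" for s in SOURCES]
)


def missing_opra_paths(root: str = "datasets/opra") -> List[str]:
    """Return the expected OPRA paths (relative to root) that do not exist."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_opra_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("OPRA layout looks complete.")
```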
## Afformer Model
*Hint: If you are using [LightningCLI](https://lightning.ai/docs/pytorch/stable/cli/lightning_cli.html) for the first time, we recommend reading its documentation first; it will help you make better use of the commands below.*
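The commands below combine a YAML file (`--config`) with dotted flags such as `--trainer.devices 8`. LightningCLI parses these with jsonargparse, and flags given after `--config` override the matching nested keys from the file. The toy function below only models that precedence rule on a plain dict; `apply_overrides` is hypothetical, and the real parser additionally type-checks values against the class signatures.

```python
import copy
import json
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], args: List[str]) -> Dict[str, Any]:
    """Apply '--a.b value' pairs to a nested dict (simplified model, not jsonargparse)."""
    merged = copy.deepcopy(config)  # leave the loaded config untouched
    for flag, raw in zip(args[0::2], args[1::2]):
        keys = flag.lstrip("-").split(".")
        node = merged
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # Parse numbers/booleans as JSON would; otherwise keep the raw string.
        try:
            value = json.loads(raw)
        except json.JSONDecodeError:
            value = raw
        node[keys[-1]] = value
    return merged
```

For example, with a hypothetical base config `{"trainer": {"devices": 1}, "data": {"batch_size_per_gpu": 4}}`, the flags `--trainer.devices 8 --data.batch_size_per_gpu 2` yield `{"trainer": {"devices": 8}, "data": {"batch_size_per_gpu": 2}}`.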
### 1. ResNet-50-FPN encoder
* You don't need to download the pre-trained encoder weights manually; `torchvision` downloads them automatically. See [torchvision.models.detection.fasterrcnn_resnet50_fpn_v2](https://pytorch.org/vision/main/models/generated/torchvision.models.detection.fasterrcnn_resnet50_fpn_v2.html) for details.
* Train Afformer with the ResNet-50-FPN encoder:
```shell
python main.py fit --config configs/opra/r50fpn.yaml --trainer.devices 8 --data.batch_size_per_gpu 2
```
* The training logs are saved in `outputs/` by default. You can launch TensorBoard to monitor this folder:
```shell
tensorboard --logdir outputs/ --port 2333
# Then view real-time losses and metrics at http://localhost:2333/
```
* Evaluation runs every 1k iterations during training. You can also evaluate a checkpoint with the `validate` command, for example:
```shell
python main.py validate --config configs/opra/r50fpn.yaml --trainer.devices 8 --data.batch_size_per_gpu 2 --ckpt outputs/opra/r50fpn/lightning_logs/version_0/checkpoints/xxxx.ckpt
```
### 2. ViTDet-B encoder
* Download the [ViTDet-B-COCO weights](https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/mask_rcnn_vitdet_b/f325346929/model_final_61ccd1.pkl) and save them in the [weights/](weights/) folder as `weights/mask_rcnn_vitdet_b_coco.pkl`.
* Train Afformer with the ViTDet-B encoder:
```shell
python main.py fit --config configs/opra/vitdet_b.yaml --trainer.devices 8 --data.batch_size_per_gpu 2
```
* The training logs are saved in `outputs/` by default. You can launch TensorBoard to monitor this folder:
```shell
tensorboard --logdir outputs/ --port 2333
# Then view real-time losses and metrics at http://localhost:2333/
```
* Evaluation runs every 1k iterations during training. You can also evaluate a checkpoint with the `validate` command, for example:
```shell
python main.py validate --config configs/opra/vitdet_b.yaml --trainer.devices 8 --data.batch_size_per_gpu 2 --ckpt outputs/opra/vitdet_b/lightning_logs/version_0/checkpoints/xxxx.ckpt
```
### 3. Visualization
```shell
python demo.py --config configs/opra/vitdet_b.yaml --weight weights/afformer_vitdet_b_v1.ckpt --video demo/video.mp4 --image demo/image.jpg --output demo/output.gif
```
![](demo/output2.gif)
* Hint: we carefully fine-tuned a strong ViTDet model that outperforms the results reported in the paper. [Download it](https://drive.google.com/uc?export=download&id=1bvgBwkRJhA2hxInrPPHlwlGH3tNfDciX).
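The `validate` commands above take an explicit checkpoint path (shown as `xxxx.ckpt`). If you prefer not to copy file names by hand, a hypothetical stdlib helper like the one below picks the most recently modified `.ckpt` file in a run directory. It assumes the `lightning_logs/version_*/checkpoints/` layout that appears in those commands; `latest_checkpoint` is not part of this repo.

```python
import glob
import os
from typing import Optional


def latest_checkpoint(run_dir: str) -> Optional[str]:
    """Return the newest *.ckpt under run_dir/lightning_logs/version_*/checkpoints/, or None."""
    pattern = os.path.join(run_dir, "lightning_logs", "version_*", "checkpoints", "*.ckpt")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # Newest by modification time; ties are broken by path so the result is deterministic.
    return max(candidates, key=lambda p: (os.path.getmtime(p), p))


if __name__ == "__main__":
    ckpt = latest_checkpoint("outputs/opra/r50fpn")
    print(ckpt or "no checkpoint found")
```

Pass the printed path to `--ckpt` in the `validate` command.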
## MaskAHand Pre-training
*NOTE: A detailed tutorial will be released as soon as possible.*
### 1. Hand Interaction Detection
* Download our trained hand interaction detector weights from this [link](https://drive.google.com/file/d/1Q6SLFnn-ztZDjDoFalOhVLsL0nEZWc-w/view?usp=sharing) and save them in the [weights/](weights/) folder as `weights/hircnn_r50fpnv2_849.pth`.
* A video demo of this hand interaction detector:
![](demo/hircnn.gif)
* *Hint: we trained this simple and accurate hand interaction detector on 100DOH plus several egocentric datasets. It achieves 84.9 hand+interaction detection AP on the 100DOH test set, which is sufficient for MaskAHand pre-training. We will release its full source code at [chenjoya/hircnn](https://github.com/chenjoya/hircnn) as soon as possible.*
### 2. Hand Interaction Clip Mining
* Make sure your data is prepared as described in the [Dataset](#dataset) section.
* Run [affominer/miner.py](affominer/miner.py). The generated data will be saved to `affominer/outputs`.
### 3. Target Image Synthesis and Transformation
This step is performed on the fly during training. You can set its hyper-parameters in [configs/opra/maskahand/pretrain.yaml](configs/opra/maskahand/pretrain.yaml):
```yaml
mask_ratio: 1.0
num_masks: 2
distortion_scale: 0.5
num_frames: 32
clip_interval: 16
contact_threshold: 0.99
```
### 4. MaskAHand Pre-training
```shell
python main.py fit --config configs/opra/maskahand/pretrain.yaml
```
### 5. Fine-tuning or Zero-shot Evaluation
* Fine-tune the MaskAHand pre-trained weights with:
```shell
python main.py fit --config configs/opra/maskahand/finetune.yaml
```
* Evaluate the MaskAHand pre-trained weights zero-shot with:
```shell
python main.py validate --config configs/opra/maskahand/pretrain.yaml
```
### 6. Visualization
You can refer to [demo.py](demo.py) to visualize your model results.
## Contact
This repository is developed by [Joya Chen](https://chenjoya.github.io/). Questions and discussions are welcome via email.
## Acknowledgement
Thanks to all co-authors of the paper, [Difei Gao](https://scholar.google.com/citations?user=No9OsocAAAAJ&hl=en), [Kevin Qinghong Lin](https://qinghonglin.github.io/), and [Mike Shou](https://sites.google.com/view/showlab) (my supervisor). I also appreciate the assistance of Dongxing Mao and [Jiawei Liu](https://jia-wei-liu.github.io/).