Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/RuipingL/OpenSU
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/RuipingL/OpenSU
- Owner: RuipingL
- License: apache-2.0
- Created: 2023-07-17T20:31:29.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-07T14:44:20.000Z (12 months ago)
- Last Synced: 2024-04-23T00:21:25.414Z (9 months ago)
- Language: Python
- Size: 10 MB
- Stars: 15
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# Open Scene Understanding (ICCV ACVR 2023)
### Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments [[PDF](https://arxiv.org/pdf/2307.07757)]
![OpenSU pipeline flowchart](img/Flowchart.png)
## Environment
```
# Clone this repository and navigate into the repository
git clone https://github.com/RuipingL/OpenSU.git
cd OpenSU

# Create a conda environment, activate it, and install PyTorch via conda
conda create --name OpenSU python=3.9
conda activate OpenSU
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge

# Install requirements via pip
pip install -r requirements.txt

# Install Segment Anything
pip install git+https://github.com/facebookresearch/segment-anything.git
```
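As an optional sanity check (not part of the original instructions), you can confirm that PyTorch sees the CUDA toolkit before proceeding:
```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```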
## Dataset Preparation
Download [images](https://swig-data-weights.s3.us-east-2.amazonaws.com/images_512.zip) to the folder `SWiG`, and [json files](https://github.com/jhcho99/CoFormer/tree/master/SWiG/SWiG_jsons) to the folder `SWiG/SWiG_jsons`.
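The same steps from the shell, as a rough sketch (it assumes the archive unpacks to an `images_512` directory; the JSON files come from the linked CoFormer repository):
```
mkdir -p SWiG/SWiG_jsons
wget https://swig-data-weights.s3.us-east-2.amazonaws.com/images_512.zip
unzip images_512.zip -d SWiG   # should yield SWiG/images_512/
# copy the JSON files from CoFormer's SWiG/SWiG_jsons into SWiG/SWiG_jsons
```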
## Model Checkpoints
Download
[Swin-T](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth) to the folder `ckpt/Swin`,
[Segment Anything](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/mobile_sam.pt) to the folder `ckpt/sam`, and [GSR model](https://drive.google.com/file/d/1i44Y5YIJ7ECNq9lYBOd4Qlcp7TVL79zz/view?usp=drive_link) to the folder `ckpt`.
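After downloading, the checkpoint folder should look roughly like this (the GSR checkpoint name `OpenSU_Swin.pth` is inferred from the evaluation command below):
```
ckpt/
├── Swin/
│   └── swin_tiny_patch4_window7_224.pth
├── sam/
│   ├── sam_vit_h_4b8939.pth
│   └── mobile_sam.pt
└── OpenSU_Swin.pth
```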
## Training
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch_size 4 --dataset_file swig --epochs 40 --num_workers 4 --num_glance_enc_layers 3 --num_gaze_s1_dec_layers 3 --num_gaze_s1_enc_layers 3 --num_gaze_s2_dec_layers 3 --dropout 0.15 --hidden_dim 512 --output_dir OpenSU
```
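For a quick run on a single GPU, the same launcher should work with one process (an untested sketch; the batch size may need to be reduced to fit memory):
```
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --batch_size 4 --dataset_file swig --epochs 40 --num_workers 4 --num_glance_enc_layers 3 --num_gaze_s1_dec_layers 3 --num_gaze_s1_enc_layers 3 --num_gaze_s2_dec_layers 3 --dropout 0.15 --hidden_dim 512 --output_dir OpenSU
```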
## Evaluation
```
python main.py --saved_model ckpt/OpenSU_Swin.pth --output_dir OpenSU_eva --dev # Evaluation on development set
python main.py --saved_model ckpt/OpenSU_Swin.pth --output_dir OpenSU_eva --test # Evaluation on test set
```
## Demo
```
python demo.py --image_path img/carting_214.jpg --sam sam # Using vanilla Segment Anything as segmentation map generator
python demo.py --image_path img/carting_214.jpg --sam mobilesam # Using MobileSAM as segmentation map generator
```
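Internally, the pipeline grounds each semantic role with a bounding box from the GSR model and hands it to SAM as a box prompt. Below is a minimal sketch of that step using the Segment Anything API; the box coordinates are hypothetical and this is not the repository's exact code path:
```
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H SAM checkpoint downloaded above
sam = sam_model_registry["vit_h"](checkpoint="ckpt/sam/sam_vit_h_4b8939.pth")
# sam.to("cuda")  # optional: move to GPU if available
predictor = SamPredictor(sam)

# SAM expects an RGB uint8 image (OpenCV loads BGR)
image = cv2.cvtColor(cv2.imread("img/carting_214.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A box prompt in xyxy pixel coordinates, e.g. one predicted by the GSR model
box = np.array([100, 50, 400, 300])  # hypothetical coordinates
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape)  # (1, H, W) boolean mask for the prompted region
```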
Running `demo.py` prints the recognized situation as text:
```
# Text information
verb: carting
role: agent, noun: dog.n.01
role: item, noun: man.n.01
role: tool, noun: cart.n.01
role: place, noun: outdoors.n.01
the dog cartes the man in a cart at a outdoors.
```

## Citation
Our system is built upon the framework of [CoFormer](https://arxiv.org/abs/2203.16518).
If you use OpenSU, please cite:
```
@inproceedings{liu2023opensu,
  title={Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments},
  author={Liu, Ruiping and Zhang, Jiaming and Peng, Kunyu and Zheng, Junwei and Cao, Ke and Chen, Yufan and Yang, Kailun and Stiefelhagen, Rainer},
  booktitle={2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2023}
}
```