https://github.com/damaggu/TADP

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/damaggu/TADP
Owner: damaggu
Created: 2023-09-04T15:48:44.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-04-13T18:48:34.000Z (about 1 year ago)
Last Synced: 2024-04-14T10:01:40.407Z (about 1 year ago)
Size: 2.37 MB
Stars: 0
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-diffusion-categorized - [Code

README

        # Text-image Alignment for Diffusion-based Perception (TADP)

[![Project Page](https://img.shields.io/badge/Project%20Page-Link)](https://www.vision.caltech.edu/tadp/)

[![Paper](https://img.shields.io/badge/arXiv-PDF-b31b1b)](https://arxiv.org/abs/2310.00031)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1uadQ8iukjmnjEScn1qKWuy9ijcUbPMuN?usp=sharing)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/text-image-alignment-for-diffusion-based/semantic-segmentation-on-nighttime-driving)](https://paperswithcode.com/sota/semantic-segmentation-on-nighttime-driving?p=text-image-alignment-for-diffusion-based)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/text-image-alignment-for-diffusion-based/weakly-supervised-object-detection-on-1)](https://paperswithcode.com/sota/weakly-supervised-object-detection-on-1?p=text-image-alignment-for-diffusion-based)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/text-image-alignment-for-diffusion-based/weakly-supervised-object-detection-on-comic2k)](https://paperswithcode.com/sota/weakly-supervised-object-detection-on-comic2k?p=text-image-alignment-for-diffusion-based)	

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/text-image-alignment-for-diffusion-based/semantic-segmentation-on-pascal-voc-2012-val)](https://paperswithcode.com/sota/semantic-segmentation-on-pascal-voc-2012-val?p=text-image-alignment-for-diffusion-based)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/text-image-alignment-for-diffusion-based/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=text-image-alignment-for-diffusion-based)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/text-image-alignment-for-diffusion-based/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=text-image-alignment-for-diffusion-based)

---

Official implementation of the paper **Text-Image Alignment for Diffusion-based Perception (CVPR 2024)**.

[Neehar Kondapaneni*](https://nkondapa.github.io/),

[Markus Marks*](https://damaggu.github.io/),

[Manuel Knott*](https://scholar.google.com/citations?user=e9xfiKEAAAAJ&hl=en),

[Rogerio Guimaraes](https://rogeriojr.com/),

[Pietro Perona](https://www.vision.caltech.edu/)

![methods](assets/methods.gif)

## Setup

We have 2 seperate shell scripts for setting up the environment. 

- `setup.sh` for setting up the environment for Pascal VOC Semantic Segmentation and Watercolor2k and Comic2k Object Detection.

- `setup_mm.sh` for setting up the environment for ADE20k Semantic Segmentation, NYUv2 Depth Estimation, Nighttime Driving, and Dark Zurich Semantic Segmentation (using MM libraries).

```bash

bash setup.sh

```

## Inference

If you want to use our models for inference, there are two options available:

### Single image inference

We provide a simple interface to load our model checkpoints and run inference with custom image and text inputs.

Please refer to the [demo/](demo/) directory for examples.

```

export PYTHONPATH=$PYTHONPATH:$(pwd)

python demo/depth_inference.py

python demo/seg_inference.py

python demo/detection_inference.py

python demo/seg_inference_driving.py

```

### Whole data set testing

If you want to generate results for a whole dataset that was used in our study (e.g., ADE20k, NYUv2) using pre-generated captions, 

please refer to the [test_tadp_mm.py](test_tadp_mm.py) and [test_tadp_depth.py](test_tadp_depth.py) scripts.

## Training

TODO

## Experiments

All results that are reported in our paper can be reproduced using the scripts in the [cvpr_experiments/](cvpr_experiments/) directory.

# Acknowledgements

This code is based on [VPD](https://github.com/wl-zhao/VPD), [diffusers](https://github.com/wl-zhao/VPD), [stable-diffusion](https://github.com/CompVis/stable-diffusion), [mmsegmentation](https://github.com/open-mmlab/mmsegmentation), [LAVT](https://github.com/yz93/LAVT-RIS), and [MIM-Depth-Estimation](https://github.com/SwinTransformer/MIM-Depth-Estimation).

# Citation

```

@article{kondapaneni2024tadp,

  title={Text-Image Alignment for Diffusion-Based Perception},

  author={Kondapaneni, Neehar and Marks, Markus and Knott, Manuel and Guimaraes, Rogerio and Perona, Pietro},

  journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},

  year={2024},

  month={June},

  pages={13883-13893}

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/damaggu/TADP

Awesome Lists containing this project

README