Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fudan-zvg/meta-prompts?tab=readme-ov-file
Last synced: 29 days ago
- Host: GitHub
- URL: https://github.com/fudan-zvg/meta-prompts?tab=readme-ov-file
- Owner: fudan-zvg
- License: MIT
- Created: 2023-12-01T03:46:54.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-31T17:31:35.000Z (11 months ago)
- Last Synced: 2024-11-09T16:25:06.583Z (about 1 month ago)
- Language: Python
- Size: 5.65 MB
- Stars: 67
- Watchers: 4
- Forks: 2
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Monocular-Depth - [Harnessing Diffusion Models for Visual Perception with Meta Prompts](https://github.com/fudan-zvg/meta-prompts?tab=readme-ov-file) (Papers)
README
# Harnessing Diffusion Models for Visual Perception with Meta Prompts
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/harnessing-diffusion-models-for-visual/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=harnessing-diffusion-models-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/harnessing-diffusion-models-for-visual/monocular-depth-estimation-on-kitti-eigen)](https://paperswithcode.com/sota/monocular-depth-estimation-on-kitti-eigen?p=harnessing-diffusion-models-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/harnessing-diffusion-models-for-visual/semantic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes-val?p=harnessing-diffusion-models-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/harnessing-diffusion-models-for-visual/semantic-segmentation-on-cityscapes)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes?p=harnessing-diffusion-models-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/harnessing-diffusion-models-for-visual/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=harnessing-diffusion-models-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/harnessing-diffusion-models-for-visual/pose-estimation-on-coco)](https://paperswithcode.com/sota/pose-estimation-on-coco?p=harnessing-diffusion-models-for-visual)

### [Paper](https://arxiv.org/abs/2312.14733)
> [**Harnessing Diffusion Models for Visual Perception with Meta Prompts**](https://arxiv.org/abs/2312.14733),
> Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang

## 📸 Release
* ⏳ Pose estimation training code and model.
* **`Jan. 31st, 2024`**: Release semantic segmentation training code and model.
* **`Jan. 6th, 2024`**: Release depth estimation training code and model.

## Installation
Clone this repo, and run
```
sh install.sh
```
Download the checkpoint of [stable-diffusion](https://github.com/runwayml/stable-diffusion) (we use `v1-5` by default) and put it in the `checkpoints` folder.
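As a convenience, here is a minimal sketch of fetching the `v1-5` weights with `huggingface_hub` and placing them under `checkpoints/`. The source repository (`runwayml/stable-diffusion-v1-5`) and the checkpoint filename are assumptions rather than something this repo specifies, so check `install.sh` and the configs for the exact file expected.

```
# Hypothetical download helper: the Hugging Face repo id and filename below are
# assumptions; verify against install.sh / the model configs before relying on them.
from pathlib import Path
from shutil import copy2

from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",   # assumed Hugging Face mirror of SD v1-5
    filename="v1-5-pruned-emaonly.ckpt",        # assumed checkpoint file name
)
Path("checkpoints").mkdir(exist_ok=True)
copy2(ckpt, Path("checkpoints") / "v1-5-pruned-emaonly.ckpt")  # place it where the repo expects
```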
## Depth Estimation with meta prompts
MetaPrompts obtains 0.223 RMSE on the NYUv2 depth estimation benchmark and 1.929 RMSE on the KITTI Eigen split, establishing the new state of the art.

| NYUv2 | RMSE | d1 | d2 | d3 | REL |
|-------------------|-------|-------|--------|--------|--------|
| **MetaPrompts** | 0.223 | 0.976 | 0.997 | 0.999 | 0.061 |

| KITTI | RMSE | d1 | d2 | d3 | REL |
|-------------------|-------|-------|--------|--------|--------|
| **MetaPrompts** | 1.928 | 0.981 | 0.998 | 1.000 | 0.047 |

Please check [depth.md](./depth/README.md) for detailed instructions on training and inference.
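For reference, the metrics in these tables follow the standard monocular-depth conventions: RMSE, absolute relative error (REL), and the threshold accuracies d1/d2/d3 (fraction of pixels with max(pred/gt, gt/pred) below 1.25, 1.25², 1.25³). A minimal NumPy sketch of those definitions, not the repo's own evaluation code in `depth/`, looks like this:

```
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard depth metrics (generic definitions, not the repo's eval script)."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    valid = gt > eps                              # evaluate only valid ground-truth pixels
    pred, gt = pred[valid], gt[valid]
    thresh = np.maximum(pred / gt, gt / pred)     # per-pixel ratio used for d1/d2/d3
    return {
        "RMSE": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "REL":  float(np.mean(np.abs(pred - gt) / gt)),
        "d1":   float(np.mean(thresh < 1.25)),
        "d2":   float(np.mean(thresh < 1.25 ** 2)),
        "d3":   float(np.mean(thresh < 1.25 ** 3)),
    }
```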
## Semantic segmentation with meta prompts
MetaPrompts obtains 56.8 mIoU on the ADE20K semantic segmentation benchmark and 87.3 mIoU on Cityscapes, establishing the new state of the art.

| ADE20K | Head | Crop Size | mIoU | mIoU (ms+flip) |
|-------------------|-------|-------|--------|--------|
| **MetaPrompts** | UperNet | 512x512 | 55.83 | 56.81 |

| Cityscapes | Head | Crop Size | mIoU | mIoU (ms+flip) |
|-------------------|-------|-------|--------|--------|
| **MetaPrompts** | UperNet | 1024x1024 | 85.98 | 87.26 |

Please check [segmentation.md](./segmentation/README.md) for detailed instructions on training and inference.
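mIoU here is the usual mean intersection-over-union averaged over classes, and "ms+flip" denotes multi-scale plus horizontal-flip test-time augmentation. The sketch below shows the generic metric computed from a confusion matrix; the repo's actual evaluation is driven by the mmsegmentation-based code under `segmentation/`.

```
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Generic mean IoU from integer class maps (illustrative, not the repo's eval code)."""
    pred = np.asarray(pred).ravel().astype(np.int64)
    gt = np.asarray(gt).ravel().astype(np.int64)
    keep = gt != ignore_index                     # drop ignored pixels
    pred, gt = pred[keep], gt[keep]
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)            # guard against empty classes
    return float(iou[union > 0].mean())           # average over classes that appear
```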
## License
MIT License

## Acknowledgements
This code is based on [stable-diffusion](https://github.com/CompVis/stable-diffusion), [mmsegmentation](https://github.com/open-mmlab/mmsegmentation), [LAVT](https://github.com/yz93/LAVT-RIS), [VPD](https://github.com/wl-zhao/VPD), [ViTPose](https://github.com/ViTAE-Transformer/ViTPose), [mmpose](https://github.com/open-mmlab/mmpose), and [MIM-Depth-Estimation](https://github.com/SwinTransformer/MIM-Depth-Estimation).

## BibTeX
If you find our work useful in your research, please consider citing:
```
@article{wan2023harnessing,
  title={Harnessing Diffusion Models for Visual Perception with Meta Prompts},
  author={Wan, Qiang and Huang, Zilong and Kang, Bingyi and Feng, Jiashi and Zhang, Li},
  journal={arXiv preprint arXiv:2312.14733},
  year={2023}
}
```