Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/EnVision-Research/Lotus
Official Implementation of LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
https://github.com/EnVision-Research/Lotus
Last synced: 8 days ago
JSON representation
Official Implementation of LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
- Host: GitHub
- URL: https://github.com/EnVision-Research/Lotus
- Owner: EnVision-Research
- License: apache-2.0
- Created: 2024-09-22T20:40:02.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-18T19:31:47.000Z (8 days ago)
- Last Synced: 2025-01-18T20:28:33.376Z (8 days ago)
- Language: Python
- Homepage: https://lotus3d.github.io
- Size: 17 MB
- Stars: 568
- Watchers: 17
- Forks: 29
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-and-novel-works-in-slam - [Code
README
# Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
[![Page](https://img.shields.io/badge/Project-Website-pink?logo=googlechrome&logoColor=white)](https://lotus3d.github.io/)
[![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2409.18124)
[![HuggingFace Demo](https://img.shields.io/badge/đ¤%20HuggingFace-Demo%20(Depth)-yellow)](https://huggingface.co/spaces/haodongli/Lotus_Depth)
[![HuggingFace Demo](https://img.shields.io/badge/đ¤%20HuggingFace-Demo%20(Normal)-yellow)](https://huggingface.co/spaces/haodongli/Lotus_Normal)
[![ComfyUI](https://img.shields.io/badge/%20ComfyUI-Demo%20&%20Cloud%20API-%23C18CC1?logo=)](https://github.com/kijai/ComfyUI-Lotus)
[![Replicate](https://replicate.com/chenxwh/lotus/badge?logo=)](https://replicate.com/chenxwh/lotus)[Jing He](https://scholar.google.com/citations?hl=en&user=RsLS11MAAAAJ)1âą,
[Haodong Li](https://haodong-li.com/)1âą,
[Wei Yin](https://yvanyin.net/)2,
[Yixun Liang](https://yixunliang.github.io/)1,
[Leheng Li](https://len-li.github.io/)1,
[Kaiqiang Zhou]()3,
[Hongbo Zhang]()3,
[Bingbing Liu](https://scholar.google.com/citations?user=-rCulKwAAAAJ&hl=en)3,
[Ying-Cong Chen](https://www.yingcong.me/)1,4â
![teaser](assets/badges/teaser_1.jpg)
![teaser](assets/badges/teaser_2.jpg)We present **Lotus**, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. "Avg. Rank" indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used.
## đĸ News
- 2025-01-17: Please check out our latest models ([lotus-normal-g-v1-1](https://huggingface.co/jingheya/lotus-normal-g-v1-1), [lotus-normal-d-v1-1](https://huggingface.co/jingheya/lotus-normal-d-v1-1)), which were trained with aligned surface normals, leading to improved performance!
- 2024-11-13: The demo now supports video depth estimation!
- 2024-11-13: The Lotus disparity models ([Generative](https://huggingface.co/jingheya/lotus-depth-g-v2-0-disparity) & [Discriminative](https://huggingface.co/jingheya/lotus-depth-d-v2-0-disparity)) are now available, which achieve better performance!
- 2024-10-06: The demos are now available ([Depth](https://huggingface.co/spaces/haodongli/Lotus_Depth) & [Normal](https://huggingface.co/spaces/haodongli/Lotus_Normal)). Please have a try!
- 2024-10-05: The inference code is now available!
- 2024-09-26: [Paper](https://arxiv.org/abs/2409.18124) released. Click [here](https://github.com/EnVision-Research/Lotus/issues/14#issuecomment-2409094495) if you are curious about the 3D point clouds of the teaser's depth maps!## đ ī¸ Setup
This installation was tested on: Ubuntu 20.04 LTS, Python 3.10, CUDA 12.3, NVIDIA A800-SXM4-80GB.1. Clone the repository (requires git):
```
git clone https://github.com/EnVision-Research/Lotus.git
cd Lotus
```2. Install dependencies (requires conda):
```
conda create -n lotus python=3.10 -y
conda activate lotus
pip install -r requirements.txt
```## đ¤ Gradio Demo
1. Online demo: [Depth](https://huggingface.co/spaces/haodongli/Lotus_Depth) & [Normal](https://huggingface.co/spaces/haodongli/Lotus_Normal)
2. Local demo
- For **depth** estimation, run:
```
python app.py depth
```
- For **normal** estimation, run:
```
python app.py normal
```## đšī¸ Usage
### Testing on your images
1. Place your images in a directory, for example, under `assets/in-the-wild_example` (where we have prepared several examples).
2. Run the inference command: `bash infer.sh`.### Evaluation on benchmark datasets
1. Prepare benchmark datasets:
- For **depth** estimation, you can download the [evaluation datasets (depth)](https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/) by the following commands (referred to [Marigold](https://github.com/prs-eth/Marigold?tab=readme-ov-file#-evaluation-on-test-datasets-))īŧ
```
cd datasets/eval/depth/
wget -r -np -nH --cut-dirs=4 -R "index.html*" -P . https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
```
- For **normal** estimation, you can download the [evaluation datasets (normal)](https://drive.google.com/drive/folders/1t3LMJIIrSnCGwOEf53Cyg0lkSXd3M4Hm?usp=drive_link) (`dsine_eval.zip`) into the path `datasets/eval/normal/` and unzip it (referred to [DSINE](https://github.com/baegwangbin/DSINE?tab=readme-ov-file#getting-started)).2. Run the evaluation command: `bash eval_scripts/eval-[task]-[mode].sh`, where `[task]` represents the task name (depth or normal) and `[mode]` refers to the mode name (d or g).
(Optional) To reproduce the results presented in our paper, you can set the `--rng_state_path` option in the evaluation command. The RNG state files are available at `./rng_states/`.### Choose your model
Below are the released models and their corresponding configurations:
|CHECKPOINT_DIR |TASK_NAME |MODE |
|:--:|:--:|:--:|
| [`jingheya/lotus-depth-g-v1-0`](https://huggingface.co/jingheya/lotus-depth-g-v1-0) | depth| `generation`|
| [`jingheya/lotus-depth-d-v1-0`](https://huggingface.co/jingheya/lotus-depth-d-v1-0) | depth|`regression` |
| [`jingheya/lotus-depth-g-v2-1-disparity`](https://huggingface.co/jingheya/lotus-depth-g-v2-1-disparity) | depth (disparity)| `generation`|
| [`jingheya/lotus-depth-d-v2-0-disparity`](https://huggingface.co/jingheya/lotus-depth-d-v2-0-disparity) | depth (disparity)|`regression` |
| [`jingheya/lotus-normal-g-v1-1`](https://huggingface.co/jingheya/lotus-normal-g-v1-1) |normal | `generation` |
| [`jingheya/lotus-normal-d-v1-1`](https://huggingface.co/jingheya/lotus-normal-d-v1-1) |normal |`regression` |## đ Citation
If you find our work useful in your research, please consider citing our paper:
```bibtex
@article{he2024lotus,
title={Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction},
author={He, Jing and Li, Haodong and Yin, Wei and Liang, Yixun and Li, Leheng and Zhou, Kaiqiang and Liu, Hongbo and Liu, Bingbing and Chen, Ying-Cong},
journal={arXiv preprint arXiv:2409.18124},
year={2024}
}
```