{"id":13706246,"url":"https://github.com/aradhye2002/ecodepth","last_synced_at":"2025-05-05T20:30:43.225Z","repository":{"id":228199090,"uuid":"773384846","full_name":"Aradhye2002/EcoDepth","owner":"Aradhye2002","description":"[CVPR'2024] Official implementation of the paper \"ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation\"","archived":false,"fork":false,"pushed_at":"2024-06-15T16:42:45.000Z","size":16137,"stargazers_count":139,"open_issues_count":7,"forks_count":17,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-08-03T22:16:57.792Z","etag":null,"topics":["cvpr2024","deep-learning","depth-estimation","metric-depth-estimation","monocular-depth-estimation","stable-diffusion","zero-shot-transfer"],"latest_commit_sha":null,"homepage":"https://ecodepth-iitd.github.io/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Aradhye2002.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-17T14:21:35.000Z","updated_at":"2024-08-03T11:58:34.000Z","dependencies_parsed_at":"2024-06-15T17:58:02.053Z","dependency_job_id":null,"html_url":"https://github.com/Aradhye2002/EcoDepth","commit_stats":null,"previous_names":["aradhye2002/ecodepth"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aradhye2002%2FEcoDepth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aradhye2002%2FEcoDepth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/Aradhye2002%2FEcoDepth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aradhye2002%2FEcoDepth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Aradhye2002","download_url":"https://codeload.github.com/Aradhye2002/EcoDepth/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224465744,"owners_count":17315867,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cvpr2024","deep-learning","depth-estimation","metric-depth-estimation","monocular-depth-estimation","stable-diffusion","zero-shot-transfer"],"created_at":"2024-08-02T22:00:53.531Z","updated_at":"2025-05-05T20:30:43.217Z","avatar_url":"https://github.com/Aradhye2002.png","language":"Jupyter Notebook","funding_links":[],"categories":["Papers"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eECoDepth v2.0.0\u003c/h1\u003e\n\n**CVPR 2024**  \n\u003ca href='https://ecodepth-iitd.github.io' style=\"margin-right: 20px;\"\u003e\u003cimg src='https://img.shields.io/badge/Project Page-ECoDepth-darkgreen' alt='Project Page'\u003e\u003c/a\u003e\n\u003ca href=\"https://arxiv.org/abs/2403.18807\" style=\"margin-right: 20px;\"\u003e\u003cimg src='https://img.shields.io/badge/Paper-arXiv-maroon' alt='arXiv page'\u003e\u003c/a\u003e\n\u003ca href=\"https://arxiv.org/abs/2403.18807\" style=\"margin-right: 20px;\"\u003e\u003cimg src='https://img.shields.io/badge/Paper-CvF-blue' alt='IEEE Xplore Paper'\u003e\u003c/a\u003e\n\u003ca href=\"https://arxiv.org/abs/2403.18807\" 
style=\"margin-right: 20px;\"\u003e\u003cimg src='https://img.shields.io/badge/Supplementary-CvF-blue' alt='IEEE Xplore Paper'\u003e\u003c/a\u003e\n\n[Suraj Patni](https://github.com/surajiitd)\\*,\n[Aradhye Agarwal](https://github.com/Aradhye2002)\\*,\n[Chetan Arora](https://www.cse.iitd.ac.in/~chetan)\u003cbr/\u003e\n\n\u003c/div\u003e\n\nWelcome to the **restructured codebase** for **ECoDepth**, our official implementation for monocular depth estimation (MDE) as presented in our CVPR 2024 paper. This repository has been significantly reorganized to improve usability, readability, and extensibility. \n\n\u003e **Important:** The original code used to generate the results in our paper is tagged as [v1.0.0](https://github.com/Aradhye2002/EcoDepth/releases/tag/v1.0.0), which you can download from the Releases section. For most practical purposes—such as training on custom datasets or performing inference—**we strongly recommend using the new [v2.0.0](https://github.com/Aradhye2002/EcoDepth/releases/tag/v2.0.0)** outlined here.\n\n\n## News\n- **[April 2024] Inference scripts for video or image to depth.**\n- [March 2024] Pretrained checkpoints for NYUv2 and KITTI datasets.\n- [March 2024] Training and Evaluation code released!\n- [Feb 2024] ECoDepth accepted in CVPR'2024.\n\n## Table of Contents\n1. [Overview of v2.0.0 Improvements](#overview-of-v200-improvements)\n2. [Setup](#setup)\n3. [Dataset Download (NYU Depth V2)](#dataset-download-nyu-depth-v2)\n4. [DepthDataset API](#depthdataset-api)\n5. [EcoDepth Model API](#ecodepth-model-api)\n6. [Training Workflow](#training-workflow)\n7. [Testing Workflow](#testing-workflow)\n8. [Inference Workflow](#inference-workflow)\n9. [Citation](#citation)\n\n\n\n## Overview of v2.0.0 Improvements\n\n1. **Integrated Model Downloading**  \n   - In the previous version (v1.0.0), you had to manually download our checkpoints from Google Drive and place them in the correct directory. 
Now, the model is automatically downloaded and cached on the first run in `EcoDepth/checkpoints`. Subsequent runs will use the cached checkpoints automatically.\n\n2. **Generic DepthDataset Module**  \n   - We provide a new, flexible `DepthDataset` module that loads any custom dataset for MDE training. This was a frequent feature request. Detailed usage is given in the [DepthDataset API](#depthdataset-api) section.\n\n3. **PyTorch Lightning Integration**  \n   - The `EcoDepth` model is now a subclass of `LightningModule`, allowing for streamlined training and inference workflows via PyTorch Lightning. This also makes it straightforward to export models to ONNX or TorchScript for production use.\n\n4. **Config-Based Workflows**  \n   - We replaced bash scripts with user-friendly JSON configs, making it clearer to specify training, testing, and inference parameters.\n\n5. **Reduced Dependencies \u0026 Simplified Setup**  \n   - We removed the requirement to install the entire Stable Diffusion pipeline and numerous large CLIP or VIT models separately. Our checkpoints already contain the necessary weights, so only **one** model download is required.  \n   - Dependencies like `mmcv`, which can be cumbersome to install, are no longer necessary. Installation is now simpler and more flexible.\n\n6. **Separate Workflows**  \n   - The code is structured into three main directories:  \n     - `train/` for training  \n     - `test/` for testing  \n     - `infer/` for inference  \n   - Each directory contains its own config files, making each workflow highly modular.\n\n\n\n## Setup\n\n1. **Install PyTorch (with or without GPU support)**  \n   - Refer to the [PyTorch installation guide](https://pytorch.org/get-started/previous-versions/) for commands tailored to your environment.  \n   - Example (with CUDA 12.4):\n     ```bash\n     conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia\n     ```\n\n2. 
**Install Python Dependencies**  \n   - From the repository’s root directory, run:\n     ```bash\n     pip install -r requirements.txt\n     ```\n   - Package versions are deliberately left unpinned to reduce potential conflicts. Let the dependency resolver pick suitable versions for your system.\n\n3. **(Optional) Download NYU Depth V2 Dataset**  \n   - If you plan to train on the NYU Depth V2 dataset, simply run:\n     ```bash\n     bash download_nyu.sh\n     ```\n   - This downloads and unzips the dataset from [HF dataset `aradhye/nyu_depth_v2`](https://huggingface.co/datasets/aradhye/nyu_depth_v2) into a directory named `nyu_depth_v2` under `datasets`. The filenames are already provided as text files under `filenames/nyu_depth_v2`.\n\n\n\n## Dataset Download (NYU Depth V2)\n\nIf you want to replicate our NYU Depth V2 experiments:\n\n1. Run:\n   ```bash\n   bash download_nyu.sh\n   ```\n2. This script:\n   - Downloads NYU Depth V2 from [aradhye/nyu_depth_v2](https://huggingface.co/datasets/aradhye/nyu_depth_v2).\n   - Unzips the dataset into `datasets/nyu_depth_v2/`.\n\n   The matching file lists are already provided in `filenames/nyu_depth_v2/`.\n\nYou can then set the corresponding paths in your JSON configs (see the [Training Workflow](#training-workflow) section).\n\n\n\n## DepthDataset API\n\n`DepthDataset` is a generic dataset class designed for pairs of RGB images and depth maps. It requires an `args` object (which can be a namespace or a dictionary) with the following attributes:\n\n- **`is_train` (bool)**  \n  Indicates whether the dataset is used for training (`True`) or evaluation/testing (`False`). Some augmentations (e.g., random cropping) are only applied in training mode.\n\n- **`filenames_path` (str)**  \n  Path to a text file containing pairs of image and depth map paths, separated by a space. \n\n- **`data_path` (str)**  \n  A directory path that is prepended to each filename from `filenames_path`. 
Thus, the actual file loaded is `data_path + path_in_filenames`.\n\n- **`depth_factor` (float)**  \n  Divides the raw depth values to convert them into meters. For NYU Depth V2, `depth_factor=1000.0`; for KITTI, `depth_factor=256.0`.\n\n- **`do_random_crop` (bool)**  \n  Whether to perform random cropping on the image/depth pairs (only if `is_train=True`). If `do_random_crop` is `True`, you must also set:\n  - **`crop_h` (int)**: Crop height  \n  - **`crop_w` (int)**: Crop width  \n\n  If images are smaller than `crop_h`×`crop_w`, zero padding is applied first.\n\n- **`use_cut_depth` (bool)**  \n  Whether to use [CutDepth](https://arxiv.org/abs/2107.07684) to reduce overfitting. We found it helpful for indoor datasets (e.g., NYU Depth V2) but not for outdoor datasets (e.g., KITTI). Only used during training.\n\n\n\n## EcoDepth Model API\n\n`EcoDepth` is implemented as a subclass of PyTorch Lightning’s `LightningModule`. The constructor expects an `args` object with these key attributes:\n\n- **`train_from_scratch` (bool)**  \n  Currently should always be `False`. To train from scratch, you would need the base pretrained weights. Typically, you will **finetune** using our published checkpoints.\n\n- **`eval_crop` (str)**  \n  Determines the evaluation cropping strategy. Possible values:\n  - `\"eigen\"`: Used for NYU  \n  - `\"garg\"`: Used for KITTI  \n  - `\"custom\"`: Implement your own function in `utils.py` and set `eval_crop=\"custom\"`.  \n  - `\"none\"`: No cropping\n\n- **`no_of_classes` (int)**  \n  Number of scene classes for internal embeddings. For NYU (indoor model) use `100`; for VKITTI (outdoor model) use `200`.\n\n- **`max_depth` (float)**  \n  Maximum depth value the model will predict. Typically:\n  - `10.0` for indoor (NYU)  \n  - `80.0` for outdoor (KITTI)\n\n\n\n## Training Workflow\n\n1. **Navigate to `train/`**  \n2. 
**Edit `train_config.json`**  \n   - **Data arguments**:  \n     - `train_filenames_path`, `train_data_path`, `train_depth_factor`  \n     - `test_filenames_path`, `test_data_path`, `test_depth_factor`  \n   - **Model arguments**:  \n     - `eval_crop`, `no_of_classes`, `max_depth`  \n   - **Training arguments**:  \n     - `ckpt_path`: Path to a Lightning checkpoint (for finetuning/resuming). If this is an empty string, you must specify `scene=\"indoor\"` or `\"outdoor\"`, triggering automatic model download.  \n     - `epochs`: Total training epochs  \n     - `weight_decay`, `lr`: Optimizer hyperparameters  \n     - `val_check_interval`: Validation frequency (in training steps)  \n\n3. **Run Training**  \n   ```bash\n   python3 train.py\n   ```\n   PyTorch Lightning will handle checkpointing automatically (by default, in `train/lightning_logs/`).\n\n\n\n## Testing Workflow\n\n1. **Navigate to `test/`**  \n2. **Edit `test_config.json`**  \n   - Similar to training config, but no training arguments.  \n   - Point `ckpt_path` to the checkpoint you want to evaluate, or leave empty if you want to use the provided models.  \n3. **Run Testing**  \n   ```bash\n   python3 test.py\n   ```\n   This script reports evaluation metrics (e.g., RMSE, MAE, δ thresholds).\n\n\n\n## Inference Workflow\n\nThere are two scripts provided in the `infer/` directory: one for images (`infer_image.py`) and one for videos (`infer_video.py`).\n\n### Image Inference\n\n1. **Navigate to `infer/`**  \n2. **Edit `image_config.json`**  \n   - **Key arguments**:  \n     - `image_path`: Path to a single image or a directory containing multiple images (recursively processed).  \n     - `outdir`: Output directory for predicted depth maps.  \n     - `resolution`: Scale factor for processing images (higher resolution =\u003e more GPU memory usage).  \n     - `flip_test` (bool): Whether to perform horizontal flip as a test-time augmentation.  
\n     - `grayscale` (bool): Output in grayscale (if `false`, uses a colorized depth map).  \n     - `pred_only` (bool): Whether to output **only** the depth map.\n\n3. **Run Image Inference**  \n   ```bash\n   python3 infer_image.py\n   ```\n   Results are written to `outdir`, preserving subdirectory structure relative to `image_path`.\n\n### Video Inference\n\n1. **Edit `video_config.json`**  \n   - **Key arguments**:  \n     - `video_path`: Path to the video file.  \n     - `outdir`: Output directory for frames or depth predictions.  \n     - `vmax`: Depth values are clipped to this maximum.  \n2. **Run Video Inference**  \n   ```bash\n   python3 infer_video.py\n   ```\n\n\n\n## Citation\n\nIf you find **ECoDepth** helpful in your research or work, please cite our CVPR 2024 paper:\n\n```\n@InProceedings{Patni_2024_CVPR,\n  author    = {Patni, Suraj and Agarwal, Aradhye and Arora, Chetan},\n  title     = {ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation},\n  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n  month     = {June},\n  year      = {2024},\n  pages     = {28285-28295}\n}\n```\n\n\n\n**Thank you for using ECoDepth!**  \nFor any questions or suggestions, feel free to open an issue. We hope this restructured codebase helps you train on custom datasets and perform fast, efficient inference.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faradhye2002%2Fecodepth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faradhye2002%2Fecodepth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faradhye2002%2Fecodepth/lists"}