{"id":20516377,"url":"https://github.com/nianticlabs/wavelet-monodepth","last_synced_at":"2025-09-04T09:39:48.501Z","repository":{"id":37530343,"uuid":"364686473","full_name":"nianticlabs/wavelet-monodepth","owner":"nianticlabs","description":"[CVPR 2021] Monocular depth estimation using wavelets for efficiency","archived":false,"fork":false,"pushed_at":"2021-12-31T01:45:25.000Z","size":9496,"stargazers_count":230,"open_issues_count":1,"forks_count":34,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-03-27T01:09:39.644Z","etag":null,"topics":["computer-vision","cvpr2021","depth-estimation","kitti-dataset","nyu-depth-v2","wavelets"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nianticlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-05-05T19:34:49.000Z","updated_at":"2025-02-24T07:37:32.000Z","dependencies_parsed_at":"2022-08-20T09:22:16.683Z","dependency_job_id":null,"html_url":"https://github.com/nianticlabs/wavelet-monodepth","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fwavelet-monodepth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fwavelet-monodepth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fwavelet-monodepth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fwavelet-monodepth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/nianticlabs","download_url":"https://codeload.github.com/nianticlabs/wavelet-monodepth/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248691532,"owners_count":21146381,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cvpr2021","depth-estimation","kitti-dataset","nyu-depth-v2","wavelets"],"created_at":"2024-11-15T21:28:34.010Z","updated_at":"2025-04-13T09:37:03.010Z","avatar_url":"https://github.com/nianticlabs.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Single Image Depth Prediction with Wavelet Decomposition\n\n\n**[Michaël Ramamonjisoa](https://michaelramamonjisoa.github.io), \n[Michael Firman](http://www.michaelfirman.co.uk), \n[Jamie Watson](https://scholar.google.com/citations?view_op=list_works\u0026hl=en\u0026user=5pC7fw8AAAAJ), \n[Vincent Lepetit](http://imagine.enpc.fr/~lepetitv/) and \n[Daniyar Turmukhambetov](http://dantkz.github.io/)**\n\n***CVPR 2021***\n\n[[Link to paper]](http://arxiv.org/abs/2106.02022)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/combo_kitti.gif\" alt=\"kitti gif\" width=\"500\" /\u003e\n  \u003cimg src=\"assets/combo_nyu.gif\" alt=\"nyu gif\" width=\"300\" /\u003e\n\u003c/p\u003e\n\n**We introduce *WaveletMonoDepth*, which improves the efficiency of standard encoder-decoder monocular depth estimation methods\nby exploiting wavelet decomposition.**\n\n\n\u003cp align=\"center\"\u003e\n  
\u003ca\nhref=\"https://storage.googleapis.com/niantic-lon-static/research/wavelet-monodepth/5min.mp4\"\u003e\n  \u003cimg src=\"assets/video_thumbnail.png\" alt=\"5 minute CVPR presentation video link\" width=\"400\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n## 🧑‍🏫 Methodology \n\n**WaveletMonoDepth** was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build our code upon \na baseline codebase. Both baselines share a common encoder-decoder architecture, and we modify their decoders to provide a \nwavelet prediction.\n\nWavelet predictions are sparse and can therefore be computed only at relevant locations, saving a lot of \nunnecessary computation.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/architecture.png\" alt=\"our architecture\" width=\"700\" /\u003e\n\u003c/p\u003e\n\nThe network is first trained with **dense** convolutions in the decoder until convergence, and the dense convolutions \nare then replaced with **sparse** ones. \n\nThis is because the network must first learn to predict sparse wavelet coefficients before sparse convolutions can be used.\n\n## 🗂 Environment Requirements 🗂 ##\n\nWe recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to set up a new environment:\n\n```\nconda env create -f environment.yml\nconda activate wavelet-mdp\n```\n\nOur work uses [Pytorch Wavelets](https://github.com/fbcotter/pytorch_wavelets), a great package from Fergal Cotter \nthat implements the Inverse Discrete Wavelet Transform (IDWT) used in our work, and a lot more! \nTo install Pytorch Wavelets, simply run:\n```\ngit clone https://github.com/fbcotter/pytorch_wavelets\ncd pytorch_wavelets\npip install .\n```\n\n## 🚗🚦 KITTI 🌳🛣  \n[Depth Hints](https://github.com/nianticlabs/depth-hints) was used as a baseline for KITTI.\n\n***Depth Hints*** builds upon [monodepth2](https://github.com/nianticlabs/monodepth2). 
If you have questions about running the code, please see the issues in their repositories first.\n\n### ⚙ Setup, Training and Evaluation \nPlease see the [KITTI](./KITTI/README.md) directory of this repository for details on how to train and evaluate our method. \n\n### 📊 Results and 📦 Trained models\n\nPlease find below the scores and associated trained models, using **dense** convolutions to predict wavelet coefficients.\n\n| Model name | Training modality | Resolution | abs_rel | RMSE | δ\u003c1.25 |\n| ---------- | ---------- | ---------- | ----- | ------ | ----- |\n| [`Ours Resnet18`](https://drive.google.com/file/d/1uDJoikUBiDZZOLDMDNsn_eAXByL5g9mi/view?usp=sharing) | Stereo + DepthHints | 640 x 192 | 0.106 | 4.693 | 0.876 |\n| [`Ours Resnet50`](https://drive.google.com/file/d/1UykLnyAlWjqVYWQ5wGK2I1uonYvdJ-2F/view?usp=sharing) | Stereo + DepthHints | 640 x 192 | 0.105 | 4.625 | 0.879 |\n| [`Ours Resnet18`](https://drive.google.com/file/d/1wyXNOgaboQI1s2EwJIuE2APWPVLJwKWM/view?usp=sharing) | Stereo + DepthHints | 1024 x 320 | 0.102 | 4.452 | 0.890 |\n| [`Ours Resnet50`](https://drive.google.com/file/d/1fVkPEv71b-3RBr_n52WPcj3wd-UKVrkF/view?usp=sharing) | Stereo + DepthHints | 1024 x 320 | 0.097 | 4.387 | 0.891 |\n\n### 🎚 Playing with sparsity\n\nHowever, the most interesting part is that we can exploit the sparsity of the predicted wavelet coefficients\nto trade a small loss in performance for a large gain in efficiency. We do so by tuning the wavelet threshold:\n- low threshold values lead to high performance but a high number of computations;\n- high thresholds lead to highly efficient computation, as convolutions are computed at only a few pixel locations. 
This will have a minimal impact on performance.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/kitti_sparsify.gif\" alt=\"sparsify kitti\" width=\"500\" /\u003e\n\u003c/p\u003e\n\nComputing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/relative_score_loss_kitti.png\" alt=\"scores kitti\" width=\"500\" /\u003e\n\u003c/p\u003e\n\nOur wavelet-based method greatly reduces the number of computations in the decoder at a minimal expense in \nperformance. We can measure the performance-vs-efficiency trade-off by evaluating scores vs FLOPs.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/score_vs_flops.png\" alt=\"scores vs flops kitti\" width=\"500\" /\u003e\n\u003c/p\u003e\n\n## 🪑🛁 NYUv2 🛋🚪\n[DenseDepth](https://github.com/ialhashim/DenseDepth) was used as a baseline for NYUv2.\nNote that we used the experimental PyTorch implementation of DenseDepth. 
Compared to the original paper, we \nmade a few modifications:\n\n- we supervise depth directly instead of supervising disparity\n- we do not use SSIM\n- we use DenseNet161 as the encoder instead of DenseNet169\n\n### ⚙ Setup, Training and Evaluation \nPlease see the [NYUv2](./NYUv2/README.md) directory of this repository for details on how to train and evaluate our method.\n\n### 📊 Results and 📦 Trained models\n\nPlease find below the scores and associated trained models, using **dense** convolutions to predict wavelet \ncoefficients.\n\n\n| Model name | Encoder | Resolution | abs_rel | RMSE | δ\u003c1.25 | ε_acc |\n| ---------- | ---------- | ---------- | ---------- | ----- | ----- | ----- |\n| [`Baseline`](https://drive.google.com/file/d/1WmGBXBwbR8jh8H_F7TK2LuUNJ1T_wfcQ/view?usp=sharing) | DenseNet | 640 x 480 | 0.1277 | 0.5479 | 0.8430 | 1.7170 |\n| [`Ours`](https://drive.google.com/file/d/1LubjqXEzAd2SI6Zwse6VFvHoobTr4P8Z/view?usp=sharing) | DenseNet | 640 x 480 | 0.1258 | 0.5515 | 0.8451 | 1.8070 |\n| [`Baseline`](https://drive.google.com/file/d/18BU-4u_9NWm67NCLk1On5IA0lJor6DHY/view?usp=sharing) | MobileNetV2 | 640 x 480 | 0.1772 | 0.6638 | 0.7419 | 1.8911 |\n| [`Ours`](https://drive.google.com/file/d/1-dcOO0T_YlFATwZBTg5ejg5evtR319Zi/view?usp=sharing) | MobileNetV2 | 640 x 480 | 0.1727 | 0.6776 | 0.7380 | 1.9732 |\n\n### 🎚 Playing with sparsity\n\nAs with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at a minimal cost in \nperformance.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/nyu_sparsify.gif\" alt=\"sparsify nyu\" width=\"500\" /\u003e\n\u003c/p\u003e\n\nComputing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than \n0.15%. 
\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/relative_score_loss_nyu.png\" alt=\"scores nyu\" width=\"500\" /\u003e\n\u003c/p\u003e\n\n## 🎮 Try it yourself!\n\nTry our Jupyter notebooks to visualize results at different levels of sparsity and to compute the \nresulting computational savings in FLOPs. The notebooks can be found in `\u003cDATASET\u003e/sparsity_test_notebook.ipynb`, where \n`\u003cDATASET\u003e` is either KITTI or NYUv2. \n\n## ✏️ 📄 Citation\n\nIf you find our work useful or interesting, please consider citing [our paper](http://arxiv.org/abs/2106.02022/):\n\n```\n@inproceedings{ramamonjisoa-2021-wavelet-monodepth,\n  title     = {Single Image Depth Prediction with Wavelet Decomposition},\n  author    = {Ramamonjisoa, Micha{\\\"{e}}l and\n               Michael Firman and\n               Jamie Watson and\n               Vincent Lepetit and\n               Daniyar Turmukhambetov},\n  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  month = {June},\n  year = {2021}\n}\n```\n\n\n## 👩‍⚖️ License\nCopyright © Niantic, Inc. 2021. Patent Pending.\nAll rights reserved.\nPlease see the [license file](LICENSE) for terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fwavelet-monodepth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnianticlabs%2Fwavelet-monodepth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fwavelet-monodepth/lists"}