{"id":15026858,"url":"https://github.com/tinghuiz/sfmlearner","last_synced_at":"2025-05-15T17:05:15.627Z","repository":{"id":37663700,"uuid":"89383667","full_name":"tinghuiz/SfMLearner","owner":"tinghuiz","description":"An unsupervised learning framework for depth and ego-motion estimation from monocular videos","archived":false,"fork":false,"pushed_at":"2021-10-26T05:58:09.000Z","size":7122,"stargazers_count":1993,"open_issues_count":48,"forks_count":558,"subscribers_count":76,"default_branch":"master","last_synced_at":"2025-05-15T17:05:06.298Z","etag":null,"topics":["deep-learning","depth-prediction","self-supervised-learning","unsupervised-learning","visual-odometry"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinghuiz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-04-25T16:38:21.000Z","updated_at":"2025-05-09T11:17:16.000Z","dependencies_parsed_at":"2022-07-12T16:42:33.796Z","dependency_job_id":null,"html_url":"https://github.com/tinghuiz/SfMLearner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinghuiz%2FSfMLearner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinghuiz%2FSfMLearner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinghuiz%2FSfMLearner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinghuiz%2FSfMLearner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tinghuiz","download_
url":"https://codeload.github.com/tinghuiz/SfMLearner/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254384988,"owners_count":22062422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","depth-prediction","self-supervised-learning","unsupervised-learning","visual-odometry"],"created_at":"2024-09-24T20:05:15.834Z","updated_at":"2025-05-15T17:05:10.615Z","avatar_url":"https://github.com/tinghuiz.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SfMLearner\nThis codebase implements the system described in the paper:\n\nUnsupervised Learning of Depth and Ego-Motion from Video\n\n[Tinghui Zhou](https://people.eecs.berkeley.edu/~tinghuiz/), [Matthew Brown](http://matthewalunbrown.com/research/research.html), [Noah Snavely](http://www.cs.cornell.edu/~snavely/), [David G. Lowe](http://www.cs.ubc.ca/~lowe/home.html)\n\nIn CVPR 2017 (**Oral**).\n\nSee the [project webpage](https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner/) for more details. Please contact Tinghui Zhou (tinghuiz@berkeley.edu) if you have any questions.\n\n\u003cimg src='misc/cityscapes_sample_results.gif' width=320\u003e\n\n## Prerequisites\nThis codebase was developed and tested with Tensorflow 1.0, CUDA 8.0 and Ubuntu 16.04.\n\n## Running the single-view depth demo\nWe provide the demo code for running our single-view depth prediction model. 
First, download the pre-trained model from this [Google Drive](https://drive.google.com/file/d/1AH5LV29Fijrz_QI3Th6ogtXJKXhd8Nm9/view?usp=sharing), and put the model files under `models/`. Then you can use the provided ipython-notebook `demo.ipynb` to run the demo.\n\n## Preparing training data\nIn order to train the model using the provided code, the data needs to be formatted in a certain manner. \n\nFor [KITTI](http://www.cvlibs.net/datasets/kitti/raw_data.php), first download the dataset using this [script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip) provided on the official website, and then run the following command\n```bash\npython data/prepare_train_data.py --dataset_dir=/path/to/raw/kitti/dataset/ --dataset_name='kitti_raw_eigen' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=128 --num_threads=4\n```\nFor the pose experiments, we used the KITTI odometry split, which can be downloaded [here](http://www.cvlibs.net/datasets/kitti/eval_odometry.php). Then you can change `--dataset_name` option to `kitti_odom` when preparing the data.\n\nFor [Cityscapes](https://www.cityscapes-dataset.com/), download the following packages: 1) `leftImg8bit_sequence_trainvaltest.zip`, 2) `camera_trainvaltest.zip`. 
Then run the following command\n```bash\npython data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=171 --num_threads=4\n```\nNote that for Cityscapes the `img_height` is set to 171 because we crop out the bottom part of the image that contains the car logo, and the resulting image will have height 128.\n\n## Training\nOnce the data are formatted following the above instructions, you should be able to train the model by running the following command\n```bash\npython train.py --dataset_dir=/path/to/the/formatted/data/ --checkpoint_dir=/where/to/store/checkpoints/ --img_width=416 --img_height=128 --batch_size=4\n```\nYou can then start a `tensorboard` session by\n```bash\ntensorboard --logdir=/path/to/tensorflow/log/files --port=8888\n```\nand visualize the training progress by opening [http://localhost:8888](http://localhost:8888) in your browser. If everything is set up properly, you should start seeing reasonable depth predictions after ~100K iterations when training on KITTI. \n\n### Notes\nAfter adding data augmentation and removing batch normalization (along with some other minor tweaks), we have been able to train depth models better than what was originally reported in the paper even without using additional Cityscapes data or the explainability regularization. The provided pre-trained model was trained on KITTI only with smooth weight set to 0.5, and achieved the following performance on the Eigen test split (Table 1 of the paper):\n\n| Abs Rel | Sq Rel | RMSE  | RMSE(log) | Acc.1 | Acc.2 | Acc.3 |\n|---------|--------|-------|-----------|-------|-------|-------|\n| 0.183   | 1.595  | 6.709 | 0.270     | 0.734 | 0.902 | 0.959 | \n\nWhen trained on 5-frame snippets, the pose model obtains the following performance on the KITTI odometry split (Table 3 of the paper):\n\n| Seq. 09            | Seq. 
10            |\n|--------------------|--------------------|\n| 0.016 (std. 0.009) | 0.013 (std. 0.009) |\n\n## Evaluation on KITTI\n\n### Depth\nWe provide evaluation code for the single-view depth experiment on KITTI. First, download our predictions (~140MB) from this [Google Drive](https://drive.google.com/file/d/1ERB2vUH_6to8NI9KN-ug-ijcWaQf9SIp/view?usp=sharing) and put them into `kitti_eval/`.\n\nThen run\n```bash\npython kitti_eval/eval_depth.py --kitti_dir=/path/to/raw/kitti/dataset/ --pred_file=kitti_eval/kitti_eigen_depth_predictions.npy\n```\nIf everything runs properly, you should get the numbers for `Ours(CS+K)` in Table 1 of the paper. To get the numbers for `Ours cap 50m (CS+K)`, set an additional flag `--max_depth=50` when executing the above command.\n\n### Pose\nWe provide evaluation code for the pose estimation experiment on KITTI. First, download the predictions and ground-truth pose data from this [Google Drive](https://drive.google.com/file/d/1BqTIY_PBRkFvKrFvqlhPsEaouSdo42ZZ/view?usp=sharing).\n\nNotice that all the predictions and ground-truth are 5-frame snippets with the format of `timestamp tx ty tz qx qy qz qw` consistent with the [TUM evaluation toolkit](https://vision.in.tum.de/data/datasets/rgbd-dataset/tools#evaluation). Then you could run \n```bash\npython kitti_eval/eval_pose.py --gtruth_dir=/directory/of/groundtruth/trajectory/files/ --pred_dir=/directory/of/predicted/trajectory/files/\n```\nto obtain the results reported in Table 3 of the paper. For instance, to get the results of `Ours` for `Seq. 
10` you could run\n```bash\npython kitti_eval/eval_pose.py --gtruth_dir=kitti_eval/pose_data/ground_truth/10/ --pred_dir=kitti_eval/pose_data/ours_results/10/\n```\n\n## KITTI Testing code\n\n### Depth\nOnce you have a trained model, you can obtain the single-view depth predictions on the KITTI Eigen test split, formatted properly for evaluation, by running\n```bash\npython test_kitti_depth.py --dataset_dir /path/to/raw/kitti/dataset/ --output_dir /path/to/output/directory --ckpt_file /path/to/pre-trained/model/file/\n```\n\n### Pose\nWe also provide sample testing code for obtaining pose predictions on the KITTI dataset with a pre-trained model. You can obtain the predictions formatted as above for pose evaluation by running\n```bash\npython test_kitti_pose.py --test_seq [sequence_id] --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file /path/to/pre-trained/model/file/\n```\nA sample model trained on 5-frame snippets can be downloaded from this [Google Drive](https://drive.google.com/file/d/1vMg9UbK4kQSvFtJrzv0lfCVAoTN-we1y/view?usp=sharing). \n\nThen you can obtain predictions on, say, `Seq. 9` by running\n```bash\npython test_kitti_pose.py --test_seq 9 --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file models/model-100280\n```\n\n## Other implementations\n[PyTorch](https://github.com/ClementPinard/SfmLearner-Pytorch) (by Clement Pinard)\n\n## Disclaimer\nThis is the authors' implementation of the system described in the paper and not an official Google product.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinghuiz%2Fsfmlearner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinghuiz%2Fsfmlearner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinghuiz%2Fsfmlearner/lists"}