{"id":13427653,"url":"https://github.com/yzcjtr/GeoNet","last_synced_at":"2025-03-16T00:31:51.736Z","repository":{"id":44763102,"uuid":"124822751","full_name":"yzcjtr/GeoNet","owner":"yzcjtr","description":"Code for GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)","archived":false,"fork":false,"pushed_at":"2019-02-19T06:32:50.000Z","size":315,"stargazers_count":717,"open_issues_count":7,"forks_count":183,"subscribers_count":34,"default_branch":"master","last_synced_at":"2024-08-01T01:27:41.894Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yzcjtr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-12T02:27:49.000Z","updated_at":"2024-07-04T06:18:21.000Z","dependencies_parsed_at":"2022-08-29T22:00:44.721Z","dependency_job_id":null,"html_url":"https://github.com/yzcjtr/GeoNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yzcjtr%2FGeoNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yzcjtr%2FGeoNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yzcjtr%2FGeoNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yzcjtr%2FGeoNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yzcjtr","download_url":"https://codeload.github.com/yzcjtr/GeoNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count
":221631805,"owners_count":16855012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T01:00:35.991Z","updated_at":"2024-10-27T05:30:17.998Z","avatar_url":"https://github.com/yzcjtr.png","language":"Python","readme":"# GeoNet\n\nThis is a Tensorflow implementation of our paper:\n\nGeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)\n\nZhichao Yin and Jianping Shi\n\narxiv preprint: (https://arxiv.org/abs/1803.02276)\n\n\u003cimg src=\"misc/overview.jpg\" width=\"550\"\u003e\n\n## Requirements\n\nThis code has been tested with Python 2.7, TensorFlow 1.1 and CUDA 8.0 on Ubuntu 16.04.\n\n## Data preparation\n\nFor replicating our results in all of the three tasks (monocular depth, camera pose and optical flow), \nyou need to download the following datasets, and preprocess them into certain formats:\n\n### [KITTI](http://www.cvlibs.net/datasets/kitti/index.php)\nFor **depth** and **flow** tasks, the training data is [KITTI raw dataset](http://www.cvlibs.net/datasets/kitti/raw_data.php) \nand you can download them by the [official script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip);\n\nFor **pose** task, the training data is [KITTI odometry dataset](http://www.cvlibs.net/download.php?file=data_odometry_color.zip) \nand you should download the calibration files as well as ground truth poses (for evaluation).\n\nAfter downloaded the data, you can run the following command for preprocessing:\n```bash\npython data/prepare_train_data.py --dataset_dir=/path/to/kitti/dataset/ --dataset_name=kitti_split 
--dump_root=/path/to/formatted/data/ --seq_length=3 --img_height=128 --img_width=416 --num_threads=16 --remove_static\n```\n\nFor **depth** task, the `--dataset_name` should be `kitti_raw_eigen` and `--seq_length` is set to `3`;\n\nFor **flow** task, the `--dataset_name` should be `kitti_raw_stereo` and `--seq_length` is set to `3`;\n\nFor **pose** task, the `--dataset_name` should be `kitti_odom` and `--seq_length` is set to `5`.\n\n### [Cityscapes](https://www.cityscapes-dataset.com/)\nYou can optionally pretrain the model on Cityscapes dataset for any of the three tasks. The required training \ndata is image sequence `leftImg8bit_sequence_trainvaltest.zip` and calibration file `camera_trainvaltest.zip`. \nAfter downloaded them, simply run:\n```bash\npython data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/formatted/data/ --seq_length=3 --img_height=171 --img_width=416 --num_threads=16\n```\n\n## Training\nOur code supports two training modes, corresponding to our stage-wise training strategy. 
The `train_rigid` mode is mainly for learning depth and pose, while the `train_flow` mode supports direct or residual flow learning.

For `train_rigid` mode (**depth** and **pose** tasks), run
```bash
python geonet_main.py --mode=train_rigid --dataset_dir=/path/to/formatted/data/ --checkpoint_dir=/path/to/save/ckpts/ --learning_rate=0.0002 --seq_length=3 --batch_size=4 --max_steps=350000
```
You can switch the network encoder with the `--dispnet_encoder` flag, or enable depth scale normalization (see [this paper](https://arxiv.org/abs/1712.00175) for details) by setting `--scale_normalize` to True.
Note that to replicate the depth and pose results, `--seq_length` should be set to 3 and 5 respectively.

For `train_flow` mode (**flow** task), run
```bash
python geonet_main.py --mode=train_flow --dataset_dir=/path/to/formatted/data/ --checkpoint_dir=/path/to/save/ckpts/ --learning_rate=0.0002 --seq_length=3 --flownet_type=direct --max_steps=400000
```
You can choose between direct and residual flow learning with the `--flownet_type` flag. **Note** that when `--flownet_type` is `residual`, `--init_ckpt_file` must point to a model pretrained on the same dataset in `train_rigid` mode. A `--max_steps` covering more than 200 epochs is also preferred for learning residual flow.

### Pretrained models
You can download our pretrained models, together with their predictions for all three tasks, from [Google Drive](https://drive.google.com/open?id=1VSGpdMrQ3dFKdher_2RteDfz7F0g57ZH). **Note** that they were trained on **different splits** of KITTI as described in the paper. Following the testing and evaluation instructions below, you should obtain results similar to those reported in the paper.

#### Notes about depth scale normalization
Keeping most of the original hyperparameters but setting `--scale_normalize` to True, we trained a stronger DepthNet on the Eigen split of KITTI.
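Conceptually, scale normalization divides each predicted depth map by its own spatial mean, preventing the network from shrinking depth toward a degenerate global scale. A minimal NumPy sketch of the idea behind `--scale_normalize` (the function name is illustrative, not the repo's actual code; see the paper linked above for the full formulation):

```python
import numpy as np

def scale_normalize(depth):
    # Divide a predicted depth map by its spatial mean so the map's
    # scale is pinned to 1, illustrating the --scale_normalize trick.
    # Illustrative sketch only; not GeoNet's exact implementation.
    return depth / np.mean(depth)

pred = np.array([[2.0, 4.0],
                 [6.0, 8.0]])   # toy "depth map", mean = 5.0
norm = scale_normalize(pred)    # normalized map has mean 1.0
```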
The pretrained model is also provided, namely **model_sn** in the **geonet_depthnet** subfolder. Note that this model is not included in our paper, but its performance is further improved:

| Abs Rel | Sq Rel | RMSE  | RMSE(log) | Acc.1 | Acc.2 | Acc.3 |
|---------|--------|-------|-----------|-------|-------|-------|
| 0.149   | 1.060  | 5.567 | 0.226     | 0.796 | 0.935 | 0.975 |

## Testing
We provide testing and evaluation scripts for all three tasks.

### Monocular Depth
Run the following command:
```bash
python geonet_main.py --mode=test_depth --dataset_dir=/path/to/kitti/raw/dataset/ --init_ckpt_file=/path/to/trained/model/ --batch_size=1 --depth_test_split=eigen --output_dir=/path/to/save/predictions/
```
Then evaluate the predictions by running
```bash
python kitti_eval/eval_depth.py --split=eigen --kitti_dir=/path/to/kitti/raw/dataset/ --pred_file=/path/to/predictions/
```

### Camera Pose
First, assuming you have downloaded the KITTI odometry dataset (including the ground-truth poses), run
```bash
python geonet_main.py --mode=test_pose --dataset_dir=/path/to/kitti/odom/dataset/ --init_ckpt_file=/path/to/trained/model/ --batch_size=1 --seq_length=5 --pose_test_seq=9 --output_dir=/path/to/save/predictions/
```
Now you have predicted pose snippets. You can **generate the ground-truth pose snippets** by running
```bash
python kitti_eval/generate_pose_snippets.py --dataset_dir=/path/to/kitti/odom/dataset/ --output_dir=/path/to/save/gtruth/pose/snippets/ --seq_id=09 --seq_length=5
```
Then evaluate your predictions with
```bash
python kitti_eval/eval_pose.py --gtruth_dir=/path/to/gtruth/pose/snippets/ --pred_dir=/path/to/predicted/pose/snippets/
```

### Optical Flow
First, download the [KITTI flow 2015 dataset](http://www.cvlibs.net/download.php?file=data_scene_flow.zip) and its [multi-view extension](http://www.cvlibs.net/download.php?file=data_scene_flow_multiview.zip).
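Before running the evaluation steps below, it may help to know what flow accuracy measures look like: a common summary is the average end-point error (EPE), the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal NumPy sketch (the function name is illustrative and not part of the repo's `kitti_eval/eval_flow.py`, which follows the official KITTI protocol):

```python
import numpy as np

def average_epe(pred_flow, gt_flow):
    # End-point error: per-pixel Euclidean distance between predicted
    # and ground-truth (u, v) flow vectors, averaged over the image.
    # Illustrative sketch only; not the repo's evaluation code.
    diff = pred_flow - gt_flow
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

gt = np.zeros((4, 4, 2))        # H x W x 2 ground-truth flow field
pred = np.zeros((4, 4, 2))
pred[..., 0] = 3.0              # every prediction off by (3, 4),
pred[..., 1] = 4.0              # i.e. an end-point error of 5 px
epe = average_epe(pred, gt)
```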
For replicating our flow results in the paper, a `seq_length` of 3 is recommended. Format the testing data by running
```bash
python kitti_eval/generate_multiview_extension.py --dataset_dir=/path/to/data_scene_flow_multiview/ --calib_dir=/path/to/data_scene_flow_calib/ --dump_root=/path/to/formatted/testdata/ --cam_id=02 --seq_length=3
```
Then test your trained model with
```bash
python geonet_main.py --mode=test_flow --dataset_dir=/path/to/formatted/testdata/ --init_ckpt_file=/path/to/trained/model/ --flownet_type=direct --batch_size=1 --output_dir=/path/to/save/predictions/
```
We again provide an evaluation script:
```bash
python kitti_eval/eval_flow.py --dataset_dir=/path/to/kitti_stereo_2015/ --pred_dir=/path/to/predictions/
```

## Acknowledgements
We thank [Tinghui Zhou](https://github.com/tinghuiz/SfMLearner) and [Clément Godard](https://github.com/mrharicot/monodepth) for their great work and repos.

## Reference
If you find our work useful in your research, please consider citing our paper:
```
@inproceedings{yin2018geonet,
  title     = {GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose},
  author    = {Yin, Zhichao and Shi, Jianping},
  booktitle = {CVPR},
  year      = {2018}
}
```