{"id":21280492,"url":"https://github.com/uehwan/simvodis","last_synced_at":"2025-07-11T10:31:35.727Z","repository":{"id":117488698,"uuid":"222082030","full_name":"Uehwan/SimVODIS","owner":"Uehwan","description":"Simultaneous Visual Odometry, Object Detection, and Instance Segmentation","archived":false,"fork":false,"pushed_at":"2021-03-21T07:06:36.000Z","size":9351,"stargazers_count":151,"open_issues_count":5,"forks_count":35,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-06T02:33:39.603Z","etag":null,"topics":["data-driven-visual-odometry","monocular-depth-estimation","monocular-visual-odometry","robot-vision","robotics","semantic-slam","semantic-vo","visual-odometry","visual-slam"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Uehwan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-16T10:23:05.000Z","updated_at":"2025-01-22T03:01:38.000Z","dependencies_parsed_at":"2023-03-13T12:49:05.493Z","dependency_job_id":null,"html_url":"https://github.com/Uehwan/SimVODIS","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Uehwan/SimVODIS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Uehwan%2FSimVODIS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Uehwan%2FSimVODIS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Uehwan%2FSimVODIS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/Uehwan%2FSimVODIS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Uehwan","download_url":"https://codeload.github.com/Uehwan/SimVODIS/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Uehwan%2FSimVODIS/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264785581,"owners_count":23663802,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-driven-visual-odometry","monocular-depth-estimation","monocular-visual-odometry","robot-vision","robotics","semantic-slam","semantic-vo","visual-odometry","visual-slam"],"created_at":"2024-11-21T10:30:40.285Z","updated_at":"2025-07-11T10:31:35.719Z","avatar_url":"https://github.com/Uehwan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SimVODIS\n[Simultaneous Visual Odometry, Object Detection, and Instance Segmentation.](https://arxiv.org/abs/1911.05939)\nSimVODIS extracts both semantic and physical attributes from a sequence of image frames. SimVODIS estimates the relative pose between frames while detecting objects and segmenting their boundaries. 
During the process, depth can optionally be estimated.\n\n![Comparison](./figures/comparison_people.png)\nYou can download the predicted results for the Eigen split from [here](https://drive.google.com/file/d/1K0DBtcL38TtnB29RdcOr3eOG4QAeHfS5/view?usp=sharing).\n\n## Getting Started\n\nThese instructions will get you a copy of the project up and running on your local machine for development and testing purposes.\n\n### Requirements\n\n* Ubuntu 16.04+\n* CUDA \u003e= 9.0\n* Python 3.6+\n* [PyTorch 1.0.0 from a nightly release](https://pytorch.org/get-started/previous-versions/)\n* [MaskRCNN (included in this project)](https://github.com/facebookresearch/maskrcnn-benchmark)\n* GCC \u003e= 4.9\n\n### Installation\n\nWe tested the code in the following environments: 1) CUDA 9.0 on Ubuntu 16.04 and 2) CUDA 10.1 on Ubuntu 18.04. SimVODIS may work in other environments, but you might need to modify parts of the code. We recommend using Anaconda for the environment setup.\n\n```bash\nconda create --name SimVODIS python=3.6.7\nconda activate SimVODIS\nconda install ipython\npip install ninja yacs cython matplotlib tqdm opencv-python\n# conda install -c pytorch pytorch-nightly=1.0 torchvision=0.2.2 cudatoolkit=10.0\nconda install -c pytorch pytorch-nightly=1.0 torchvision cudatoolkit=9.0\n\n# install SimVODIS\ngit clone https://github.com/Uehwan/SimVODIS.git\ncd SimVODIS\n# the following will install the lib with symbolic links,\n# so that you can modify the files if you want and won't need to re-build it\npython setup.py build develop\n\npip install tensorboardX\nconda install -c anaconda path.py scipy=1.2\n```\n\n### Pretrained Mask-RCNN model\n\nDownload the following pretrained Mask-RCNN model and place it in the repository root directory.\n- [R-50-FPN](https://download.pytorch.org/models/maskrcnn/e2e_mask_rcnn_R_50_FPN_1x.pth)\n\nFor more detailed information on the Mask-RCNN models, refer to the [Facebook Mask-RCNN benchmark 
repo](https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md)\n\n\n## Data Preparation\n\nFor [KITTI](http://www.cvlibs.net/datasets/kitti/raw_data.php), first download the dataset using this [script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip) provided on the official website of KITTI. Placing the dataset on an SSD speeds up training.\n\nYou can also download the Malaga, ScanNet, NYU Depth, RGB-D SLAM, Make3D, and 7 Scenes datasets.\n- Malaga: Download from [the official website](https://www.mrpt.org/MalagaUrbanDataset)\n- ScanNet: Request access from [the official repository](https://github.com/ScanNet/ScanNet)\n- NYU Depth: Download from [the official website](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html)\n- RGB-D SLAM: Download from [the official website](https://vision.in.tum.de/data/datasets/rgbd-dataset/download)\n- Make3D: Download from [the official website](http://make3d.cs.cornell.edu/)\n- 7 Scenes: Download from [the official website](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/)\n\n## Training\nThe following trains the SimVODIS_k model described in the paper.\n```bash\npython train.py \\\n    --data_path PATH/TO/DATASET \\\n    --split eigen_zhou \\\n    --model_name simvodis_k \\\n    --log_dir PATH/TO/LOG/DIR\n```\n\nTo train on other datasets, use the following.\n```bash\npython train.py \\\n    --data_path PATH/TO/DATASET \\\n    --split custom \\\n    --model_name simvodis_a \\\n    --dataset mixed \\\n    --log_dir PATH/TO/LOG/DIR\n```\n\nAfter starting the training script, you can monitor the training progress with the following.\n```bash\ntensorboard --logdir=PATH/TO/LOG/DIR\n```\n\n## Joint Training on MS COCO and KITTI\nFirst, install pycocotools and apex as follows.\n```bash\n# install pycocotools\npip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'\n\n# install apex\ncd $INSTALL_DIR\ngit clone 
https://github.com/NVIDIA/apex.git\ncd apex\npython setup.py install --cuda_ext --cpp_ext\n```\n\nNext, download MS COCO 2014 from [the official website](http://cocodataset.org/#download), download the extra annotations from [here](http://datasets.d2.mpi-inf.mpg.de/hosang17cvpr/coco_minival2014.tar.gz), and make symbolic links as follows.\n\n```bash\ncd /path_to_SimVODIS_directory\nmkdir -p datasets/coco\nln -s /path_to_coco_dataset/annotations datasets/coco/annotations\nln -s /path_to_coco_dataset/train2014 datasets/coco/train2014\nln -s /path_to_coco_dataset/test2014 datasets/coco/test2014\nln -s /path_to_coco_dataset/val2014 datasets/coco/val2014\n```\n\nFinally, run the following command to train SimVODIS on both MS COCO and KITTI.\n```bash\nCUDA_VISIBLE_DEVICES=1 python -W ignore train_joint.py --config-file \"configs/e2e_mask_rcnn_R_50_FPN_1x.yaml\" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS \"(480000, 640000)\" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000\n```\n\n## KITTI Evaluation\nFirst, export the ground-truth depth maps. 
We follow the approach described in the [Monodepth2](https://github.com/nianticlabs/monodepth2) repository.\n```bash\npython export_gt_depth.py --data_path PATH/TO/KITTI/DATASET --split eigen\npython export_gt_depth.py --data_path PATH/TO/KITTI/DATASET --split eigen_benchmark\n```\n\nThe following evaluates the depth map prediction performance of trained models on the KITTI benchmark.\n```bash\npython evaluate_depth.py \\\n    --data_path PATH/TO/DATASET \\\n    --load_weights_folder PATH/TO/MODEL/WEIGHTS \\\n    --post_process --save_pred_disp --eval_mono\n```\n\nThe following evaluates the pose estimation performance of trained models on the KITTI benchmark.\n```bash\npython evaluate_pose.py \\\n    --eval_split odom_9 \\\n    --data_path PATH/TO/KITTI/ODOM/DATASET \\\n    --load_weights_folder PATH/TO/MODEL/WEIGHTS\n```\n\n## Performance\n\n### Pretrained Networks\nThe following are the pretrained model weights.\n- [Encoder (same as mask-rcnn)](https://drive.google.com/file/d/1vWJQkYL8y3UQLG-gl-IcVTC9aMlMd5b7/view?usp=sharing)\n- [Depth Decoder](https://drive.google.com/file/d/1Al6vkNDPpDd7i90Ly2uGPPddlzhpuMKI/view?usp=sharing)\n- [Pose Decoder](https://drive.google.com/file/d/1BybvE2seYwDy9VsFlxodSXNl3q8pYI4w/view?usp=sharing)\n\n### Qualitative Results\n![Comparison](./figures/comparison_qualitative.png)\n\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.\n\n## Citations\n\nPlease consider citing this project in your publications if you find it helpful.\nThe following is the BibTeX entry.\n\n```\n@article{kim2019simvodis,\n  title={SimVODIS: Simultaneous Visual Odometry, Object Detection, and Instance Segmentation},\n  author={Kim, Ue-Hwan and Kim, Se-Ho and Kim, Jong-Hwan},\n  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence, Under Review},\n  year={2019}\n}\n```\n\n## Acknowledgments\n\nWe base our project on the following repositories:\n* 
[Monodepth2](https://github.com/nianticlabs/monodepth2)\n* [MaskRCNN](https://github.com/facebookresearch/maskrcnn-benchmark)\n\nThis work was supported by Institute for Information \u0026 communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00677, Development of Robot Hand Manipulation Intelligence to Learn Methods and Procedures for Handling Various Objects with Tactile Robot Hands)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuehwan%2Fsimvodis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuehwan%2Fsimvodis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuehwan%2Fsimvodis/lists"}