{"id":20516395,"url":"https://github.com/nianticlabs/footprints","last_synced_at":"2025-06-29T01:38:45.118Z","repository":{"id":53491806,"uuid":"254399653","full_name":"nianticlabs/footprints","owner":"nianticlabs","description":"[CVPR 2020] Estimation of the visible and hidden traversable space from a single color image","archived":false,"fork":false,"pushed_at":"2023-01-27T01:12:06.000Z","size":29697,"stargazers_count":219,"open_issues_count":3,"forks_count":21,"subscribers_count":31,"default_branch":"master","last_synced_at":"2025-04-13T06:43:54.760Z","etag":null,"topics":["computer-vision","deep-learning","depth-estimation","monodepth","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nianticlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-09T14:50:06.000Z","updated_at":"2025-01-30T05:33:38.000Z","dependencies_parsed_at":"2023-02-15T03:46:53.439Z","dependency_job_id":null,"html_url":"https://github.com/nianticlabs/footprints","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/nianticlabs/footprints","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Ffootprints","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Ffootprints/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Ffootprints/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Ffootprints/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nianticlabs","download_url":"https://codeload.github.com/nianticlabs/footprints/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Ffootprints/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262520899,"owners_count":23323784,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","depth-estimation","monodepth","pytorch"],"created_at":"2024-11-15T21:28:37.435Z","updated_at":"2025-06-29T01:38:45.050Z","avatar_url":"https://github.com/nianticlabs.png","language":"Python","readme":"# [Footprints and Free Space from a Single Color Image](https://arxiv.org/abs/2004.06376)\n\n**[Jamie Watson](https://scholar.google.com/citations?view_op=list_works\u0026hl=en\u0026user=5pC7fw8AAAAJ), [Michael Firman](http://www.michaelfirman.co.uk), [Aron Monszpart](http://aron.monszp.art) and [Gabriel J. Brostow](http://www0.cs.ucl.ac.uk/staff/g.brostow/) – CVPR 2020 (Oral presentation)**\n\n[[Link to Paper](https://arxiv.org/abs/2004.06376)]\n\n\n**We introduce *Footprints*, a method for estimating the visible and hidden traversable space from a single RGB image**\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://storage.googleapis.com/niantic-lon-static/research/footprints/Main_with_yeti_h264.mp4\"\u003e\n  \u003cimg src=\"readme_ims/video_title.jpg\" alt=\"5 minute CVPR presentation video link\" width=\"400\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nUnderstanding the shape of a scene from a single color image is a formidable computer vision task.\nMost methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Models which predict beyond the line of sight often parameterize the scene with voxels or meshes, which can be expensive to use in machine learning frameworks.\n\nOur method predicts the hidden ground geometry and extent from a single image:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"readme_ims/figure_1.png\" alt=\"Web version of figure 1\" width=\"700\" /\u003e\n\u003c/p\u003e\n\nOur predictions enable virtual characters to more realistically explore their environment.\n\n\u003ctable width=\"700\" align=\"center\"\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"readme_ims/penguin_baseline.gif\" alt=\"Baseline exploration\" width=\"300\" /\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"readme_ims/penguin_ours.gif\" alt=\"Our exploration\" width=\"300\" /\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cb\u003eBaseline:\u003c/b\u003e The virtual character can only explore the ground visible to the camera\u003c/td\u003e\n    \u003ctd\u003e\u003cb\u003eOurs:\u003c/b\u003e The penguin can explore both the visible and hidden ground\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n## ⚙️ Setup\n\nOur code and models were developed with PyTorch 1.3.1.\nThe `environment.yml` and `requirements.txt` list our dependencies.\n\nWe recommend installing and activating a new conda environment from these files with:\n```shell\nconda env create -f environment.yml -n footprints\nconda activate footprints\n```\n\n\n## 🖼️ Prediction\n\nWe provide three pretrained models:\n\n- `kitti`, a model trained on the KITTI driving dataset with a resolution of 192x640,\n- `matterport`, a model trained on the indoor Matterport dataset with a resolution of 512x640, and\n- `handheld`, a model trained on our own handheld stereo footage with a resolution of 256x448.\n\nWe provide code to make predictions for a single image, or a whole folder of images, using any of these pretrained models.\nModels will be [automatically downloaded when required](footprints/utils.py#L105), and input images will be automatically resized to the [correct input resolution](footprints/predict_simple.py#21) for each model.\n\nSingle image prediction:\n```shell\npython -m footprints.predict_simple --image test_data/cyclist.jpg --model kitti\n```\n\nMulti image prediction:\n```shell\npython -m footprints.predict_simple --image test_data --model handheld\n```\n\nBy default, `.npy` predictions and `.jpg` visualisations will be saved to the `predictions` folder; this can be changed with the `--save_dir` flag.\n\n## 🚋 Training\n\nTo train a model you will need to download raw [KITTI](http://www.cvlibs.net/datasets/kitti/index.php) \nand [Matterport](https://niessner.github.io/Matterport/) data. Edit the `dataset` field in `paths.yaml` to point to\nthe downloaded raw data paths.\n\nFor details on downloading KITTI, see [Monodepth2](https://github.com/nianticlabs/monodepth2).\n\nYou will also need per-image training data generated from the video sequences:\n- visible ground segmentations\n- hidden ground depths\n- depth maps\n- etc.\n\nOur versions of these can be found [HERE](https://console.cloud.google.com/storage/browser/niantic-lon-static/research/footprints/data). \nDownload these and edit the `training_data` field of `paths.yaml` to point to them.\n\n- KITTI\n    - [depth_masks.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/depth_masks.zip)\n    - [ground_seg.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/ground_seg.zip)\n    - [hidden_depths.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/hidden_depths.zip)\n    - [moving_objects.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/moving_objects.zip)\n    - [optical_flow.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/optical_flow.zip)\n    - [poses.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/poses.zip)\n    - [splits.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/splits.zip)\n    - [stereo_matching_disps.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/training_data/stereo_matching_disps.zip)\n- Matterport\n    - [depth_masks.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/matterport/depth_masks.zip)\n    - [ground_seg.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/matterport/ground_seg.zip)\n    - [hidden_depth.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/matterport/hidden_depth.zip)\n    - [matterport_ground_truth.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/matterport/matterport_ground_truth.zip)\n    - [splits.zip](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/matterport/splits.zip)\n\nAfter this your `paths.yaml` should look like:\n\n```\n# Contents of paths.yaml\n  kitti:\n    dataset: \u003cyour_raw_KITTI_path\u003e\n    training_data: \u003cdownloaded_KITTI_training_data\u003e\n\n  matterport:\n    dataset: \u003cyour_raw_matterport_path\u003e\n    training_data: \u003cdownloaded_matterport_training_data\u003e\n\n  ...\n```\n\nNow you have everything you need to train!\n\nTrain a KITTI model using:\n```shell\nCUDA_VISIBLE_DEVICES=X python -m footprints.main \\\n    --training_dataset kitti \\\n    --log_path \u003cyour_log_path\u003e \\\n    --model_name \u003cyour_model_name\u003e\n```\n\nand a Matterport model using:\n```shell\nCUDA_VISIBLE_DEVICES=X python -m footprints.main \\\n    --training_dataset matterport \\\n    --height 512  --width 640 \\\n    --log_path \u003cyour_log_path\u003e \\\n    --batch_size 8 \\\n    --model_name \u003cyour_model_name\u003e\n```\n\n## Training data generation\n\nIf you want to generate your own training data instead of using ours\n(e.g. you want to try a better ground segmentation algorithm, or more accurate camera poses) then\nyou can!\n\nThere are several key elements of our training data - each can be swapped out for your own.\n\n### Visible depths\nFor KITTI we used [PSMNet](https://github.com/JiaRenChang/PSMNet) to generate disparity maps for stereo pairs.\nThese are inside `stereo_matching_disps`, and are used to generate training labels. These are\nconverted to depth maps using the known focal length and baseline.\nMatterport provides these.\n\n### Camera poses\nFor KITTI we used [ORBSLAMv2](https://github.com/raulmur/ORB_SLAM2) to generate camera poses, which are stored as `npys` inside\nthe `poses` folder. These are used to reproject between cameras.\nMatterport provides these.\n\n### Ground segmentations\nFor both Matterport and KITTI we trained a segmentation network to classify ground pixels in an image.\nWe provide training code for this inside `footprints/preprocessing/segmentation`. These are stored inside\nthe `ground_seg` folder as `npys` and are unthresholded (i.e. raw sigmoid output).\n\n### Optical flow\nFor KITTI, we identify moving objects by comparing `induced flow` to `optical flow`. Our provided optical\nflow estimates come from [LiteFlowNet](https://github.com/twhui/LiteFlowNet), and are inside the `optical_flow` folder.\n\n### Hidden ground depths\nTo compute hidden depths (i.e. the depth to each visible and occluded ground pixel) we use camera poses,\ndepth maps and ground segmentations. These can be generated using (expects a GPU to be available):\n```script\nCUDA_VISIBLE_DEVICES=X  python -m \\\n    footprints.preprocessing.ground_truth_generation.ground_truth_generator \\\n    --type hidden_depths  --data_type kitti --textfile splits/kitti/train.txt\n```\nMake sure to run this on both `train.txt` and `val.txt`. Warning - this will take a while, so to speed things\nup you can do this in parallel by running multiple processes and adding the flags `--start_idx X` and\n`--end_idx Y` to split the textfile into smaller chunks. \n\nNote that if you have already downloaded our training data, running this command will overwrite it unless you \nset `--save_folder_name \u003cmy_save_folder\u003e`. To actually train using this, you can manually set the path inside\n`footprints/datasets/\u003ckitti or matterport dataset.py\u003e`, \nor rename your new data to the required folder name, e.g. `hidden_depths`.\n\n### Moving object masks\nTo compute moving objects masks we use optical flow, depth, ground segmentations and camera poses. These can be\ngenerated by amending the above command with `--type moving_objects`. This is only valid\nfor KITTI.\n\n### Depth masks\nDepth masks are estimates of the *untraversable* pixels in the image, and are computed using\ndepth maps and ground segmentations. To generate these change the above command to use\n`--type depth_masks`.\n\n## ⏳ Evaluation\n\nTo generate predictions for evaluation using a trained model, run:\n```shell\nCUDA_VISIBLE_DEVICES=X python -m footprints.main \\\n    --mode inference \\\n    --load_path \u003cyour_model_path, e.g. logs/model/models/weights_9\u003e \\\n    --inference_data_type \u003ckitti or matterport\u003e \\\n    --height \u003c192 for kitti, 512 for matterport\u003e \\\n    --width 640\n```\nBy default this will save to `\u003cload_path\u003e/\u003cdata_type\u003e_predictions`, but can be specified with\n`--inference_save_path`.\n\nTo evaluate a folder of predictions, run:\n```shell\npython -m footprints.evaluation.evaluate_model \\\n    --datatype kitti \\\n    --metric iou \\\n    --predictions \u003cpath/to/predictions/folder\u003e\n```\n\nThe following options are provided:\n- `--datatype` can be either `kitti` or `matterport`.\n- `--metric` can be `iou` (both `kitti` and `matterport`) or `depth` (for `matterport`)\n\nIf necessary, the ground truth files will be automatically downloaded and placed in the `ground_truth_files` folder.\n\nYou can also download the KITTI annotations directly from [here](https://storage.googleapis.com/niantic-lon-static/research/footprints/data/kitti/kitti_ground_truth.zip).\nFor each image, there are 3 `.png` files:\n\n- `XXXXX_ground.png` contains the mask of the boundary of visible and hidden ground, ignoring all objects\n- `XXXXX_objects.png` contains the mask of the ground space taken up by objects (the *footprints*)\n- `XXXXX_combined.png` contains the full evaluation mask - the visible and hidden ground, taking into account object footprints\n\n## Method and further results\n\nWe learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data, which is used to supervise an image-to-image network.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"readme_ims/figure_3.gif\" alt=\"Video version of figure 3\" width=\"900\" /\u003e\n\u003c/p\u003e\n\nResults on mobile phone footage:\n\n\u003ctable width=\"700\" align=\"center\"\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"readme_ims/ours_1.gif\" alt=\"Rig results\" width=\"300\" /\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"readme_ims/ours_2.gif\" alt=\"Rig results\" width=\"300\" /\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nMore results on the KITTI dataset:\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"readme_ims/kitti_results.gif\" alt=\"KITTI results\" width=\"600\" /\u003e\n\u003c/p\u003e\n\n\n## ✏️ 📄 Citation\n\nIf you find our work useful or interesting, please consider citing [our paper](https://arxiv.org/abs/2004.06376):\n\n```\n@inproceedings{watson-2020-footprints,\n title   = {Footprints and Free Space from a Single Color Image},\n author  = {Jamie Watson and\n            Michael Firman and\n            Aron Monszpart and\n            Gabriel J. Brostow},\n booktitle = {Computer Vision and Pattern Recognition ({CVPR})},\n year = {2020}\n}\n```\n\n\n# 👩‍⚖️ License\nCopyright © Niantic, Inc. 2020. Patent Pending. All rights reserved. Please see the license file for terms.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Ffootprints","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnianticlabs%2Ffootprints","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Ffootprints/lists"}