{"id":29222243,"url":"https://github.com/mit-spark/vggt-slam","last_synced_at":"2026-02-20T06:01:25.790Z","repository":{"id":301963216,"uuid":"994935754","full_name":"MIT-SPARK/VGGT-SLAM","owner":"MIT-SPARK","description":"VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold","archived":false,"fork":false,"pushed_at":"2025-11-19T16:30:46.000Z","size":26204,"stargazers_count":700,"open_issues_count":7,"forks_count":66,"subscribers_count":21,"default_branch":"main","last_synced_at":"2026-01-27T17:08:01.273Z","etag":null,"topics":["computer-vision","slam","vggt","vggt-slam"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MIT-SPARK.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-02T17:59:13.000Z","updated_at":"2026-01-21T10:59:10.000Z","dependencies_parsed_at":"2025-09-10T22:39:45.017Z","dependency_job_id":null,"html_url":"https://github.com/MIT-SPARK/VGGT-SLAM","commit_stats":null,"previous_names":["mit-spark/vggt-slam"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/MIT-SPARK/VGGT-SLAM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIT-SPARK%2FVGGT-SLAM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIT-SPARK%2FVGGT-SLAM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIT-SPARK%2FVGGT-SLAM/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/MIT-SPARK%2FVGGT-SLAM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MIT-SPARK","download_url":"https://codeload.github.com/MIT-SPARK/VGGT-SLAM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIT-SPARK%2FVGGT-SLAM/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29642905,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T05:21:04.652Z","status":"ssl_error","status_checked_at":"2026-02-20T05:21:04.238Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","slam","vggt","vggt-slam"],"created_at":"2025-07-03T03:06:49.847Z","updated_at":"2026-02-20T06:01:25.784Z","avatar_url":"https://github.com/MIT-SPARK.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ch1\u003eVGGT-SLAM 2.0\u003c/h1\u003e\n\n  \u003cp\u003e\n    \u003cstrong\u003eVGGT-SLAM\u003c/strong\u003e \n    \u003ca href=\"https://arxiv.org/abs/2505.12549\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/arXiv-b33737?logo=arXiv\" alt=\"arXiv\" style=\"vertical-align:middle\"\u003e\n    \u003c/a\u003e\n    \u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n    \u003cstrong\u003eVGGT-SLAM 2.0\u003c/strong\u003e \n    \u003ca 
href=\"https://arxiv.org/abs/2601.19887\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/arXiv-b33737?logo=arXiv\" alt=\"arXiv\" style=\"vertical-align:middle\"\u003e\n    \u003c/a\u003e\n  \u003c/p\u003e\n\n  \u003cbr /\u003e\n\n  \u003cimg src=\"assets/vggt_slam_demo.gif\" alt=\"VGGT-SLAM\" width=\"95%\"/\u003e\n\n  \u003cp\u003e\u003cstrong\u003e\u003cem\u003eVGGT-SLAM 2.0: Real-time Dense Feed-forward Scene Reconstruction\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\n  \u003cp\u003e\n    \u003ca href=\"https://dominic101.github.io/DominicMaggio/\"\u003e\u003cstrong\u003eDominic Maggio\u003c/strong\u003e\u003c/a\u003e \u0026nbsp;·\u0026nbsp;\n    \u003ca href=\"https://lucacarlone.mit.edu/\"\u003e\u003cstrong\u003eLuca Carlone\u003c/strong\u003e\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n---\n\n# This repo contains the code for VGGT-SLAM 2.0 (located here) and VGGT-SLAM (located on the version1.0 branch of this repo).\n\n## 📚 Table of Contents\n* [💻 Installation](#installation-of-vGGT-sLAM)\n* [🚀 Quick Start](#quick-start)\n* [📊 Running Evaluations](#running-evaluations)\n* [📄 News and Updates](#News-and-Updates)\n* [📄 Paper Citation](#citation)\n\n---\n\n## Installation of VGGT-SLAM\n\nClone VGGT-SLAM:\n\n```\ngit clone https://github.com/MIT-SPARK/VGGT-SLAM\n```\n\n```\ncd VGGT-SLAM\n```\n\n### Create and activate a new conda environment\n\n```\nconda create -n vggt-slam python=3.11\n```\n\n```\nconda activate vggt-slam\n```\n\n### Make the setup script executable and run it\nThis step will automatically download all 3rd party packages including Perception Encoder, SAM 3, and our fork of VGGT. More details on the license for Perception Encoder \ncan be found [here](https://github.com/facebookresearch/perception_models/blob/main/LICENSE.PE), for SAM3 can be found [here](https://github.com/facebookresearch/sam3/blob/main/LICENSE), and for VGGT can be found [here](https://github.com/facebookresearch/vggt/blob/main/LICENSE.txt). 
Note that we only use SAM 3 and Perception Encoder for optional open-set 3D object detection.\n\n```\nchmod +x setup.sh\n./setup.sh\n```\n\n---\n\n## Quick Start\n\nRun `python main.py --image_folder /path/to/image/folder --max_loops 1 --vis_map`, replacing the image path with your folder of images. \nThis will create a visualization in viser that shows the incremental construction of the map.\n\nAs an example, we provide a folder of test images in `office_loop.zip`, which will generate the map shown below. Using the default parameters will\nresult in a single loop closure towards the end of the trajectory. Unzip the folder and pass its path as the argument to `--image_folder`, e.g.,\n\n```\nunzip office_loop.zip\n```\n\nand then run the command below:\n\n```\npython3 main.py --image_folder office_loop --max_loops 1 --vis_map\n```\n\nUse the `--run_os` flag to enable 3D open-set object detection. This will prompt the user for text queries and plot a 3D bounding box of each detection on the map in viser. 
The office loop scene does not have many interesting objects, but some example queries that can be used are \"coffee machine\", \"sink\", \"printer\", \"cone\", and \"refrigerator\". For example scenes with more interesting objects, check out the Clio apartment and cubicle scenes, which can be downloaded \nfrom [here](https://www.dropbox.com/scl/fo/5bkv8rsa2xvwmvom6bmza/AOc8VW71kuZCgQjcw_REbWA?rlkey=wx1njghufcxconm1znidc1hgw\u0026e=1\u0026st=c809h8h3\u0026dl=0).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/office_loop_figure.png\" width=\"300\"\u003e\n\u003c/p\u003e\n\n\n### Collecting Custom Data\n\nTo quickly test on custom data, you can record a trajectory with a cell phone and convert the MOV file to a folder of images. First, create the output folder:\n\n```\nmkdir \u003cdesired_location\u003e/img_folder\n```\n\nThen run the command below:\n\n```\nffmpeg -i /path/to/video.MOV -vf \"fps=10\" \u003cdesired_location\u003e/img_folder/frame_%04d.jpg\n```\nNote that while vertical cell phone videos can work, horizontal videos are recommended to avoid images being cropped. \n\n### Adjusting Parameters\n\nSee `main.py` or run `main.py` with `--help` to view all parameters. \n\n---\n\n## Running Evaluations\n\nTo automatically run evaluation on the TUM and 7-Scenes datasets, first download the datasets using the download instructions provided by [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM?tab=readme-ov-file#examples). 
Set *abs_dir* in the bash scripts \n*/evals/eval_tum.sh* and */evals/eval_7scenes.sh* to the download location of MASt3R-SLAM.\n\n#### TUM Dataset\n\nTo run on TUM, run `./evals/eval_tum.sh \u003cw\u003e` and then run `python evals/process_logs_tum.py --submap_size \u003cw\u003e` to analyze and print the results, where w is \nthe submap size, for example:\n\n```\n./evals/eval_tum.sh 32\n```\n\n```\npython evals/process_logs_tum.py --submap_size 32\n```\n\nTo visualize the maps as they are being constructed, add `--vis_map` inside the bash scripts. This will update the viser map each time the submap is updated. \n\n## News and Updates\n* May 2025: VGGT-SLAM 1.0 is released\n* August 2025: SL(4) optimization is integrated into the official GTSAM repo\n* September 2025: VGGT-SLAM 1.0 accepted to NeurIPS 2025\n* November 2025: VGGT-SLAM 1.0 featured in MIT News [article](https://news.mit.edu/2025/teaching-robots-to-map-large-environments-1105)\n* January 2026: VGGT-SLAM 2.0 is released\n\n## Todo\n\n- [ ] Release real-time code. This code enables plugging in a RealSense camera and incrementally constructing a map \nas the camera explores a scene. 
This has been tested on a Jetson Thor onboard a robot.\n- [ ] Add optional code to sparsify the visualized map as visualizing large point cloud maps can slow down the code.\n\n## Acknowledgement\n\nThis work was supported in part by the NSF Graduate Research Fellowship\nProgram under Grant 2141064, the ARL DCIST program, and the ONR\nRAPID program.\n\n## Citation\n\nIf our code is helpful, please cite our papers as follows:\n\n```\n@article{maggio2025vggt-slam,\n  title={VGGT-SLAM: Dense RGB SLAM Optimized on the SL (4) Manifold},\n  author={Maggio, Dominic and Lim, Hyungtae and Carlone, Luca},\n  journal={Advances in Neural Information Processing Systems},\n  volume={39},\n  year={2025}\n}\n```\n\n```\n@article{maggio2025vggt-slam2,\n  title={VGGT-SLAM 2.0: Real-time Dense Feed-forward Scene Reconstruction},\n  author={Maggio, Dominic and Carlone, Luca},\n  journal={arXiv preprint arXiv:2601.19887},\n  year={2026}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-spark%2Fvggt-slam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmit-spark%2Fvggt-slam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-spark%2Fvggt-slam/lists"}