{"id":32521546,"url":"https://github.com/opengvlab/sid-vln","last_synced_at":"2025-10-28T06:29:44.220Z","repository":{"id":317308134,"uuid":"1051114923","full_name":"OpenGVLab/SID-VLN","owner":"OpenGVLab","description":"Official implementation of: Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale","archived":false,"fork":false,"pushed_at":"2025-09-30T05:37:20.000Z","size":6812,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-30T06:08:10.255Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenGVLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-05T13:17:13.000Z","updated_at":"2025-09-30T05:37:24.000Z","dependencies_parsed_at":"2025-09-30T06:18:28.331Z","dependency_job_id":null,"html_url":"https://github.com/OpenGVLab/SID-VLN","commit_stats":null,"previous_names":["opengvlab/sid-vln"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/OpenGVLab/SID-VLN","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FSID-VLN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FSID-VLN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FSID-VLN/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/OpenGVLab%2FSID-VLN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenGVLab","download_url":"https://codeload.github.com/OpenGVLab/SID-VLN/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FSID-VLN/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281397339,"owners_count":26493908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-28T02:00:06.022Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-28T06:29:39.078Z","updated_at":"2025-10-28T06:29:44.215Z","avatar_url":"https://github.com/OpenGVLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale\r\n\r\n[![arxiv](https://img.shields.io/badge/arXiv_2509.24910-red?logo=arxiv)](http://arxiv.org/abs/2509.24910)\r\n[![hf](https://img.shields.io/badge/Hugging_Face-FF9D00?logo=huggingface\u0026logoColor=white)](https://huggingface.co/papers/2509.24910)\r\n\r\n#### [Songze Li](https://scholar.google.com/citations?user=8rBMUD4AAAAJ), [Zun Wang](https://zunwang1.github.io/), [Gengze Zhou](https://gengzezhou.github.io/), [Jialu Li](https://jialuli-luka.github.io/), [Xiangyu 
Zeng](https://lanxingxuan.github.io/), [Limin Wang](https://wanglimin.github.io/), [Yu Qiao](https://scholar.google.com/citations?hl=en\u0026user=gFtI-8QAAAAJ), [Qi Wu](http://www.qi-wu.me/), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/), [Yi Wang](https://shepnerd.github.io/)\r\n\r\n![SID](SID.png)\r\n\r\n## 🏠 About\r\n\r\nGoal-oriented language-guided navigation requires robust exploration capabilities for agents to navigate to specified goals in unknown environments without step-by-step instructions. Existing methods tend to exclusively utilize shortest-path trajectories, lacking effective exploration priors for prioritizing the success rate. To address the above challenges, we present SID, a goal-oriented language-guided navigation learning approach with Self-Improving Demonstrations. Specifically, SID learns an initial agent on the shortest-path data sampled from environments and then leverages this agent to generate novel exploration trajectories. The novel rollouts provide demonstrations with stronger exploration signals to train a better agent, which in turn produces higher-quality agent demonstrations for the next round of training. We show that this iterative self-improving pipeline readily scales to new environments, and the resulting demonstrations can be transferred across a variety of language-guided navigation tasks, elevating the performance ceiling in diverse goal-oriented navigation. Extensive experiments demonstrate that SID significantly boosts the exploration capabilities and generalization of navigation agents. 
The resulting agent achieves new state-of-the-art performance on goal-oriented language-guided navigation tasks, including REVERIE, SOON, notably achieving a 50.9% success rate on the unseen validation splits of SOON, surpassing the prior leading approaches by a margin of 13.9%.\r\n\r\n## 📢 Update\r\n\r\n[2025-09-30] We release the [paper](http://arxiv.org/abs/2509.24910) for SID-VLN.\r\n\r\n[2025-09-22] We release the code and data for SID-VLN.\r\n\r\n## 🛠 Getting Started\r\n\r\nWe test under the following environment:\r\n\r\n* Python 3.8.10\r\n* PyTorch 2.0.0\r\n* CUDA Version 11.7\r\n\r\n1. **Install the Matterport3D simulator:** follow the detailed instructions [here](https://github.com/peteanderson80/Matterport3DSimulator). We use the latest version instead of v0.1. Here are simplified instructions:\r\n\r\n   ```bash\r\n   git clone git@github.com:peteanderson80/Matterport3DSimulator.git\r\n   cd Matterport3DSimulator\r\n   git submodule update --init --recursive\r\n   sudo apt-get install libjsoncpp-dev libepoxy-dev libglm-dev libosmesa6 libosmesa6-dev libglew-dev libopencv-dev\r\n   mkdir build \u0026\u0026 cd build\r\n   cmake -DEGL_RENDERING=ON ..\r\n   make -j8\r\n   ```\r\n\r\n   After successful installation, run:\r\n\r\n   ```bash\r\n   cp your_path/Matterport3DSimulator/build/MatterSim.cpython-38-x86_64-linux-gnu.so your_conda_path/envs/sidvln/lib/python3.8/MatterSim.cpython-38-x86_64-linux-gnu.so\r\n   export PYTHONPATH=your_path/SIDVLN/mapnav:$PYTHONPATH\r\n   export PYTHONPATH=your_path/Matterport3DSimulator/build:$PYTHONPATH\r\n   ```\r\n\r\n2. **Install requirements:**\r\n\r\n   ```bash\r\n   conda create --name sidvln python=3.8.10\r\n   conda activate sidvln\r\n   cd SID-VLN\r\n   pip install -r requirements.txt\r\n   ```\r\n\r\n## 🏆 Model and Data\r\n\r\nWe release our final pretrained model and available data [here](https://huggingface.co/datasets/SongzeLi/SID-VLN/tree/main). Details:\r\n\r\n**Connectivity:**\r\n\r\n1. 
Connectivity of the navigation graphs.\r\n\r\n**Data:**\r\n\r\n1. `scan_round0_860scan.jsonl` – Image goal navigation trajectories in 800 HM3D environments.\r\n2. `sid_lang_goal.jsonl` – Final detailed caption goal navigation trajectories for pretraining and REVERIE augmentation.\r\n3. `img_goal_val*.json` – Image goal navigation validation seen and unseen splits.\r\n4. `cap_goal_val*.json` – Caption goal navigation validation seen and unseen splits.\r\n5. `scanvp_candview_relangles_with_hm3d_gibson.json` – Candidates related to scan and vp in HM3D environments.\r\n\r\n\r\n**Features:**\r\n1. `siglip_base.hdf5` – SigLIP features on MP3D and HM3D environments.  \r\n2. `dinov2_base.hdf5` – DINOv2 features on MP3D and HM3D environments.  \r\n3. `obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5` – Object features for REVERIE.\r\n\r\n**HM3D_cap:**\r\n\r\n1. Generated detailed-style captions for target images in HM3D and MP3D environments.\r\n\r\n**Model:**\r\n\r\n1. `model_step_124000.pt` – The final pretrained model for downstream VLN finetuning.\r\n2. `img_goal_best_val_unseen` – The image goal navigation agent, which can be used to generate trajectories with high-quality demonstrations of exploration strategies.\r\n3. 
`model_LXRT.pth` – The pretrained LXMERT model for initializing DUET.\r\n\r\nThe data folder should follow this structure:\r\n\r\n```shell\r\ndatasets/\r\n├── ckpts/\r\n│   ├── model_LXRT.pth\r\n│   ├── img_goal_best_val_unseen\r\n│   └── model_step_124000.pt\r\n├── REVERIE/\r\n│   ├── annotations/\r\n│   │   ├── scan_round0_860scan.jsonl\r\n│   │   ├── sid_lang_goal.jsonl\r\n│   │   ├── img_goal_val*.json\r\n│   │   ├── cap_goal_val*.json\r\n│   │   └── scanvp_candview_relangles_with_hm3d_gibson.json\r\n│   ├── connectivity/\r\n│   │   ├── scanname_connectivity.json\r\n│   │   └── scans.txt\r\n│   └── features/\r\n│       ├── siglip_base.hdf5\r\n│       ├── dinov2_base.hdf5\r\n│       └── obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5\r\n└── SOON/\r\n```\r\n\r\n## 🚀 Training\r\n\r\n1. **Multi-Round SID Pre-training**\r\n\r\n   We use 8 NVIDIA A800 GPUs for pre-training agents on image goal navigation.\r\n\r\n   ```bash\r\n   cd pretrain\r\n   bash run_img_goal.sh\r\n   ```\r\n\r\n2. **SID Fine-tuning \u0026 Trajectory Generation**\r\n\r\n   We use 8 NVIDIA A800 GPUs for fine-tuning agents and generating trajectories for next-round training.\r\n\r\n   ```bash\r\n   cd mapnav\r\n   bash scripts/run_img_goal.sh\r\n   ```\r\n\r\n3. **Language Goal Pre-training**\r\n\r\n   We use 8 NVIDIA A800 GPUs for pre-training language goal navigation agents.\r\n\r\n   ```bash\r\n   bash run_lang_goal.sh\r\n   ```\r\n\r\n4. **Downstream VLN Task Fine-tuning**\r\n\r\n   We use one NVIDIA A800 GPU for fine-tuning our agent on downstream VLN tasks. 
The concrete configuration is given in the scripts.\r\n\r\n   ```bash\r\n   bash run_lang_goal.sh\r\n   ```\r\n\r\n## 🙋‍♂️ Questions or Issues\r\n\r\nPlease feel free to [open an issue](https://github.com/OpenGVLab/SID-VLN/issues) if you encounter any problems or have questions about SID-VLN.\r\n\r\n\r\n## 🔗 Citation\r\n\r\nIf you find our work useful in your research, please consider starring 🌟 this repo and citing the following paper:\r\n\r\n```bibtex\r\n@article{li2025learning,\r\n  title={Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale},\r\n  author={Li, Songze and Wang, Zun and Zhou, Gengze and Li, Jialu and Zeng, Xiangyu and Wang, Limin and Qiao, Yu and Wu, Qi and Bansal, Mohit and Wang, Yi},\r\n  journal={arXiv preprint arXiv:2509.24910},\r\n  year={2025}\r\n}\r\n```\r\n\r\n## 👏 Acknowledgements\r\n\r\n\r\nWe thank the developers of [DUET](https://github.com/cshizhe/VLN-DUET), [SRDF](https://github.com/wz0919/VLN-SRDF), and [InternVL](https://github.com/OpenGVLab/InternVL) for their public code release.\r\n\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopengvlab%2Fsid-vln","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopengvlab%2Fsid-vln","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopengvlab%2Fsid-vln/lists"}