{"id":28272387,"url":"https://github.com/visionxlab/airspatialbot","last_synced_at":"2026-02-13T10:36:37.697Z","repository":{"id":293811466,"uuid":"985149951","full_name":"VisionXLab/AirSpatialBot","owner":"VisionXLab","description":"[TGRS'25] AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval","archived":false,"fork":false,"pushed_at":"2025-05-17T09:01:11.000Z","size":38814,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-27T10:01:30.910Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VisionXLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-17T06:55:43.000Z","updated_at":"2025-05-22T12:49:24.000Z","dependencies_parsed_at":"2025-05-17T10:29:02.578Z","dependency_job_id":null,"html_url":"https://github.com/VisionXLab/AirSpatialBot","commit_stats":null,"previous_names":["visionxlab/airspatialbot"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/VisionXLab/AirSpatialBot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VisionXLab%2FAirSpatialBot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VisionXLab%2FAirSpatialBot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VisionXLab%2FAirSpatialBot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VisionXLab%2
FAirSpatialBot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VisionXLab","download_url":"https://codeload.github.com/VisionXLab/AirSpatialBot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VisionXLab%2FAirSpatialBot/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260437774,"owners_count":23009217,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-20T20:21:52.620Z","updated_at":"2026-02-13T10:36:37.689Z","avatar_url":"https://github.com/VisionXLab.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eAirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval\u003c/h1\u003e\n\n\u003cdiv\u003e\n    \u003ca href='https://zytx121.github.io/' target='_blank'\u003eYue Zhou\u003c/a\u003e\u0026emsp;\n    Ran Ding\u0026emsp;\n    \u003ca href='https://yangxue.site/' target='_blank'\u003eXue Yang\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://ee.sjtu.edu.cn/FacultyDetail.aspx?id=53\u0026infoid=66' target='_blank'\u003eJiang Xue\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://www.researchgate.net/profile/Xingzhao-Liu/' target='_blank'\u003eXingzhao Liu\u003c/a\u003e\u0026emsp;\n\u003c/div\u003e\n\u003cdiv\u003e\n    Shanghai Jiao Tong University\u0026emsp; 
\n\u003c/div\u003e\n\n[![Paper](https://img.shields.io/badge/IEEE_TGRS-Paper-blue.svg)](https://arxiv.org/abs/2601.01416)\n[![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/erenzhou/AirSpatial)\n[![Model](https://img.shields.io/badge/HuggingFace-Model-blue)](https://huggingface.co/erenzhou/AirSpatialBot)\n\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"images/AirSpatialBot.gif\" width=100%\u003e\n\u003c/p\u003e\n\n---\n\n## 📢 Latest Updates\n\n- **[2026.01.01]** [Click](https://github.com/zytx121/Awesome-RS-VL-Data) for the latest trends in **Remote Sensing Vision-Language Datasets and Models**.\n- **[2025.08.24]** We released the AirSpatialBot weights, datasets and code.\n- **[2025.05.17]** We released the AirSpatialBot paper.\n\n---\n\n## Abstract\n\n*Despite notable advancements in remote sensing vision-language models (VLMs), existing models often struggle with spatial understanding, limiting their effectiveness in real-world applications. To push the boundaries of VLMs in remote sensing, we specifically address vehicle imagery captured by drones and introduce a spatially-aware dataset AirSpatial, which comprises over 206K instructions and introduces two novel tasks: Spatial Grounding and Spatial Question Answering. It is also the first remote sensing grounding dataset to provide 3DBB. To effectively leverage existing image understanding of VLMs to spatial domains, we adopt a two-stage training strategy comprising Image Understanding Pre-training and Spatial Understanding Fine-tuning. Utilizing this trained spatially-aware VLM, we develop an aerial agent, AirSpatialBot, which is capable of fine-grained vehicle attribute recognition and retrieval. By dynamically integrating task planning, image understanding, spatial understanding, and task execution capabilities, AirSpatialBot adapts to diverse query requirements. 
Experimental results validate the effectiveness of our approach, revealing the spatial limitations of existing VLMs while providing valuable insights.*\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"images/main.png\" width=100%\u003e\n  \u003cdiv style=\"display: inline-block; color: #999; padding: 2px;\"\u003e\n      AirSpatialBot’s Visual Understanding and Spatial Understanding Capabilities.\n  \u003c/div\u003e\n\u003c/div\u003e\n\n---\n\n## 🏆 Contributions\n\n- **New Tasks and Dataset.** We introduce AirSpatial, a spatially-aware dataset featuring two novel tasks: Spatial Grounding (SG) and Spatial Question Answering (SQA). It is the first RS grounding dataset to provide 3D bounding boxes (3DBB), which underscores the critical role of spatial understanding in RS VLMs.\n\n- **Spatially-aware VLM.** We propose a two-stage training strategy, pre-training on 2D RSVG datasets and fine-tuning with AirSpatial to enhance spatial understanding. To facilitate 2D-to-3D knowledge transfer, we introduce ASL, while GML ensures 3D spatial consistency.\n\n- **Aerial Agent.** We develop AirSpatialBot, an aerial agent that utilizes our spatially-aware VLM for fine-grained vehicle attribute recognition and retrieval, making it the first approach capable of identifying vehicle brands, models, and pricing information from aerial imagery.\n\n---\n\n## 💬 AirSpatial Dataset\n\nWord cloud visualizations of vehicle occurrence frequencies in our dataset. (a) shows the brand word cloud, where BYD ranks first. (b) illustrates the model word cloud, with Tesla Model 3 ranking first.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/word_cloud.png\" width=100%\u003e\n\u003c/p\u003e\n\n\n---\n\n## 🔍 Spatially-aware VLM\n\nVisualizations of AirSpatialBot on AirSpatial-G with 3DBB. 
The green boxes indicate ground truth, while the red boxes represent predictions.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/3d_box.png\" width=100%\u003e\n\u003c/p\u003e\n\n---\n\n## 🚀 Aerial Agent\n\nWorkflows for Vehicle Attribute Recognition, Zero-Shot Attribute Recognition and Target Retrieval Tasks.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"images/tasks.png\" width=100%\u003e\n\u003c/div\u003e\n\n\n\n## 📜 Citation\n```bibtex\n@ARTICLE{zhou2025airspatialbot,\n  author={Zhou, Yue and Ding, Ran and Yang, Xue and Jiang, Xue and Liu, Xingzhao},\n  journal={IEEE Transactions on Geoscience and Remote Sensing},\n  title={AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval},\n  year={2025},\n  volume={},\n  number={},\n  pages={1-1},\n  doi={10.1109/TGRS.2025.3570895}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvisionxlab%2Fairspatialbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvisionxlab%2Fairspatialbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvisionxlab%2Fairspatialbot/lists"}