{"id":19425596,"url":"https://github.com/rese1f/steve","last_synced_at":"2025-04-24T16:31:40.472Z","repository":{"id":177225669,"uuid":"658715969","full_name":"rese1f/STEVE","owner":"rese1f","description":"[ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment","archived":false,"fork":false,"pushed_at":"2023-12-27T03:48:56.000Z","size":118244,"stargazers_count":36,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-19T11:17:39.064Z","etag":null,"topics":["computer-vision","dataset","embodied-agent","large-language-models","llama","minecraft"],"latest_commit_sha":null,"homepage":"https://rese1f.github.io/STEVE/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rese1f.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-26T10:51:33.000Z","updated_at":"2025-01-31T19:03:24.000Z","dependencies_parsed_at":"2024-08-03T07:49:15.981Z","dependency_job_id":"20a12ed8-b065-4f45-abe4-dd40744be774","html_url":"https://github.com/rese1f/STEVE","commit_stats":{"total_commits":28,"total_committers":2,"mean_commits":14.0,"dds":0.0357142857142857,"last_synced_commit":"7bcad48e888b7cd1ad67bbba787f4541f8ef2f3f"},"previous_names":["rese1f/steve"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rese1f%2FSTEVE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rese1f%2FSTEVE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/rese1f%2FSTEVE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rese1f%2FSTEVE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rese1f","download_url":"https://codeload.github.com/rese1f/STEVE/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250663544,"owners_count":21467366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","dataset","embodied-agent","large-language-models","llama","minecraft"],"created_at":"2024-11-10T14:04:12.244Z","updated_at":"2025-04-24T16:31:39.957Z","avatar_url":"https://github.com/rese1f.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"asset/logo.png\" height=\"120px\" align=\"right\"\u003e\n\n# STEVE\n\n[![](http://img.shields.io/badge/cs.CV-arXiv%3A2311.15209-B31B1B.svg)](https://arxiv.org/abs/2311.15209)\n[![](https://img.shields.io/badge/code-code_v0-blue)](https://github.com/rese1f/STEVE/tree/code-v0)\n[![](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-orange)](https://huggingface.co/SeeThink/STEVE-13b)\n\n\u003e **See and Think: Embodied Agent in Virtual Environment**  \n\u003e Zhonghan Zhao*, Wenhao Chai*, Xuan Wang*, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang✉️   \n\u003e _arXiv 2023_  \n\n[![](https://img.youtube.com/vi/ZiH1VuR-9GY/0.jpg)](https://www.youtube.com/embed/ZiH1VuR-9GY?si=LTj5NhLg7--3Cya1)\n\nSTEVE, named after the protagonist of the game Minecraft, is our proposed 
framework for building an embodied agent based on a vision model and LLMs in an open world.\n\n## :fire: News\n* **[2023.12.13]** : We release our STEVE-7B model on [Hugging Face](https://huggingface.co/SeeThink/STEVE-7b).\n* **[2023.12.11]** : We release our STEVE-13B model on [Hugging Face](https://huggingface.co/SeeThink/STEVE-13b).\n* **[2023.12.06]** : We release our code on the [code-v0 branch](https://github.com/rese1f/STEVE/tree/code-v0).\n* **[2023.11.26]** :page_with_curl: We release the [paper](https://arxiv.org/abs/2311.15209).\n\n\u003ch3 align=\"center\"\u003e If you like our project, please give us a star ⭐ on GitHub for the latest updates.\u003c/h3\u003e\n\n## 💡 Overview\nThe Vision Perception part takes images or videos, encodes them into tokens, and combines them with the Agent State and Task tokens as input. In the Language Instruction part, STEVE-13B performs automatic reasoning and task decomposition, and it queries the Skill Database to output code as actions.\n![](asset/overview.png)\n\n## 📣 Demo Videos\n[![](https://img.youtube.com/vi/NzJEqhIbcZg/0.jpg)](https://www.youtube.com/embed/NzJEqhIbcZg?si=_flZME4YDfok4LVn)\n[![](https://img.youtube.com/vi/OWJDZGwephs/0.jpg)](https://www.youtube.com/embed/OWJDZGwephs?si=Vig4h99HPsNf95CP)\n[![](https://img.youtube.com/vi/sloqnCtx4kc/0.jpg)](https://www.youtube.com/embed/sloqnCtx4kc?si=eMj_bNEHlg0wg7Py)\n[![](https://img.youtube.com/vi/ziYueiXBP7A/0.jpg)](https://www.youtube.com/embed/ziYueiXBP7A?si=76TWzSlHsEeC7rv1)\n[![](https://img.youtube.com/vi/6riHoiocb8k/0.jpg)](https://www.youtube.com/embed/6riHoiocb8k?si=PJC6Plb8hQQohQgI)\n[![](https://img.youtube.com/vi/LualEoZ7EZQ/0.jpg)](https://www.youtube.com/embed/LualEoZ7EZQ?si=xWTxrJEnZeVRedEt)\n\n## ✏️ Citation\n\nIf you find STEVE useful for your research and applications, please cite it using the following BibTeX:\n\n```bibtex\n@article{zhao2023see,\n  title={See and Think: Embodied Agent in Virtual Environment},\n  
author={Zhao, Zhonghan and Chai, Wenhao and Wang, Xuan and Boyi, Li and Hao, Shengyu and Cao, Shidong and Ye, Tian and Hwang, Jenq-Neng and Wang, Gaoang},\n  journal={arXiv preprint arXiv:2311.15209},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frese1f%2Fsteve","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frese1f%2Fsteve","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frese1f%2Fsteve/lists"}