{"id":25636124,"url":"https://github.com/OS-Copilot/OS-Genesis","last_synced_at":"2025-02-23T00:02:07.864Z","repository":{"id":270669097,"uuid":"905633431","full_name":"OS-Copilot/OS-Genesis","owner":"OS-Copilot","description":"Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis","archived":false,"fork":false,"pushed_at":"2025-01-24T11:16:06.000Z","size":4901,"stargazers_count":84,"open_issues_count":2,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-01-24T12:23:14.253Z","etag":null,"topics":["agents","data-synthesis","gui","multimodal"],"latest_commit_sha":null,"homepage":"https://qiushisun.github.io/OS-Genesis-Home/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OS-Copilot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-19T08:14:03.000Z","updated_at":"2025-01-24T11:16:10.000Z","dependencies_parsed_at":"2025-01-02T09:48:33.977Z","dependency_job_id":null,"html_url":"https://github.com/OS-Copilot/OS-Genesis","commit_stats":null,"previous_names":["os-copilot/os-genesis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OS-Copilot%2FOS-Genesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OS-Copilot%2FOS-Genesis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OS-Copilot%2FOS-Genesis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OS-Copi
lot%2FOS-Genesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OS-Copilot","download_url":"https://codeload.github.com/OS-Copilot/OS-Genesis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240250363,"owners_count":19771780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","data-synthesis","gui","multimodal"],"created_at":"2025-02-23T00:02:04.107Z","updated_at":"2025-02-23T00:02:07.760Z","avatar_url":"https://github.com/OS-Copilot.png","language":"Jupyter Notebook","funding_links":[],"categories":["Papers"],"sub_categories":["Dataset"],"readme":"# OS-Genesis\n\n\n\u003cimg src=\"./static/OS-Genesis-Badge.png\" alt=\"overview\" style=\"zoom:80%;\" /\u003e\n\n\n[![arXiv](https://img.shields.io/badge/arXiv-2412.19723-b31b1b.svg)](https://arxiv.org/abs/2412.19723) \n![License](https://img.shields.io/badge/License-MIT-blue)\n[![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-sm.svg)](https://huggingface.co/papers/2412.19723)\n[![Generic badge](https://img.shields.io/badge/WeChat-机器之心-green.svg?logo=wechat)](https://mp.weixin.qq.com/s/_gu3NSCpAbAE1A8mEhGD7Q)\n\u003ca href = \"https://zhuanlan.zhihu.com/p/18229337790\"\u003e\u003cimg src=\"https://img.shields.io/badge/-%E7%9F%A5%E4%B9%8E-%232f6be0\" target=\"_blank\"\u003e\u003c/a\u003e\n\u003c!-- [![Twitter Follow](https://img.shields.io/twitter/follow/qiushi_sun)](https://twitter.com/qiushi_sun)\n[![Twitter 
Follow](https://img.shields.io/twitter/follow/zichen_ding)](https://twitter.com/heroding77)\n[![Twitter Follow](https://img.shields.io/twitter/follow/chuanyang_jin)](https://twitter.com/chuanyang_jin) --\u003e\n\n\nThis repository contains the code and data for the paper [OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis](https://arxiv.org/abs/2412.19723).\n\u003e We are uploading the data and checkpoints. Due to bandwidth limitations, this will take some time. Stay tuned!\n\n\n## Overview\n\nWe introduce OS-Genesis, an interaction-driven pipeline for synthesizing high-quality and diverse GUI agent trajectory data without human supervision or predefined tasks. By leveraging reverse task synthesis and a trajectory reward model, OS-Genesis enables effective end-to-end training of GUI agents.\n\n\u003c!-- ![overview](./static/OS-Genesis.png) --\u003e\n\n\u003cimg src=\"./static/OS-Genesis.png\" alt=\"overview\" style=\"zoom:20%;\" /\u003e\n\n\n## Training\n\nFor training setup and procedures, please refer to the [InternVL2 documentation](https://internvl.readthedocs.io/en/latest/get_started/installation.html) and [Qwen2-VL](https://github.com/QwenLM/Qwen2-VL).\n\n## Evaluation\n### AndroidControl\nTo evaluate on the AndroidControl benchmark, please follow the steps below:\n\n1. **Clone the GitHub Repository:**\n\n   ```\n   git clone https://github.com/OS-Copilot/OS-Genesis.git\n   ```\n\n2. **Inference:**\n   ```\n   cd OS-Genesis/evaluation/android_control\n   bash run_ac_inference.sh $dataset $checkpoint\n   ```\n\n3. 
**Evaluation:**\n   ```\n   python ac_eval.py\n   ```\n\n## Mobile\n### AndroidControl\n\n|   Model Name    |                           Base Model                                            |                           Training Data                                            |                           HF Link                           |\n| :-------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :---------------------------------------------------------: |\n| OS-Genesis-4B-AC | [InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B)            | [OS-Genesis-ac-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data/blob/main/os_genesis_ac_training_data.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-4B-AC)  |\n| OS-Genesis-7B-AC | [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | [OS-Genesis-ac-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data/blob/main/os_genesis_ac_training_data.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-7B-AC)  |\n| OS-Genesis-8B-AC | [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)            | [OS-Genesis-ac-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data/blob/main/os_genesis_ac_training_data.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-8B-AC)  |\n\n### AndroidWorld\n\n|   Model Name    |                           Base Model                                            |                           Training Data                                            |                           HF Link                           |\n| :-------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | 
:---------------------------------------------------------: |\n| OS-Genesis-4B-AW | [InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B)            | [OS-Genesis-aw-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data/blob/main/os_genesis_aw_training_data.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-4B-AW)  |\n| OS-Genesis-7B-AW | [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | [OS-Genesis-aw-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data/blob/main/os_genesis_aw_training_data.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-7B-AW)  |\n| OS-Genesis-8B-AW | [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)            | [OS-Genesis-aw-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-mobile-data/blob/main/os_genesis_aw_training_data.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-8B-AW)  |\n\n\n## Web\n\n|   Model Name    |                           Base Model                                            |                           Training Data                                            |                           HF Link                           |\n| :-------------: | :-------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :---------------------------------------------------------: |\n| OS-Genesis-4B-WA | [InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B)            | [OS-Genesis-web-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-web-data/blob/main/os_genesis_web_training.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-4B-WA)  |\n| OS-Genesis-7B-WA | [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | 
[OS-Genesis-web-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-web-data/blob/main/os_genesis_web_training.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-7B-WA)  |\n| OS-Genesis-8B-WA | [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)            | [OS-Genesis-web-training-data](https://huggingface.co/datasets/OS-Copilot/OS-Genesis-web-data/blob/main/os_genesis_web_training.jsonl) | [🤗 link](https://huggingface.co/OS-Copilot/OS-Genesis-8B-WA)  |\n\n## FAQ ❓\n\nWe have collected some questions from emails, Hugging Face, and WeChat communications. Please check the [FAQ](https://github.com/OS-Copilot/OS-Genesis/blob/main/faq.md) 🤖\n\n## Citation 📖\n\n🫶 If you are interested in our work or find this repository / our data helpful, please consider using the following citation format when referencing our paper:\n\n```bibtex\n@article{sun2024genesis,\n  title={OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis},\n  author={Sun, Qiushi and Cheng, Kanzhi and Ding, Zichen and Jin, Chuanyang and Wang, Yian and Xu, Fangzhi and Wu, Zhenyu and Jia, Chengyou and Chen, Liheng and Liu, Zhoumianze and others},\n  journal={arXiv preprint arXiv:2412.19723},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOS-Copilot%2FOS-Genesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOS-Copilot%2FOS-Genesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOS-Copilot%2FOS-Genesis/lists"}