# Awesome Computer Use
Curated list of papers + libraries related to computer GUI use via LLMs.\
Highly opinionated; the focus is on quality over quantity.
## Demos
* Try [computer use on your Mac](https://github.com/philfung/computer-use) in one click.
## Frameworks
* [Openwork](https://github.com/accomplish-ai/openwork) - An MIT-licensed, open-source alternative to Anthropic's Cowork, with multi-LLM support for browser automation.
## Papers
* [WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning](https://arxiv.org/html/2411.02337v1) (*Tsinghua U*) (11/24)
* [Anthropic Claude Computer Use API](https://docs.anthropic.com/en/docs/build-with-claude/computer-use) (*Anthropic*) (10/24) - see the request sketch after this list
* [OmniParser for Pure Vision Based GUI Agent](https://microsoft.github.io/OmniParser/) ([code](https://github.com/microsoft/OmniParser)) (*Microsoft*) (08/24)
* [ECLAIR: Enterprise sCaLe AI for woRkflows](https://hazyresearch.stanford.edu/blog/2024-05-18-eclair) ([code](https://github.com/HazyResearch/eclair-agents)) (*Stanford U*) (05/24)
* [OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments](https://os-world.github.io/) ([code](https://github.com/xlang-ai/OSWorld)) (*HKU*) (05/24)
* [Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs](https://arxiv.org/abs/2404.05719) ([code](https://github.com/apple/ml-ferret/tree/main/ferretui)) (*Apple*) (04/24)
* [SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded](https://osu-nlp-group.github.io/SeeAct/) ([code](https://github.com/OSU-NLP-Group/SeeAct)) (*OSU*) (01/24)
* [CogAgent: A Visual Language Model for GUI Agents](https://github.com/THUDM/CogVLM2) (*Zhipu*) (12/23)
* [AppAgent: Multimodal Agents as Smartphone Users](https://appagent-official.github.io/) ([code](https://github.com/mnotgod96/AppAgent)) (*Tencent*) (12/23)
* [SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V](https://som-gpt4v.github.io/) ([code](https://github.com/microsoft/SoM)) (*Microsoft*) (10/23) - see the mark-overlay sketch below
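
For the Anthropic Computer Use API entry above, here is a minimal request sketch following Anthropic's public beta docs from October 2024; the model name, `betas` flag, and `computer_20241022` tool version string are the ones documented at the time and may have since been superseded.

```python
# A minimal sketch of a Claude computer-use request, per the 10/24 beta docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",  # built-in screen/mouse/keyboard tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{
        "role": "user",
        "content": "Open a browser and search for 'awesome computer use'.",
    }],
)

# Claude answers with tool_use blocks (screenshot, mouse_move, left_click, ...)
# that your own agent loop must execute and feed back as tool_result blocks.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```

Note that the API only plans actions: the surrounding agent loop is responsible for taking screenshots and executing clicks, then reporting results back to the model.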
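
The SoM entry describes a prompting technique rather than an API: numbered marks are drawn over image regions so the vision model can answer with mark IDs instead of raw pixel coordinates. Below is a minimal sketch of just the overlay step; `overlay_marks` is a hypothetical helper, and it assumes region bounding boxes are already available (the paper derives regions from segmentation models, so plain boxes are a simplification).

```python
# A minimal Set-of-Mark overlay sketch: number each region so a vision LLM
# can refer to it by mark ID ("click mark 3") rather than pixel coordinates.
from PIL import Image, ImageDraw, ImageFont

def overlay_marks(image_path: str, boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=2)
        draw.rectangle((x0, y0, x0 + 18, y0 + 14), fill="red")  # label badge
        draw.text((x0 + 4, y0 + 1), str(i), fill="white", font=font)
    return img

# The marked screenshot is then sent to the vision model together with a
# prompt that refers to regions by their mark numbers.
```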
## Talks
* [LLMs as Computer Users: An Overview](https://www.figma.com/deck/rsWK4sRl0dOahG59bfMhql)