# Awesome Computer Use
Curated list of papers and libraries related to computer GUI use via LLMs.\
Highly opinionated; the focus is on quality over quantity.

## Demos
* Try [computer use on your Mac](https://github.com/philfung/computer-use) in one click.

## Frameworks
* [Openwork](https://github.com/accomplish-ai/openwork) - MIT-licensed, open-source alternative to Anthropic's Cowork with multi-LLM support for browser automation.

## Papers
* [WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning](https://arxiv.org/html/2411.02337v1) (*Tsinghua U*) (11/24)
* [Anthropic Claude Computer Use API](https://docs.anthropic.com/en/docs/build-with-claude/computer-use) (*Anthropic*) (10/24) (a minimal API call sketch follows this list)
* [OmniParser for Pure Vision Based GUI Agent](https://microsoft.github.io/OmniParser/) ([code](https://github.com/microsoft/OmniParser)) (*Microsoft*) (08/24)
* [ECLAIR: Enterprise sCaLe AI for woRkflows](https://hazyresearch.stanford.edu/blog/2024-05-18-eclair) ([code](https://github.com/HazyResearch/eclair-agents)) (*Stanford U*) (05/24)
* [OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments](https://os-world.github.io/) ([code](https://github.com/xlang-ai/OSWorld)) (*HKU*) (05/24)
* [Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs](https://arxiv.org/abs/2404.05719) ([code](https://github.com/apple/ml-ferret/tree/main/ferretui)) (*Apple*) (04/24)
* [SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded](https://osu-nlp-group.github.io/SeeAct/) ([code](https://github.com/OSU-NLP-Group/SeeAct)) (*OSU*) (01/24)
* [CogAgent: A Visual Language Model for GUI Agents](https://github.com/THUDM/CogVLM2) (*Zhipu*) (12/23)
* [AppAgent: Multimodal Agents as Smartphone Users](https://appagent-official.github.io/) ([code](https://github.com/mnotgod96/AppAgent)) (*Tencent*) (12/23)
* [SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V](https://som-gpt4v.github.io/) ([code](https://github.com/microsoft/SoM)) (*Microsoft*) (10/23)
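
As a starting point, here is a minimal sketch of calling the Anthropic computer-use beta described in the docs linked above. The model name, screen dimensions, and prompt are illustrative assumptions, not prescriptions from this list.

```python
# Minimal sketch of Anthropic's computer-use beta (see the docs link in the Papers list).
# Model name, display size, and prompt are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",   # virtual screenshot / mouse / keyboard tool
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        }
    ],
    messages=[
        {"role": "user", "content": "Open the browser and search for 'awesome computer use'."}
    ],
    betas=["computer-use-2024-10-22"],
)

# The model responds with tool_use blocks (screenshot, mouse_move, left_click, type, ...).
for block in response.content:
    print(block.type, getattr(block, "input", None))
```

In a full agent loop, each `tool_use` block is executed on the host machine and its result is returned to the model as a `tool_result` message, repeating until the model stops requesting actions.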

## Talks
* [LLMs as Computer Users: An Overview](https://www.figma.com/deck/rsWK4sRl0dOahG59bfMhql)