# Awesome Computer Use
Curated list of papers and libraries related to computer GUI use via LLMs.\
Highly opinionated; the focus is on quality over quantity.

## Demos
* Try [computer use on your Mac](https://github.com/philfung/computer-use) in one click.

## Frameworks
* [Openwork](https://github.com/accomplish-ai/openwork) - MIT-licensed, open-source alternative to Anthropic's Cowork with multi-LLM support for browser automation.

## Papers
* [WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning](https://arxiv.org/html/2411.02337v1) (*Tsinghua U*) (11/24)
* [Anthropic Claude Computer Use API](https://docs.anthropic.com/en/docs/build-with-claude/computer-use) (*Anthropic*) (10/24) (a minimal API call sketch follows this list)
* [OmniParser for Pure Vision Based GUI Agent](https://microsoft.github.io/OmniParser/) ([code](https://github.com/microsoft/OmniParser)) (*Microsoft*) (08/24)
* [ECLAIR: Enterprise sCaLe AI for woRkflows](https://hazyresearch.stanford.edu/blog/2024-05-18-eclair) ([code](https://github.com/HazyResearch/eclair-agents)) (*Stanford U*) (05/24)
* [OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments](https://os-world.github.io/) ([code](https://github.com/xlang-ai/OSWorld)) (*HKU*) (05/24)
* [Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs](https://arxiv.org/abs/2404.05719) ([code](https://github.com/apple/ml-ferret/tree/main/ferretui)) (*Apple*) (04/24)
* [SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded](https://osu-nlp-group.github.io/SeeAct/) ([code](https://github.com/OSU-NLP-Group/SeeAct)) (*OSU*) (01/24)
* [CogAgent: A Visual Language Model for GUI Agents](https://github.com/THUDM/CogVLM2) (*Zhipu*) (12/23)
* [AppAgent: Multimodal Agents as Smartphone Users](https://appagent-official.github.io/) ([code](https://github.com/mnotgod96/AppAgent)) (*Tencent*) (12/23)
* [SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V](https://som-gpt4v.github.io/) ([code](https://github.com/microsoft/SoM)) (*Microsoft*) (10/23)
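
As a starting point, here is a minimal sketch of calling the Anthropic computer-use beta described in the docs linked above. The model name, screen dimensions, and prompt are illustrative assumptions, not prescriptions from this list.

```python
# Minimal sketch of Anthropic's computer-use beta (see the docs link in the Papers list).
# Model name, display size, and prompt are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",   # virtual screenshot / mouse / keyboard tool
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        }
    ],
    messages=[
        {"role": "user", "content": "Open the browser and search for 'awesome computer use'."}
    ],
    betas=["computer-use-2024-10-22"],
)

# The model responds with tool_use blocks (screenshot, mouse_move, left_click, type, ...).
for block in response.content:
    print(block.type, getattr(block, "input", None))
```

In a full agent loop, each `tool_use` block is executed on the host machine and its result is returned to the model as a `tool_result` message, repeating until the model stops requesting actions.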

## Talks
* [LLMs as Computer Users: An Overview](https://www.figma.com/deck/rsWK4sRl0dOahG59bfMhql)