https://github.com/aaronlab/copilot-computer-use
The world's first free Computer Use agent powered by GitHub Copilot API. Screenshot → Vision Analysis → Action Execution, all through your Copilot subscription.
https://github.com/aaronlab/copilot-computer-use
ai-agent computer-use copilot copilot-api desktop-automation free github-copilot open-source pyautogui vision
Last synced: 11 days ago
JSON representation
The world's first free Computer Use agent powered by GitHub Copilot API. Screenshot → Vision Analysis → Action Execution, all through your Copilot subscription.
- Host: GitHub
- URL: https://github.com/aaronlab/copilot-computer-use
- Owner: aaronlab
- License: mit
- Created: 2026-03-24T09:16:12.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-24T09:51:37.000Z (3 months ago)
- Last Synced: 2026-05-09T15:53:22.212Z (about 1 month ago)
- Topics: ai-agent, computer-use, copilot, copilot-api, desktop-automation, free, github-copilot, open-source, pyautogui, vision
- Language: Python
- Size: 51.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# copilot-computer-use
**English** | [中文](README_zh.md)
> **The world's first free Computer Use agent powered by GitHub Copilot API.**
Screenshot → Vision Analysis → Action Execution — all through your Copilot subscription ($10/mo or free for students).
## Why?
Every Computer Use agent today requires expensive Vision API calls ($$$). We discovered that **GitHub Copilot API supports base64 image input** via the `copilot-vision-request: true` header. This means you can build a fully functional Computer Use agent at **zero additional cost** beyond your existing Copilot subscription.
## How It Works
```
┌─────────────────────────────────────────────────┐
│ Agent Control Loop │
│ │
│ 1. 📸 Capture screenshot (mss + Pillow) │
│ 2. 🔄 Encode to base64 PNG │
│ 3. 👁️ Analyze via Copilot Vision API │
│ (copilot-vision-request: true header) │
│ 4. 🧠 Plan next action via Copilot Text API │
│ 5. ⚡ Execute action (pyautogui) │
│ 6. 🔁 Repeat until task complete │
└─────────────────────────────────────────────────┘
↕ All API calls go through ↕
┌─────────────────────────────────────────────────┐
│ Copilot API Adapter │
│ • GitHub OAuth Device Flow authentication │
│ • Automatic JWT token refresh │
│ • VS Code version spoofing │
│ • copilot-vision-request header management │
└─────────────────────────────────────────────────┘
```
## Key Innovation
| Aspect | Traditional Agent | copilot-computer-use |
|--------|-------------------|---------------------|
| Vision API Cost | $0.01-0.05/screenshot | **$0 (included in Copilot)** |
| Text Reasoning Cost | $0.003-0.06/request | **$0 (included in Copilot)** |
| Monthly Cost | $20-100+ | **$10/mo (Copilot) or free** |
| Models Available | 1 provider | GPT-4o, Claude, Gemini |
## Prerequisites
- Python 3.11+
- GitHub account with Copilot subscription (Individual, Business, or Enterprise)
- macOS or Linux (Windows support planned)
## Quick Start
```bash
# Clone
git clone https://github.com/Zey413/copilot-computer-use.git
cd copilot-computer-use
# Install
pip install -e .
# Authenticate with GitHub (one-time)
python -m src.copilot.auth
# Run
python -m src.main "Open Chrome and search for the weather"
```
## Architecture
```
src/
├── main.py # Entry point + CLI
├── copilot/
│ ├── auth.py # GitHub OAuth Device Flow + JWT refresh
│ ├── client.py # Copilot chat/completions API client
│ └── config.py # API endpoints, headers, version spoofing
├── agent/
│ ├── loop.py # Core agent control loop
│ ├── planner.py # Task planning via text reasoning
│ └── actions.py # Action definitions (click, type, scroll...)
├── screen/
│ ├── capture.py # Screenshot capture (mss + Pillow)
│ └── annotate.py # Optional: annotate screenshots with markers
└── executor/
├── base.py # Abstract executor interface
├── macos.py # macOS executor (pyautogui + AppleScript)
└── linux.py # Linux executor (pyautogui + xdotool)
```
## How Copilot Vision Works
The key discovery: Copilot's `chat/completions` API accepts base64-encoded images when you include the `copilot-vision-request: true` header:
```python
import httpx
response = httpx.post(
"https://api.githubcopilot.com/chat/completions",
headers={
"Authorization": f"Bearer {copilot_jwt}",
"copilot-vision-request": "true",
"editor-version": "vscode/1.104.3",
# ... other headers
},
json={
"model": "gpt-4o",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What do you see on this screen?"},
{"type": "image_url", "image_url": {
"url": f"data:image/png;base64,{screenshot_b64}"
}}
]
}]
}
)
```
## Research Background
This project is based on extensive research documented in Reviews #001-#013:
- **Raven** ([nocoo/raven](https://github.com/nocoo/raven)): Proved Copilot API can be proxied with full Anthropic↔OpenAI translation
- **Copilot Vision**: Confirmed base64 support via 6+ open-source projects (LiteLLM, OpenCode, LobeHub, etc.)
- **Market Gap**: Zero existing projects combine Copilot proxy + Computer Use
## Supported Models (via Copilot)
- GPT-4o / GPT-4.1 (Vision ✅)
- Claude Sonnet 4 / Opus 4 (Vision ✅)
- Gemini 2.5 Pro / 3 Flash (Vision ✅)
- o4-mini (Reasoning)
- And more as GitHub adds them
## Limitations
- **Experimental**: This is a research project, not production software
- **Rate Limits**: Copilot has rate limits; aggressive screenshot loops may be throttled
- **ToS Gray Area**: Using Copilot API beyond IDE integration may not be explicitly permitted
- **Image Formats**: PNG is most reliable; JPEG works; WebP may have issues
- **macOS/Linux only**: Windows executor not yet implemented
## Acknowledgments
- [Raven](https://github.com/nocoo/raven) — Inspiration for Copilot API authentication and format translation
- [self-operating-computer](https://github.com/OthersideAI/self-operating-computer) — Reference for lightweight Computer Use architecture
- [copilot-api](https://github.com/ericc-ch/copilot-api) — Early Copilot reverse engineering work
## License
MIT