https://github.com/dortanes/atlas
An AI-powered computer-use agent built with Electron. Automate desktop tasks by letting AI see and interact with your OS.
ai-powered computer-use computer-use-agent desktop-agent electron gemini openai
- Host: GitHub
- URL: https://github.com/dortanes/atlas
- Owner: dortanes
- License: apache-2.0
- Created: 2026-03-11T17:16:14.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-16T01:58:03.000Z (3 months ago)
- Last Synced: 2026-03-16T04:24:28.534Z (3 months ago)
- Topics: ai-powered, computer-use, computer-use-agent, desktop-agent, electron, gemini, openai
- Language: TypeScript
- Homepage:
- Size: 3.03 MB
- Stars: 2
- Watchers: 0
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
README
Atlas
AI agent that lives on your desktop.
It sees your screen, understands what you need, and gets things done — hands-free.

---
> **⚠️ Atlas is in active development (v0.2.3).**
>
> - 🤖 **LLM support:** Gemini (including native [Computer Use API](https://ai.google.dev/gemini-api/docs/computer-use)) and OpenAI. More providers on the way.
> - 🖥 **Screen control:** Gemini 3.x models use native [Computer Use API](https://ai.google.dev/gemini-api/docs/computer-use) for precise actions. Older models use vision-based coordinate prediction.
> - 💻 **Platform:** Windows only for now. macOS & Linux support is planned.
> - 🐛 **Found a bug?** We'd love to hear about it — [open an issue](https://github.com/dortanes/atlas/issues).
---
## What is Atlas?
Atlas is an **AI-powered desktop agent** that works alongside you as a transparent overlay. Press `Ctrl+Space`, tell it what to do — and it figures out the rest: navigating apps, clicking buttons, typing text, searching the web, finding files, running commands.
Think of it as a **copilot for your entire OS**.
- 🖥 **Sees your screen** — captures what's on your display and understands the context
- 🧠 **Thinks before it acts** — plans multi-step tasks and shows progress in real time
- 🖱 **Controls your computer** — mouse, keyboard, and terminal — all automated
- 🎯 **Shows what it's doing** — you can see the agent's cursor moving on screen
- 🔍 **Searches the web** — finds answers and brings them back, no tab-switching needed
- 📂 **Finds your files** — searches local files and folders by name, right from chat
- 🗣 **Speaks to you** — real-time voice responses with streaming TTS
- 🎙 **Listens to you** — local speech-to-text with wake word activation, no cloud required
- 🔊 **Sound feedback** — distinct sounds for every state: activation, processing, task complete, warnings
- 🛡 **Asks before doing anything risky** — built-in safety system with permission prompts
---
## ✨ Key Features
### 🔮 The Orb
A glowing AI indicator that shows you exactly what Atlas is doing — idle, thinking, acting, or waiting for your input. Always visible, never in the way.
### 🏝 Islands
Context-aware floating panels that appear when relevant:
- **Action Island** — shows the current task and progress
- **Response Island** — streams Atlas's thoughts and replies word by word
- **Permission Island** — asks for confirmation before risky operations
- **Microtask Island** — your task queue with real-time step progress (queue new tasks while the agent is busy)
- **Search Island** — web search results and local file search results
- **Listening Island** — live transcript display during voice input
- **Warning Island** — dismissible warnings for errors and quota issues
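The confirm-before-acting behavior behind the Permission Island could be modeled roughly as below. This is only a sketch: the action kinds, the `RISKY_KINDS` set, and the `mayProceed` function are hypothetical names chosen for illustration, and Atlas's real safety rules (which are driven by editable prompts) may look quite different.

```typescript
// Hypothetical action kinds; Atlas's real action set is not documented here.
type ActionKind = "click" | "type" | "scroll" | "run_command" | "delete_file";

interface AgentAction {
  kind: ActionKind;
  detail: string;
}

// Assumption: these kinds count as risky. The real safety rules may differ.
const RISKY_KINDS: ReadonlySet<ActionKind> = new Set<ActionKind>([
  "run_command",
  "delete_file",
]);

function isRisky(action: AgentAction): boolean {
  return RISKY_KINDS.has(action.kind);
}

// Returns true when the action may proceed. Safe actions skip the prompt;
// risky ones proceed only if the user confirms.
function mayProceed(
  action: AgentAction,
  confirm: (action: AgentAction) => boolean,
): boolean {
  return isRisky(action) ? confirm(action) : true;
}
```

In a real UI the `confirm` callback would be asynchronous and wait for the user's click; it is synchronous here only to keep the sketch short.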
### 🎯 Agent Cursor
When Atlas controls your desktop, you can see its cursor moving on screen — clicking, typing, and scrolling — so you always know what's happening.
### 🖥 Computer Use
With compatible Gemini 3.x models, Atlas uses the native **[Computer Use API](https://ai.google.dev/gemini-api/docs/computer-use)** for precise screen control — clicking, typing, scrolling, navigating, and searching — all without opening extra apps. Multi-monitor setups are supported.
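The split between native Computer Use and vision-based coordinate prediction can be pictured as a simple check on the model ID. The sketch below assumes that any ID beginning with `gemini-3` is compatible, which is a simplification; `pickControlMode` and the mode names are hypothetical and not taken from the Atlas codebase.

```typescript
type ControlMode = "native-computer-use" | "vision-coordinates";

// Assumption: model IDs such as "gemini-3-flash-preview" or
// "gemini-3.1-flash-lite-preview" support the native Computer Use API.
// Everything else falls back to vision-based coordinate prediction.
const GEMINI_3_PATTERN = /^gemini-3(\.\d+)?-/;

function pickControlMode(modelId: string): ControlMode {
  return GEMINI_3_PATTERN.test(modelId.trim().toLowerCase())
    ? "native-computer-use"
    : "vision-coordinates";
}
```

For example, `pickControlMode("gemini-3-flash-preview")` selects the native path, while an older Gemini model or an OpenAI model would take the vision-based fallback.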
### 🧩 Smart Task Planning
Before executing complex commands, Atlas breaks them into high-level steps (2–5) and displays them in the Task Queue. You see planned steps before execution begins and watch progress as each step completes.
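One way to picture the plan-then-execute flow is as a small data structure with 2 to 5 steps whose statuses change as work completes. The types and functions below (`TaskPlan`, `createPlan`, `completeNextStep`, `progressLabel`) are hypothetical and only illustrate the idea; the actual plan comes from the LLM and its shape is not documented here.

```typescript
type StepStatus = "pending" | "done";

interface PlanStep {
  title: string;
  status: StepStatus;
}

interface TaskPlan {
  command: string;
  steps: PlanStep[];
}

// Builds a plan and enforces the 2-5 step range described above.
function createPlan(command: string, stepTitles: string[]): TaskPlan {
  if (stepTitles.length < 2 || stepTitles.length > 5) {
    throw new RangeError(`expected 2-5 steps, got ${stepTitles.length}`);
  }
  return {
    command,
    steps: stepTitles.map((title): PlanStep => ({ title, status: "pending" })),
  };
}

// Returns a new plan with the first pending step marked as done.
function completeNextStep(plan: TaskPlan): TaskPlan {
  const next = plan.steps.findIndex((step) => step.status === "pending");
  if (next === -1) return plan;
  const steps = plan.steps.map(
    (step, i): PlanStep => (i === next ? { ...step, status: "done" } : step),
  );
  return { ...plan, steps };
}

// Formats progress as "done/total", e.g. for a task queue entry.
function progressLabel(plan: TaskPlan): string {
  const done = plan.steps.filter((step) => step.status === "done").length;
  return `${done}/${plan.steps.length}`;
}
```

Returning a new plan on each update, rather than mutating in place, keeps it easy for a UI to detect changes and re-render progress.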
### 🎭 Personas
Create multiple AI agents with unique personalities, knowledge, and voices. Each persona has its own memory and prompt settings — switch between them from the tray menu.
### 🧠 Memory
Atlas remembers your preferences and context across sessions. It learns facts about you from conversations and uses them to give better responses over time. Browse conversation history and view, edit, or delete learned facts in Settings.
### 🎙 Voice Input
Local offline speech-to-text via Vosk — just say the wake word (the active persona's name) and Atlas starts listening. No cloud API required.
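Wake word handling can be sketched as a plain text check on the recognizer's transcript. The function below is hypothetical and assumes a single-word persona name and a transcript without punctuation; it does not call Vosk itself and does not reflect how Atlas wires up the recognizer.

```typescript
// Returns the command spoken after the wake word, an empty string if only
// the wake word was heard, or null if the wake word is absent.
// Assumption: the persona name is a single word.
function extractCommand(transcript: string, personaName: string): string | null {
  const words = transcript.trim().toLowerCase().split(/\s+/);
  const wakeWord = personaName.trim().toLowerCase();
  const index = words.indexOf(wakeWord);
  if (index === -1) return null;
  return words.slice(index + 1).join(" ");
}
```

With a persona named "Atlas", the transcript "atlas open notepad" would yield the command "open notepad", while speech that never mentions the name is ignored.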
### ✍️ Editable Prompts
Full control over the AI's behavior — modify system, action, and safety prompts directly from the Settings UI. Reset to defaults anytime.
### ⚙️ Customizable Layout
Choose where Atlas appears on screen (left, right, or center) and configure your preferred activation hotkey — all from Settings.
### 🔧 Debug Logging
Enable per-request session logs to trace the full pipeline: intent classification → LLM calls → actions → response streaming — with precise timing for every stage.
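Per-stage timing of this kind can be sketched as a small wrapper that records how long each stage takes. `SessionLog`, its stage names, and the injectable clock are hypothetical; the real pipeline is presumably asynchronous, and its log format is not documented here.

```typescript
type Stage = "intent" | "llm" | "action" | "response";

interface StageTiming {
  stage: Stage;
  ms: number;
}

class SessionLog {
  private readonly timings: StageTiming[] = [];
  private readonly now: () => number;

  // The clock is injectable so the log can be tested deterministically.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Runs one stage and records its duration, even if the stage throws.
  run<T>(stage: Stage, work: () => T): T {
    const start = this.now();
    try {
      return work();
    } finally {
      this.timings.push({ stage, ms: this.now() - start });
    }
  }

  // One-line summary of all recorded stages, in execution order.
  summary(): string {
    return this.timings.map((t) => `${t.stage}: ${t.ms}ms`).join(" -> ");
  }
}
```

A request handler would create one `SessionLog` per request, wrap each stage in `run`, and write `summary()` to the log file when the response finishes.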
---
## 🚀 Getting Started
### Download & Install
1. Go to [**Releases**](https://github.com/dortanes/atlas/releases) and download the latest installer for Windows
2. Run the installer — Atlas will appear in your system tray
3. Get a **Gemini API key**: go to [Google AI Studio](https://aistudio.google.com/apikey) → sign in → **Create API Key** → copy it
4. Click the **Atlas tray icon** → **Settings** → **Intelligence** tab → paste your API key
5. Set the recommended models in the **Intelligence** tab:

   | Setting | Free tier | Paid tier |
   |---------|-----------|-----------|
   | **Text model** | `gemini-3.1-flash-lite-preview` | `gemini-3.1-flash-lite-preview` |
   | **Vision model** | `gemini-3.1-flash-lite-preview` | `gemini-3-flash-preview` |

   > Vision model handles screen control & Computer Use. Paid tier model is more accurate but requires a billing-enabled API key.
6. *(Optional)* For voice output:
   - **Alice** (free, no API key): **Voice** tab → select **Alice** → done!
   - **ElevenLabs** (premium voices): get an [ElevenLabs](https://elevenlabs.io/) API key → **Voice** tab → paste key + voice ID
7. Press `Ctrl+Space` and start giving Atlas tasks 🎉
### Build from Source
> For contributors and developers who want to run Atlas from source.
```bash
git clone https://github.com/dortanes/atlas.git
cd atlas
yarn install
yarn dev
```
> **Requires:** [Node.js](https://nodejs.org/) ≥ 20 · [Yarn](https://yarnpkg.com/) ≥ 1.22
---
## 🗺 Roadmap
| Status | Feature |
|:------:|---------|
| ✅ | Transparent glassmorphism overlay with Orb + Island UI |
| ✅ | LLM integration (Gemini + OpenAI) with multi-provider architecture |
| ✅ | Screen vision + desktop automation (robotjs) |
| ✅ | Native Gemini [Computer Use API](https://ai.google.dev/gemini-api/docs/computer-use) |
| ✅ | Smart task planning with step-by-step progress |
| ✅ | Agent cursor animations (click, type, scroll overlays) |
| ✅ | Streaming TTS (ElevenLabs + Alice) |
| ✅ | Persona system with isolated memory & custom voices |
| ✅ | Web search + local file search |
| ✅ | Settings UI with prompt editor + debug logging |
| ✅ | Intent classification (direct / action / chat) |
| ✅ | Context caching (Gemini prompt caching for token optimization) |
| ✅ | Voice input (wake word + local STT via Vosk) |
| 🔜 | Action whitelist/blacklist & audit log |
| 🔜 | Onboarding flow |
| 🔜 | Auto-update |
---
## ⭐ Support the Project
If you find Atlas useful, please consider giving the repository a **star** ⭐ — it helps others discover the project and motivates further development!
## 🤝 Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request.
## 📜 License
[Apache License 2.0](LICENSE) — use it, modify it, build on it.
---
Vibecoded with ❤️ by dortanes