An open API service indexing awesome lists of open source software.

https://github.com/k-l-lambda/anthroid

An agentic android app
https://github.com/k-l-lambda/anthroid

Last synced: 3 months ago
JSON representation

An agentic android app

Awesome Lists containing this project

README

          

# Anthroid


Anthroid logo

Anthroid is a native Android app for **mobile agentic workflows**: a Claude Code-style agent that can work with your phone’s native capabilities (camera, voice, clipboard, notifications, app/URL launching, location/calendar, screenshots, and optional UI automation).

## Demos

### Cross-App Automation
Launch other apps and automate multi-step tasks. Here the agent opens a shopping app, searches for products, and navigates the interface.


Cross-app automation demo

### Remote Server Monitoring
Use voice commands to check server status. The agent connects via SSH, runs diagnostics, and reports back.


Server monitoring demo

### Document Analysis & Local File Management
Take a photo or pick from gallery, then ask the agent to analyze. Here it reads insurance documents and creates a comparison report. The agent can also organize, rename, and move files on your device β€” tasks that are typically cumbersome on mobile due to limited input methods.


Document analysis demo

## When it helps

- **On-call / incident triage away from your desk**: run quick diagnostics in a terminal, keep a chat thread with context, and resume the conversation later.
- **Field work & device setup**: scan QR codes to avoid retyping tokens/URLs, open links on-device, and copy/paste between apps.
- **Hands-busy workflows**: dictate messages with **offline** speech recognition so you can keep moving even with poor network.
- **"What am I looking at?" troubleshooting**: attach photos/screenshots for visual context, then follow a short step-by-step checklist.
- **Phone-use automation**: copy key info to clipboard, open deep links, launch apps, set reminders via notifications, and combine device context (location/calendar) with instructions.

## Overview

Anthroid (Android + Anthropic) is a mobile implementation of [Claude Code](https://docs.anthropic.com/claude-code), designed around mobile-native input methods and device capabilities.

### Architecture Comparison

Unlike typical mobile AI apps that merely serve as a frontend to remote agent systems, Anthroid runs the **full agent logic locally** on your device:

Typical Mobile AI Apps
Anthroid

```mermaid
flowchart TB
subgraph phone["πŸ“± Mobile Device"]
ui["UI Only"]
end
subgraph server["☁️ Remote Server"]
agent["Agent Logic"]
tools["Tool Executor"]
end
subgraph cloud["☁️ LLM Provider"]
api["LLM API"]
end
ui <-->|"Every action"| agent
agent --> tools
agent <--> api
```

❌ Tools run on remote server

❌ Limited device access

❌ Network latency for every action

```mermaid
flowchart TB
subgraph phone["πŸ“± Mobile Device"]
subgraph runtime["Agent Runtime"]
tools["Tool Executor
Bash / Files / Camera / Voice"]
end
end
subgraph cloud["☁️ Cloud"]
api["LLM API
(Claude / OpenAI compatible)"]
end
runtime <-->|"Inference requests"| api
```

βœ… Tools execute locally

βœ… Full device access

βœ… Only LLM inference calls remote API

βœ… Supports any OpenAI-compatible endpoint

This means Anthroid can directly access your camera, files, clipboard, and other device capabilities without round-tripping through a remote server.

**Design goals**
- **Device tools**: access to location, calendar, clipboard, notifications, URL/app launching, and more.
- **Mobile input**: voice dictation (offline `sherpa-onnx`) and camera capture/QR scanning.
- **Agent-style automation**: optional overlay shows what the agent is doing when it operates outside the app.
- **Terminal at hand**: a full bash environment for advanced workflows.

## Features

### Chat Interface
- Native Android chat UI with streaming responses
- **Markdown rendering** - Tables, bold, italic, code blocks, links
- **Conversation history** - Resume past conversations
- Light blue user bubbles, gray assistant bubbles

### Voice Input
- **Offline speech recognition** using sherpa-onnx
- Supports Chinese, English, Japanese, Korean, Cantonese
- Press-and-hold microphone button to speak
- Real-time transcription display

### Camera & Vision
- Take photos to add visual context to messages
- Gallery picker for existing images
- Multiple images per message
- **QR code scanning** with instant text insertion and clipboard copy

### AI Agent Tools
Claude can execute tools on your device:

| Tool | Description |
|------|-------------|
| Bash | Run terminal commands |
| Read/Write/Edit | File operations |
| Glob/Grep | Search files and content |
| Notification | Show Android notifications |
| Clipboard | Read/write clipboard |
| Open URL | Launch browser |
| Launch App | Open installed apps |
| Location | Get GPS coordinates |
| Calendar | Query calendar events |
| Screenshot | Capture device screen |
| Screen Tap/Swipe | UI automation |

### Screen Automation Overlay
When Claude launches other apps or performs actions outside Anthroid:
- **Floating banner** appears at the top of screen showing agent status
- **Streaming text** displays what Claude is currently doing
- **Stop button** to cancel the operation at any time
- **Auto-hides** after task completion (tap to return to Anthroid)
- Requires overlay permission (Draw over other apps)

### OpenClaw Agent Runtime (v1.0)
Local [OpenClaw](https://github.com/k-l-lambda/openclaw) agent with full tool-use, model selection, and context management:
- **pi-embedded-runner** bundled in APK β€” no external agent server needed
- **60+ Android tools** via MCP bridge (accessibility, camera, clipboard, notifications, etc.)
- **File-based memory** β€” `MEMORY.md` + `memory/*.md` injected into system prompt, persists across sessions
- **Any LLM provider** β€” Anthropic, OpenAI-compatible APIs, or proxy endpoints (e.g. PPIO)

### Gateway Integration (Optional)
Connect to an OpenClaw gateway server for distributed agent features:
- **Memory sync** β€” automatic pull/push of `memory/` files on session start/end (git-based incremental)
- **Profile sync** β€” `/sync-profile` command pulls agent identity, skills, and config files
- **Remote agent view** β€” view and interact with OpenClaw sessions and SSH+tmux sessions on remote servers
- **Pending message delivery** β€” 60s polling for timed messages from gateway agents
- **System notifications** β€” IM-like notifications for gateway messages when app is backgrounded
- **Gateway Settings UI** β€” configure host/port/token in Settings, or use `/set-gateway` command / QR code

### Quick Send Candidates
- Frequently used short messages appear as chips above the input bar
- **Tap** to send immediately, **long-press** to insert into input for editing
- Automatically tracked by frequency (threshold: 5 uses)

### Terminal Environment
Built-in Linux terminal for advanced users:
- Full bash shell environment
- Package manager (apt/pkg)
- Node.js, Python, and more available

## Installation

### Download APK
Get the latest release from [GitHub Releases](https://github.com/k-l-lambda/anthroid/releases).

### Build from Source

```bash
# Clone repository
git clone https://github.com/k-l-lambda/anthroid.git
cd anthroid

# Build debug APK
./gradlew assembleDebug

# Install on device
adb install app/build/outputs/apk/debug/anthroid-app_apt-android-7-debug_arm64-v8a.apk
```

### Requirements
- Android 7.0+ (API 24)
- ARM64 device recommended
- ~200MB storage (+ optional 239MB for voice model)

## Setup

### API Configuration
1. Get your Claude API key from [Anthropic Console](https://console.anthropic.com/)
2. In Anthroid, go to **Settings** > **API Configuration**
3. Enter your API key and base URL

Or use QR code for quick setup:
1. Generate QR code with API credentials
2. Open Camera > QR scan mode
3. Scan the QR code

### Voice Input Setup
1. Go to **Settings** > **Components**
2. Download ASR Model (239MB, one-time)
3. Wait for model initialization
4. Microphone button appears in chat

### Gateway Setup (Optional)
Connect to an OpenClaw gateway for memory sync and remote agent features:
1. Go to **Settings** > **Gateway Connection**
2. Enter host, port, and token
3. Enable the toggle

Or use the chat command: `/set-gateway [token]`

Or scan a QR code generated from `tools/qr-generator.html`.

## Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Anthroid App (Android) β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Kotlin Layer β”‚ β”‚
β”‚ β”‚ β”œβ”€ Chat UI (Markdown, voice, camera) β”‚ β”‚
β”‚ β”‚ β”œβ”€ ClaudeViewModel (CLI/API/OpenClaw) β”‚ β”‚
β”‚ β”‚ β”œβ”€ MCP Server (NanoHTTPD :8765) β”‚ β”‚
β”‚ β”‚ β”œβ”€ Gateway Manager (WebSocket + sync) β”‚ β”‚
β”‚ β”‚ └─ AndroidTools (60+ device tools) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ HTTP localhost:8765 β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Node.js Layer (OpenClaw Agent) β”‚ β”‚
β”‚ β”‚ β”œβ”€ pi-embedded-runner (agent runtime) β”‚ β”‚
β”‚ β”‚ β”œβ”€ android-tools-bridge.mjs (MCP bridge) β”‚ β”‚
β”‚ β”‚ └─ workspace/ (memory, skills, data) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Terminal (Termux fork) β”‚ β”‚
β”‚ β”‚ bash, Node.js, Python, apt/pkg β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ β”‚
β–Ό β–Ό
☁️ LLM API ☁️ OpenClaw Gateway
(Claude/OpenAI) (optional, WebSocket)
```

### Key Technologies
- **Kotlin** - Primary language
- **OpenClaw pi-embedded-runner** - Local agent runtime with tool-use
- **CameraX** - Camera capture
- **ML Kit** - QR code scanning
- **sherpa-onnx** - Offline speech recognition
- **Markwon** - Markdown rendering
- **OkHttp** - WebSocket for gateway connection
- **BouncyCastle** - Ed25519 device authentication

## License

GPLv3 - Same license as Termux. See [LICENSE.md](LICENSE.md).

## Credits

- [Termux](https://github.com/termux/termux-app) - Terminal emulator foundation
- [Anthropic](https://anthropic.com) - Claude AI
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) - Speech recognition
- [Markwon](https://github.com/noties/Markwon) - Markdown rendering