https://github.com/francedot/interface-agent
InterfaceAgent: a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.
- Host: GitHub
- URL: https://github.com/francedot/interface-agent
- Owner: francedot
- License: mit
- Created: 2024-02-02T18:28:36.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-01T18:40:17.000Z (12 months ago)
- Last Synced: 2024-12-14T00:39:17.540Z (5 months ago)
- Language: TypeScript
- Size: 10 MB
- Stars: 107
- Watchers: 8
- Forks: 3
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# InterfaceAgent
## 🤔 What is InterfaceAgent?
Welcome to **InterfaceAgent**, a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.
Here are the key capabilities of **InterfaceAgent**:
- **Planning & Goal Refinement**: The agent builds multi-step plans that span multiple applications to fulfill a user request, and it adapts and refines those plans based on user feedback during the evaluation phase.
- **Action Prediction (Pure Visual / Textual / Set-of-Mark Visual Prompting)**: To improve the accuracy of predicting the next likely action, InterfaceAgent uses one of three approaches: visual coordinates, pure DOM textual analysis, or Set-of-Mark visual prompting.
- **Mixture of Models**: InterfaceAgent is compatible with both GPT-4V and Claude models, which determine the next steps directly from page screenshots.
- **Resilient Error Handling**: Because errors are an inherent part of AI agents, InterfaceAgent includes a retry mechanism with exponential backoff, so temporary failures do not interrupt the agent's progress.
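The retry behavior described in the last bullet can be illustrated with a minimal sketch of exponential backoff. This is not InterfaceAgent's actual implementation: `RetryOptions`, `backoffDelay`, `withRetry`, and the default values are hypothetical names and numbers chosen for illustration only.

```typescript
// Hypothetical sketch: none of these names come from the InterfaceAgent API.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // wait before the first retry
  maxDelayMs: number;  // upper bound on any single wait
}

// Wait before retry number `retryIndex` (0-based): base * 2^retryIndex, capped.
function backoffDelay(retryIndex: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** retryIndex, opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `action`, retrying on any thrown error with growing delays.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  action: () => Promise<T>,
  opts: RetryOptions = { maxAttempts: 4, baseDelayMs: 500, maxDelayMs: 8000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // No point in sleeping after the final failed attempt.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With the assumed defaults, a failing action would be retried after waits of 500 ms, 1000 ms, and 2000 ms before the error is rethrown. A production version would typically add random jitter to the delay and retry only on errors known to be transient.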
**InterfaceAgent** OS-specific agents extend the core toolkit with advanced automation for the target platform:
- **Preview of iOS Agents:** Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your iOS device.
- **Preview of Windows Agents:** Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Windows 11 device.
- **Preview of Appium Android Agents (Coming soon):** Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Android device.
- **Playwright-based Web Agents (Coming soon):** Learn how to build Web AI Agent Companions.
## 💻 Getting Started
You can either clone the repository or install InterfaceAgent with npm, yarn, or pnpm.
- For Core, see [installation steps](./packages/core/README.md).
- For iOS, see [installation steps](./packages/ios/README.md).
- For Windows, see [installation steps](./packages/windows/README.md).
## 🎬 Demos
### Windows
```bash
1) User Query: Help me download an app named EdgeTile
```
```bash
2) User Query: Dropshipping products on Tiktok
```
### iOS
```bash
User Query: Help me prepare for a 30 days of fitness challenge
```
## 🚀 Challenges and Focus
InterfaceAgent continues to face challenges in long-horizon planning and selector inference accuracy. The current focus is on enhancing the stability of InterfaceAgent agents.
## 🤓 Contributing
We welcome contributions. Please follow the standard fork-and-pull-request workflow.
## 🛂 License
InterfaceAgent is licensed under the [MIT License](LICENSE).
## 🚑 Support
For support, questions, or feature requests, open an issue in the GitHub repository.