https://github.com/francedot/interface-agent

InterfaceAgent: a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.

# Agent


TypeScript · Node 20 LTS · MIT License

InterfaceAgent Screenshot

# 🤔 What is InterfaceAgent?

Welcome to **InterfaceAgent**, a framework for building system and interface agents that can operate mobile and desktop applications and their features.

Here are the key capabilities of **InterfaceAgent**:

- **Planning & Goal Refinement**: The agent builds multi-step plans that span several applications to fulfill a user request. It can also adapt and refine these plans based on user feedback during the evaluation phase.

- **Action Prediction (Pure Visual / Textual / Set-of-Mark Visual Prompting)**: InterfaceAgent predicts the next action using a visual coordinate-based approach, pure textual analysis of the DOM, or set-of-mark visual prompting.

- **Mixture of Models**: InterfaceAgent works with both GPT-4V and Claude models and uses them to determine the next steps directly from page screenshots.

- **Resilient Error Handling**: Errors are an inherent part of AI agents, so InterfaceAgent includes a retry mechanism with exponential backoff. This lets the agent recover from temporary failures and continue its task.
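
To illustrate the retry-with-exponential-backoff idea from the last bullet, here is a minimal TypeScript sketch. It is not InterfaceAgent's actual implementation: the names `RetryOptions`, `backoffDelay`, and `retryWithBackoff`, as well as the doubling factor and the delay cap, are assumptions chosen for illustration.

```typescript
// Hypothetical helper, not part of the InterfaceAgent API.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `retryIndex` (0-based):
// baseDelayMs * 2^retryIndex, capped at maxDelayMs.
function backoffDelay(retryIndex: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, retryIndex), opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `action` until it succeeds or `maxAttempts` is exhausted,
// waiting an exponentially growing delay between attempts.
async function retryWithBackoff<T>(
  action: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Do not sleep after the final failed attempt.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap a flaky step, such as a model request or a UI action, in `retryWithBackoff(() => step(), { maxAttempts: 4, baseDelayMs: 500, maxDelayMs: 8000 })`, where `step` stands for any function returning a promise. Production retry loops often add random jitter to the delay so that concurrent clients do not retry in lockstep; the sketch omits it for brevity.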

**InterfaceAgent** OS-specific agents extend the core toolkit with advanced automation for the target platform:

- **Preview of iOS Agents:** Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your iOS device.
- **Preview of Windows Agents:** Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Windows 11 device.
- **Preview of Appium Android Agents (Coming soon):** Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Android device.
- **Playwright-based Web Agents (Coming soon):** Learn how to build Web AI Agent Companions.

## 💻 Getting Started

You can either clone the repository or install InterfaceAgent with npm, yarn, or pnpm.

- For Core, see [installation steps](./packages/core/README.md).
- For iOS, see [installation steps](./packages/ios/README.md).
- For Windows, see [installation steps](./packages/windows/README.md).

## 🎬 Demos

### Windows

```text
1) User Query: Help me download an app named EdgeTile
```


EdgeTile demo

```text
2) User Query: Dropshipping products on TikTok
```


TikTok demo

### iOS

```text
User Query: Help me prepare for a 30 days of fitness challenge
```


30 days of fitness demo

## 🚀 Challenges and Focus

Long-horizon planning and accurate selector inference remain open challenges for InterfaceAgent. The current focus is on improving the stability of its agents.

## 🤓 Contributing

We welcome contributions. Please follow the standard fork-and-pull-request workflow.

## 🛂 License

InterfaceAgent is licensed under the [MIT License](LICENSE).

## 🚑 Support

For support, questions, or feature requests, open an issue in the GitHub repository.