An open API service indexing awesome lists of open source software.

https://github.com/agitmit/webpilot

WebPilot is a powerful, open-source, self-hosted tool that wraps headless browsers in a robust REST API, enabling seamless remote browser automation, page session management, in-memory isolation and more.
https://github.com/agitmit/webpilot

automation chromium headless-browsers open-source puppeteer remote-control rest-api scraping web webscraping

Last synced: 21 days ago
JSON representation

WebPilot is a powerful, open-source, self-hosted tool that wraps headless browsers in a robust REST API, enabling seamless remote browser automation, page session management, in-memory isolation and more.

Awesome Lists containing this project

README

        

# đŸŒđŸ•šī¸ WebPilot

**Version:** 0.1.0

WebPilot is an advanced, developer-focused tool designed to redefine how we control and interact with the web programmatically. Built to facilitate robust and scalable headless browser operations, WebPilot provides unmatched flexibility and control over web automation and browser management.

---

## **Overview**

WebPilot is a sophisticated tool for managing headless browser instances and performing web automation tasks. It offers a RESTful API interface, making it easy for developers to integrate powerful browser-based operations into their applications. Whether you need to automate complex web interactions, scrape data, or test web applications, WebPilot is equipped to handle it efficiently and scalably.

### **Key Features**
- **Dynamic Browser Pool Management**: Efficiently manage and scale pools of headless browser instances to optimize performance and resource usage.
- **Configuration Flexibility**: Spin up browser instances with custom configurations such as user-agent strings, geolocation, proxy settings, and more.
- **Stateful Page Sessions**: Control stateful page-sessions remotely, stored in-memory.
- **Performance Monitoring**: Monitor browser performance metrics to ensure stability and responsiveness.
- **RESTful API**: Interact with WebPilot using a REST API for seamless integration into your workflow.

---

## **Core Components**

### **Browser Pool**
The browser pool dynamically manages the lifecycle of headless browser instances, ensuring optimal performance under varying loads. This component supports:
- Auto-scaling: Adjusts the number of browser instances based on demand.
- Configuration templates: Allows custom configurations for each instance.
- Efficient resource allocation: Ensures each browser instance operates within capacity to prevent performance degradation.

### **Page Session**
The Page Session is a critical component of WebPilot, designed to encapsulate the lifecycle of individual web pages within a browser instance.
It provides a robust mechanism for managing page-specific tasks, ensuring isolation, stability, and seamless interaction with the browser.
- Task Isolation: Each page session operates independently, ensuring that tasks on one page do not interfere with others within the same browser instance.
- In-Memory Operations: All interactions and operations on a page session occur in memory, enhancing speed and reducing latency.
- Efficient Cleanup: Page sessions are cleaned up automatically after completion, freeing resources and preventing memory leaks.
- Error Handling: Isolates errors to individual page sessions, preventing a single failure from impacting other pages or browser instances.
- Monitoring and Metrics: Track performance metrics at the page level, such as load times, memory usage, and the number of active connections.
- Concurrency: Supports multiple concurrent page sessions per browser, with intelligent load balancing to optimize performance without overwhelming the instance.

### **Performance Metrics**
Comprehensive metrics are collected and monitored, including:
- CPU and memory usage per browser instance.
- Active pages and overall task load.
- System-level performance indicators.

---

## **Use Cases**

- **Web Automation**: Automate complex workflows like filling out forms, interacting with JavaScript-heavy pages, or navigating multi-step processes.
- **Data Scraping**: Extract structured data from websites, even those protected by anti-bot mechanisms.
- **End-to-End Testing**: Perform comprehensive tests on web applications in realistic environments with varying configurations.
- **Load Testing**: Simulate real-world traffic with multiple browsers and user sessions.

---

## **System Requirements**
- **Python Version**: >=3.12.
- **Dependencies**: Either use `pip` to install the project or using [Poetry](https://python-poetry.org/) for streamlined package management and virtual environment handling; WebPilot support both methods.
- **OS**: Linux-based systems are recommended for optimal compatibility with headless Chromium.

---

## **Contributing**
Contributions are welcome! To contribute:
1. Fork the repository and create your branch.
2. Implement your feature or fix and write tests.
3. Submit a pull request for review.

Make sure to follow the code style guidelines and include detailed documentation for new features.

---

## **License**
WebPilot is licensed under the [MIT License](https://opensource.org/licenses/MIT), making it free for both personal and commercial use.

---

## **Acknowledgments**
Special thanks to the open-source community and the creators of foundational libraries like FastAPI, Pyppeteer, and the Chromium engine which together power WebPilot's core functionality.

For any inquiries or support, feel free to reach out via the project repository. 🚀