An open API service indexing awesome lists of open source software.

https://github.com/pesnik/airflower

Airflower adds intelligent decision-making to Apache Airflow by capturing real-time metadata through listeners and sending it to an external brain via gRPC. The brain, equipped with analytical and rule engines, processes the data and sends decisions back to Airflow to optimize task execution dynamically.
https://github.com/pesnik/airflower

airflow dask data-engineering etl grpc machine-learning pandas python rabbitmq tensorflow

Last synced: about 1 month ago
JSON representation

Airflower adds intelligent decision-making to Apache Airflow by capturing real-time metadata through listeners and sending it to an external brain via gRPC. The brain, equipped with analytical and rule engines, processes the data and sends decisions back to Airflow to optimize task execution dynamically.

Awesome Lists containing this project

README

          

### 1. Description
**Project Name:** Airflower

**Description:**
Airflower is an advanced extension for Apache Airflow designed to bring intelligent decision-making capabilities to your workflows. By leveraging Airflow's hooks and listeners, Airflower captures real-time metadata and sends it to a powerful external brain via gRPC. This brain consists of an analytical engine and a rule engine, which analyze the data and provide actionable decisions. These decisions are then relayed back to Airflow, allowing it to dynamically adapt and optimize task execution. Airflower aims to make workflows more intelligent, efficient, and responsive to changing conditions.

### 2. Technologies
- **Apache Airflow**: For orchestrating workflows and using its hook/listener capabilities.
- **gRPC**: For high-performance communication between Airflow and the external brain.
- **Python**: As the primary language for developing the project and writing custom hooks in Airflow.
- **Analytical Engine**:
- **Pandas** and **Dask**: For data manipulation and analysis.
- **TensorFlow**: For implementing machine learning models if required.
- **Rule Engine**:
- **Drools**/**Pyke**: For defining and executing business rules.
- **Containerization**:
- **Docker**: To containerize the components for consistent environments.
- **Orchestration**:
- **Kubernetes**: To manage the deployment and scaling of the system.
- **Monitoring**:
- **Prometheus** and **Grafana**: For monitoring and visualizing system performance.
- **Database**:
- **PostgreSQL**: To store metadata, logs, and decision history.

### 3. Architectural Diagram

![Airflower Sequence Diagram](docs/airflower_sequence.png)

![Airflower Architectural Diagram](docs/airflower_arch.jpeg)