https://github.com/shiningflash/apache-airflow-workflow-manager

Simple Apache Airflow setup with both standalone and Docker-Compose workflows for quick orchestration experiments.

Host: GitHub
URL: https://github.com/shiningflash/apache-airflow-workflow-manager
Owner: shiningflash
License: mit
Created: 2025-07-01T23:31:43.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-07-01T23:38:46.000Z (about 1 year ago)
Last Synced: 2025-07-02T00:27:59.148Z (about 1 year ago)
Topics: airflow, airflow-dags, apache-airflow, dags, docker, python, workflow
Language: Python
Homepage:
Size: 0 Bytes
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE

# Apache Airflow Workflow Manager

Simple Apache Airflow setup with both standalone and Docker-Compose workflows for quick orchestration experiments.

---

## Getting Started with Standalone Setup

### Create Virtual Environment and Install Airflow

```bash
$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
```

### Run Airflow Standalone

```bash
$ airflow standalone
```

Access the Airflow UI at: [http://localhost:8080](http://localhost:8080)

Generated password is stored here:

```bash
$ cat /Users//airflow/simple_auth_manager_passwords.json.generated
```

---

## Install Apache Airflow using Docker-Compose

### Download Docker-Compose Setup

```bash
$ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.10.0/docker-compose.yaml'
```

### Review and Modify `docker-compose.yaml` (Optional)

```bash
$ less docker-compose.yaml
$ vim docker-compose.yaml
```

### Initialize Airflow

```bash
$ docker compose up airflow-init
```

### Start Airflow Services

```bash
$ docker compose up
```

Access the Airflow UI at: [http://localhost:8080](http://localhost:8080)

**Default Credentials:**

* Username: `airflow`
* Password: `airflow`

---

## Available DAGs and Their Workflow

| DAG Name | Description | File | Flow Summary |
|------------------------|-----------------------------------------------|-------------------------------------------|---------------------------------------------------|
| `wine_dataset_producer` | Downloads raw wine dataset from GitHub | `wine_dataset_producer.py` | `download → save → trigger Dataset` |
| `wine_dataset_consumer` | Cleans and persists wine data on trigger | `wine_dataset_consumer.py` | `clean → save cleaned → persist to SQLite` |
| `census_data_pipeline` | ETL pipeline using PythonOperators | `census_data_pipeline.py` | `download → transform → validate` |

### 1. `wine_dataset_producer`

**File:** [`wine_dataset_producer.py`](./dags/wine_dataset_producer.py)
**Purpose:** Fetches the raw wine ratings dataset from a GitHub URL and saves it to a local file, which it publishes as an Airflow `Dataset`.

**Tasks:**
* Download CSV from GitHub
* Save to: `~/airflow/datasets/raw_wine_dataset.csv`
* Publish `Dataset` trigger for downstream DAGs
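
The download-and-save step could look roughly like the sketch below, written with only the Python standard library. It is not the repository's code: the function name `download_csv` and its parameters are hypothetical, and the actual GitHub URL is not given in this README. In Airflow 2.4 and later, Airflow records the `Dataset` update when a task that lists the dataset in its `outlets` finishes successfully, so that part is not reproduced here.

```python
import shutil
import urllib.request
from pathlib import Path

# Destination taken from the task list above.
RAW_PATH = Path.home() / "airflow" / "datasets" / "raw_wine_dataset.csv"


def download_csv(url: str, dest: Path = RAW_PATH) -> Path:
    """Stream the file at `url` to `dest`, creating parent directories first."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so the whole file is never held in memory.
        shutil.copyfileobj(response, out)
    return dest
```

Passing an explicit `dest` makes the function easy to try against a temporary directory before pointing it at `~/airflow/datasets`.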

---

### 2. `wine_dataset_consumer`

**File:** [`wine_dataset_consumer.py`](./dags/wine_dataset_consumer.py)
**Triggered by:** `wine_dataset_producer` via `Dataset`
**Purpose:** Reads, cleans, and persists the wine dataset using virtualenv tasks and SQLite.

**Tasks:**
* Clean dataset (drop `grape`, normalize line breaks)
* Save cleaned CSV to: `~/airflow/datasets/cleaned_wine_dataset.csv`
* Store the full dataset and the `notes` column in SQLite at: `~/airflow/databases/wine_dataset.db`
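
The cleaning and persistence steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the DAG's implementation (which runs in virtualenv tasks): "normalize line breaks" is taken to mean replacing line breaks inside field values with a single space, and the function names, the table name `wine`, and any column other than `grape` and `notes` are hypothetical.

```python
import csv
import io
import sqlite3


def clean_rows(csv_text: str) -> list[dict[str, str]]:
    """Drop the `grape` column and replace line breaks inside values with spaces."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    cleaned = []
    for row in reader:
        row.pop("grape", None)
        cleaned.append(
            {
                key: " ".join((value or "").splitlines())
                for key, value in row.items()
                if key is not None  # skip overflow fields from malformed rows
            }
        )
    return cleaned


def persist(rows: list[dict[str, str]], conn: sqlite3.Connection, table: str = "wine") -> int:
    """Create `table` if needed and insert every row; returns the row count."""
    if not rows:
        return 0
    columns = list(rows[0])
    # Identifiers come from the CSV header; they are quoted, not sanitized.
    column_sql = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[name] for name in columns) for row in rows],
    )
    conn.commit()
    return len(rows)
```

To write to the path listed above, open the connection with `sqlite3.connect(...)` on the expanded `~/airflow/databases/wine_dataset.db` path; an in-memory database (`":memory:"`) is convenient for experimenting.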

---

### 3. `census_data_pipeline`

**File:** [`census_data_pipeline.py`](./dags/census_data_pipeline.py)
**Purpose:** Demonstrates a basic ETL pipeline using Airflow's `PythonOperator`. It fetches census data from a public URL, filters the data, stores it in SQLite, and performs basic validation and statistics.

**Tasks:**
* **`download_data`**: Downloads raw census data from GitHub and stores it at `/tmp/city_census.csv`
* **`transform_data`**: Cleans missing values, filters rows where `weight > 200`, and saves output to both CSV and a SQLite table (`/tmp/census_data.db`)
* **`validate_and_statistics`**: Loads the data from SQLite and runs basic validation and summary stats (e.g., row count and total weight)
* **Flow:** `download_data → transform_data → validate_and_statistics`
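
As a rough illustration of the transform and validation logic described above, the following standard-library sketch works on rows that have already been read from the CSV. It rests on several assumptions: "cleans missing values" is read as dropping rows with any empty field, the filter is read as keeping rows with `weight > 200`, and the `city` column and `census` table name are hypothetical. The real tasks are wrapped in `PythonOperator`s, which is omitted here.

```python
import sqlite3


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows with missing values, then keep rows whose weight exceeds 200."""
    kept = []
    for row in rows:
        if any(value is None or str(value).strip() == "" for value in row.values()):
            continue  # assumption: incomplete rows are discarded
        weight = float(row["weight"])
        if weight > 200:
            kept.append({**row, "weight": weight})
    return kept


def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Insert the transformed rows into a (hypothetical) `census` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS census (city TEXT, weight REAL)")
    conn.executemany(
        "INSERT INTO census (city, weight) VALUES (:city, :weight)", rows
    )
    conn.commit()


def validate_and_statistics(conn: sqlite3.Connection) -> dict:
    """Fail on an empty table; otherwise report row count and total weight."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(weight), 0) FROM census"
    ).fetchone()
    if count == 0:
        raise ValueError("census table is empty")
    return {"row_count": count, "total_weight": total}
```

Keeping each step a plain function that takes its inputs as arguments mirrors the `download_data → transform_data → validate_and_statistics` flow and lets each stage be checked outside Airflow.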

---

## License

MIT License
