https://github.com/dev-vivekkumarverma/airflow
hey, I am learning Apache Airflow
- Host: GitHub
- URL: https://github.com/dev-vivekkumarverma/airflow
- Owner: dev-vivekkumarverma
- Created: 2025-01-27T12:31:21.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-27T12:38:29.000Z (4 months ago)
- Last Synced: 2025-01-27T13:46:18.852Z (4 months ago)
- Topics: apache-airflow, python3
- Size: 1000 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# airflow
For tutorial: https://www.youtube.com/watch?v=K9AnJ9_ZAXE&list=PLwFJcsJ61oujAqYpMp1kdUBcPG0sE0QMT
1. What is `airflow` and why do we `need` it?
- Airflow is a `workflow orchestration platform` that allows users to programmatically create, schedule, and monitor workflows. It is often used to automate machine learning tasks and build complex `data pipelines`.

Below is a step-by-step guide for setting up Apache Airflow in a **virtual environment**, using the **current working directory (`.`) as `AIRFLOW_HOME`**. The **commands worth copying into `README.md`** for easy reference are marked along the way.
---
# **🚀 Apache Airflow Local Setup Guide (Using Virtual Environment & Local Directory)**
## **📌 Overview**
This guide covers:
✅ Installing Airflow inside a **Python virtual environment**
✅ Using the **current directory (`.`) as `AIRFLOW_HOME`**
✅ Running **Airflow webserver and scheduler**
✅ Managing **DAGs and users**

---
## **🛠️ Prerequisites**
Ensure you have:
- **Python 3.11** installed
- **pip, venv, and other required system packages**
- **Enough disk space and proper permissions**

---
## **1️⃣ Install Dependencies (Before Setup)**
### ✅ **For Ubuntu/Debian**:
```bash
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-distutils python3-pip -y
```

### ✅ **For macOS (Using Homebrew)**:
```bash
brew install [email protected]
```

### ✅ **For Windows**:
1. Download **Python 3.11** from [python.org](https://www.python.org/downloads/).
2. During installation, **check the box**: **"Add Python to PATH"**.
3. Open **PowerShell** and run:
```powershell
python -m ensurepip
```

---
## **2️⃣ Set Up Virtual Environment**
🚀 **(Add These Commands to README.md)**
```bash
# Navigate to your project directory
cd ~/your-project-folder   # Change this to your actual folder

# Create and activate a virtual environment
python3.11 -m venv airflow-env
source airflow-env/bin/activate # Linux/macOS
airflow-env\Scripts\activate    # Windows

# Verify Python version inside the virtual environment
python --version
```

---
## **3️⃣ Set `AIRFLOW_HOME` to Current Directory (`.`)**
🚀 **(Add These Commands to README.md)**
```bash
# Set Airflow to use the current directory
export AIRFLOW_HOME=$(pwd) # Linux/macOS
set AIRFLOW_HOME=%cd%      # Windows

# Add this to your ~/.bashrc or ~/.zshrc to make it persistent
echo "export AIRFLOW_HOME=$(pwd)" >> ~/.bashrc   # double quotes expand $(pwd) now, pinning the project path
source ~/.bashrc
```
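
A quick sanity check before moving on (plain shell; Airflow itself is not installed yet):

```bash
# Should print the absolute path of your project directory
echo "$AIRFLOW_HOME"
```

---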
## **4️⃣ Install Apache Airflow**
🚀 **(Add These Commands to README.md)**
```bash
pip install --upgrade pip
pip install apache-airflow==2.7.1
```

✅ **Verify Installation**
```bash
airflow version
```
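
The official install docs also recommend pinning with a constraints file, so transitive dependencies match the versions tested for that release; a sketch assuming Python 3.11:

```bash
AIRFLOW_VERSION=2.7.1
PYTHON_VERSION=3.11
pip install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
```

---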
## **5️⃣ Initialize Airflow Database**
🚀 **(Add These Commands to README.md)**
```bash
airflow db init
```
This will create:
- `airflow.cfg` → Airflow configuration file
- `airflow.db` → SQLite database (for local use)

✅ **Check if the files are created in the current directory**:
```bash
ls -l | grep airflow
```
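
Two optional CLI checks to confirm the database connection and where Airflow will look for DAGs:

```bash
# Verify the metadata database connection
airflow db check

# Print the resolved dags_folder (defaults to $AIRFLOW_HOME/dags)
airflow config get-value core dags_folder
```

---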
## **6️⃣ Start Airflow Webserver and Scheduler**
🚀 **(Add These Commands to README.md)**
```bash
# Start the Airflow web server (Runs on port 8080 by default)
airflow webserver --port 8080

# Open in browser: http://localhost:8080
# In a separate terminal, start the scheduler
airflow scheduler
```
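
For quick local experiments, Airflow 2.x also ships `airflow standalone`, which runs the webserver and scheduler together and auto-creates an admin user (not meant for production):

```bash
# All-in-one local mode; the generated admin password is printed to the console
airflow standalone
```

---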
## **7️⃣ Create an Admin User**
🚀 **(Add These Commands to README.md)**
```bash
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email [email protected]
```
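
The command above prompts for the password interactively; for a scripted setup you can pass `--password` directly (fine for a local sandbox, though it lands in shell history):

```bash
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email [email protected] \
    --password admin   # example value; pick your own
```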
🛠 **Now, log in to the Airflow UI at** `http://localhost:8080` **with the admin credentials.**

---
## **8️⃣ Add DAGs to `dags/` Directory**
🚀 **(Add These Commands to README.md)**
```bash
mkdir -p dags
```
- Place your DAG Python files inside the `dags/` directory.
- Example DAG (`dags/example_dag.py`):
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # DummyOperator is deprecated in Airflow 2.x

with DAG("example_dag", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    start = EmptyOperator(task_id="start")
```
✅ **Activate DAGs in UI**:
1. Start the scheduler:
```bash
airflow scheduler
```
2. Enable the DAG in **Airflow UI (`http://localhost:8080`)**.
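
You can also exercise the DAG from the CLI before flipping it on in the UI; `airflow tasks test` runs a single task for a given logical date without writing task state to the metadata database:

```bash
# example_dag should appear once the scheduler has parsed dags/
airflow dags list

# Dry-run one task; no metadata DB entries are recorded
airflow tasks test example_dag start 2024-01-01
```

---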
## **9️⃣ Stop Airflow and Deactivate Virtual Environment**
🚀 **(Add These Commands to README.md)**
```bash
# Stop Airflow (Find and kill processes)
pkill -f "airflow webserver"
pkill -f "airflow scheduler"# Deactivate virtual environment
deactivate
```

---
# **🚀 Summary of Commands for README.md**
```bash
# Install dependencies
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-distutils python3-pip -y

# Create and activate virtual environment
python3.11 -m venv airflow-env
source airflow-env/bin/activate   # (Linux/macOS)
airflow-env\Scripts\activate      # (Windows)

# Set AIRFLOW_HOME to current directory
export AIRFLOW_HOME=$(pwd)
echo "export AIRFLOW_HOME=$(pwd)" >> ~/.bashrc   # double quotes pin the project path
source ~/.bashrc

# Install Apache Airflow
pip install --upgrade pip
pip install apache-airflow==2.7.1

# Initialize Airflow database
airflow db init

# Start Airflow webserver and scheduler (run in separate terminals)
airflow webserver --port 8080
airflow scheduler

# Create an admin user
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email [email protected]

# Create DAGs directory
mkdir -p dags

# Stop Airflow and deactivate environment
pkill -f "airflow webserver"
pkill -f "airflow scheduler"
deactivate
```
---

# WORKFLOW IS SOMETHING LIKE
`WORKFLOW -> DAG(when to do what) -> TASK(what to do) -> OPERATOR (how to do)`
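
A minimal sketch of that mapping in code (the DAG id `demo_pipeline` and the `extract`/`load` task names are illustrative):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def _load() -> None:
    # "What to do" for the load step; the body is illustrative
    print("loading...")


# The DAG says *when* to do what: daily runs starting 2024-01-01
with DAG("demo_pipeline", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    # Each task is *what* to do; its operator is *how* it gets done
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = PythonOperator(task_id="load", python_callable=_load)

    extract >> load  # dependency: extract runs before load
```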

`DAG, tasks and operators`

`DAG, task, and operator internals`
# `TASK LIFECYCLE`
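
In Airflow 2.x a task instance moves through states roughly as `none → scheduled → queued → running → success | failed`, with `up_for_retry` between failed attempts when retries are configured. A minimal sketch (DAG id and values illustrative) of a task that exercises the retry part of the lifecycle:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("lifecycle_demo", start_date=datetime(2024, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    # Always fails, so each run cycles running -> up_for_retry -> running
    # twice before the task instance settles in the failed state
    flaky = BashOperator(
        task_id="flaky",
        bash_command="exit 1",
        retries=2,
        retry_delay=timedelta(seconds=30),
    )
```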
