https://github.com/waridrox/weather-etl-airflow



# Data Engineering Project: Automated ETL Pipeline with Apache Airflow
[Figure: data flow diagram]

In this project, I built and automated an ETL process that extracts current weather data from the OpenWeatherMap API, transforms it, and loads the resulting CSV files into an AWS S3 bucket. The entire project runs on the AWS cloud platform and uses Apache Airflow to orchestrate and schedule the workflow.

[Figure: Airflow functions]

## Key Concepts

- **ETL Process**:
  - **Extract**: Pull weather data from the OpenWeatherMap API.
  - **Transform**: Clean and prepare the data.
  - **Load**: Upload the transformed data into an AWS S3 bucket.

- **Apache Airflow**:
  - **DAG (Directed Acyclic Graph)**: Represents the workflow structure, that is, the tasks and the dependencies between them.
  - **Operators**: Templates for tasks; each instantiated operator becomes one step of the pipeline.
  - **Sensors**: Special operators that wait for a particular condition or event before proceeding.

- **AWS Cloud Platform**:
  - AWS S3 stores the transformed data.
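
The extract and transform steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the helper names (`build_url`, `extract`, `transform`, `to_csv`), the selected columns, and the `units=metric` setting are all assumptions, and the field names follow the OpenWeatherMap current-weather response format.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Current-weather endpoint of the OpenWeatherMap API.
API_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_url(city: str, api_key: str) -> str:
    """Build the request URL; units=metric asks for temperatures in Celsius."""
    query = urllib.parse.urlencode({"q": city, "appid": api_key, "units": "metric"})
    return f"{API_URL}?{query}"


def extract(city: str, api_key: str) -> dict:
    """Extract: fetch the raw JSON payload for one city."""
    with urllib.request.urlopen(build_url(city, api_key), timeout=10) as resp:
        return json.load(resp)


def transform(payload: dict) -> dict:
    """Transform: flatten the nested payload into one CSV-ready row."""
    observed = datetime.fromtimestamp(payload["dt"], tz=timezone.utc)
    return {
        "city": payload["name"],
        "description": payload["weather"][0]["description"],
        "temp_c": payload["main"]["temp"],
        "humidity_pct": payload["main"]["humidity"],
        "wind_speed_ms": payload["wind"]["speed"],
        "observed_at_utc": observed.isoformat(),
    }


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, ready to be uploaded in the load step."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For the load step, the string returned by `to_csv` could be handed to an S3 client, for example boto3's `put_object`, with a bucket name and object key of your choosing.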

## Project Workflow

1. **Environment Setup**
   - Install Apache Airflow.
   - Configure AWS credentials and the OpenWeatherMap API key.

2. **Extract Data**
   - Connect to the OpenWeatherMap API to fetch current weather data.

3. **Transform Data**
   - Process and clean the data into the tabular form that is written out as CSV.

4. **Load Data**
   - Upload the transformed data into an S3 bucket on AWS.

5. **Schedule and Monitor the Workflow**
   - Create an Airflow DAG to schedule the ETL process.
   - Use Airflow operators and sensors to automate and monitor each step.
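
The workflow above could be wired into a DAG roughly as follows. This is a sketch under stated assumptions rather than the project's actual `weather_dag.py`: the DAG id, task ids, example city, daily schedule, the `openweathermap_api` HTTP connection, and the `openweathermap_api_key` Airflow Variable are all hypothetical, and the three callables are placeholders. It assumes Airflow 2.4 or later (for the `schedule` argument) with the `apache-airflow-providers-http` package installed.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.http.sensors.http import HttpSensor


def extract_weather():
    """Placeholder: fetch the current weather and return the raw payload."""
    return {"name": "Portland"}  # the return value is pushed to XCom


def transform_weather(ti):
    """Placeholder: pull the raw payload from XCom and flatten it."""
    payload = ti.xcom_pull(task_ids="extract_weather")
    return {"city": payload["name"]}


def load_weather(ti):
    """Placeholder: write the row as CSV and upload it to S3."""
    row = ti.xcom_pull(task_ids="transform_weather")
    print(f"would upload {row} to S3")


with DAG(
    dag_id="weather_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # assumed cadence; use schedule_interval before Airflow 2.4
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=2)},
) as dag:
    # Sensor: wait until the API answers before starting the ETL tasks.
    # The connection id must exist in Airflow and point at the API host.
    api_ready = HttpSensor(
        task_id="is_weather_api_ready",
        http_conn_id="openweathermap_api",
        endpoint="/data/2.5/weather?q=Portland&appid={{ var.value.openweathermap_api_key }}",
        poke_interval=30,
        timeout=300,
    )

    # Operators: one task per ETL step.
    extract = PythonOperator(task_id="extract_weather", python_callable=extract_weather)
    transform = PythonOperator(task_id="transform_weather", python_callable=transform_weather)
    load = PythonOperator(task_id="load_weather", python_callable=load_weather)

    # Dependencies: the acyclic ordering that makes this a DAG.
    api_ready >> extract >> transform >> load
```

Passing small payloads between tasks through XCom keeps the sketch short; larger datasets are usually written to intermediate storage instead.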

## Installation and Setup

```bash
# Install Apache Airflow (using pip)
pip install apache-airflow

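# Initialize the Airflow metadata database
# (on Airflow 2.7+, `airflow db migrate` replaces the deprecated `db init`)
airflow db init

# Parse the DAG file to catch syntax and import errors;
# this does not execute the tasks, which the Airflow scheduler runs
python weather_dag.py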