Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
- Host: GitHub
- URL: https://github.com/undisputed-jay/weather-data-etl-pipeline-using-apache-airflow
- Owner: Undisputed-jay
- Created: 2024-10-21T02:40:16.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-10-21T02:51:25.000Z (4 months ago)
- Last Synced: 2024-12-20T07:15:17.434Z (about 2 months ago)
- Language: Python
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Weather Data ETL Pipeline Using Apache Airflow
Project Description
This project implements a Weather Data ETL (Extract, Transform, Load) pipeline using Apache Airflow.
The pipeline is designed to fetch weather data from a public API, transform it into a structured format, and load it into a PostgreSQL database for further analysis.
The solution leverages Airflow's powerful task scheduling and orchestration capabilities to ensure the smooth flow of data between external APIs and a persistent data store.
The project follows a standard ETL process:
- Extract: The pipeline sends a request to a weather API using an HTTP hook, fetching current weather data for specific geographic coordinates (latitude and longitude).
- Transform: The pipeline processes the raw weather data, extracting relevant fields such as temperature, wind speed, wind direction, and weather code.
- Load: The transformed data is loaded into a PostgreSQL database, creating the weather_data table if it doesn't exist and inserting the current weather information.
The pipeline is simple, efficient, and designed to be flexible for expansion or adaptation to different data sources or weather endpoints.
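A minimal sketch of how the three steps might hang together as an Airflow DAG, assuming a recent Airflow 2.x with the TaskFlow API; the connection IDs, coordinates, endpoint path, and response keys below are illustrative assumptions, not the project's actual code:

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.http.hooks.http import HttpHook
from airflow.providers.postgres.hooks.postgres import PostgresHook

# Hypothetical configuration; match these to your own Airflow connections.
API_CONN_ID = "weather_api"
POSTGRES_CONN_ID = "postgres_default"
LATITUDE, LONGITUDE = 51.5074, -0.1278  # example coordinates (London)


@dag(schedule="@daily", start_date=datetime(2024, 10, 1), catchup=False)
def weather_etl():
    @task
    def extract() -> dict:
        # Fetch current weather for the configured coordinates via the HTTP hook.
        hook = HttpHook(method="GET", http_conn_id=API_CONN_ID)
        endpoint = (
            f"/v1/forecast?latitude={LATITUDE}&longitude={LONGITUDE}"
            "&current_weather=true"  # assumed query shape
        )
        return hook.run(endpoint).json()

    @task
    def transform(payload: dict) -> dict:
        # Keep only the fields the target table stores.
        current = payload["current_weather"]  # assumed response key
        return {
            "latitude": LATITUDE,
            "longitude": LONGITUDE,
            "temperature": current["temperature"],
            "windspeed": current["windspeed"],
            "winddirection": current["winddirection"],
            "weathercode": current["weathercode"],
        }

    @task
    def load(row: dict) -> None:
        hook = PostgresHook(postgres_conn_id=POSTGRES_CONN_ID)
        # Create the table on first run, then append the new reading.
        hook.run("""
            CREATE TABLE IF NOT EXISTS weather_data (
                latitude FLOAT, longitude FLOAT, temperature FLOAT,
                windspeed FLOAT, winddirection FLOAT, weathercode INT,
                timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            );
        """)
        hook.run(
            """
            INSERT INTO weather_data
                (latitude, longitude, temperature, windspeed, winddirection, weathercode)
            VALUES (%(latitude)s, %(longitude)s, %(temperature)s,
                    %(windspeed)s, %(winddirection)s, %(weathercode)s);
            """,
            parameters=row,
        )

    load(transform(extract()))


weather_etl()
```

Passing plain dicts between the tasks keeps each hand-off on Airflow's XCom mechanism, so every step stays independently retryable.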
Features:
- Airflow DAG for scheduling and orchestrating the ETL process.
- HTTP Hook for making requests to an external weather API.
- PostgreSQL Hook for database interaction and data storage.
- Well-structured and modular design, ensuring easy maintenance and scalability.
- Automatic table creation if it doesn't exist already.
- Designed with a focus on reliability and error handling in API data extraction.
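The reliability and error-handling claims map naturally onto Airflow's built-in retry machinery plus a sanity check on the response. A hedged sketch of what that could look like; the retry settings, connection ID, endpoint, and expected response key are assumptions:

```python
from datetime import timedelta

from airflow.decorators import task
from airflow.providers.http.hooks.http import HttpHook


@task(retries=3, retry_delay=timedelta(minutes=5))
def extract() -> dict:
    hook = HttpHook(method="GET", http_conn_id="weather_api")
    # HttpHook.run() raises AirflowException on non-2xx responses by default,
    # which feeds into the task's retry policy above.
    payload = hook.run("/v1/forecast?current_weather=true").json()
    if "current_weather" not in payload:
        # Fail fast (and retry) if the API returns an unexpected shape.
        raise ValueError("API response missing 'current_weather' block")
    return payload
```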
Requirements:
- Apache Airflow
- PostgreSQL Database
- Requests library (for handling HTTP requests)
Usage:
To use this pipeline, clone the repository and follow these steps:
- Set up your Airflow environment and install the required providers (HttpHook, PostgresHook).
- Update the necessary connection configurations for both the API (API_CONN_ID) and the PostgreSQL database (POSTGRES_CONN_ID) in Airflow.
- Add the geographic coordinates (LATITUDE, LONGITUDE) for which you want to retrieve weather data.
- Trigger the DAG to fetch, transform, and store weather data into your PostgreSQL instance.
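These names are module-level constants in the DAG file. A sketch of what that configuration block might look like; the IDs and coordinates are placeholders, and the connections themselves must be registered in Airflow (Admin -> Connections in the UI, or the airflow connections CLI):

```python
# Connection IDs must match connections registered in Airflow.
# All values below are placeholders.
API_CONN_ID = "weather_api"            # HTTP connection pointing at the weather API
POSTGRES_CONN_ID = "postgres_default"  # connection for the target database
LATITUDE = "51.5074"                   # example coordinates (London)
LONGITUDE = "-0.1278"
```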
Sample Table Schema:
The weather data is stored in a table with the following schema:
```sql
CREATE TABLE IF NOT EXISTS weather_data (
    latitude FLOAT,
    longitude FLOAT,
    temperature FLOAT,
    windspeed FLOAT,
    winddirection FLOAT,
    weathercode INT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
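After a run completes, the loaded data can be sanity-checked from any script using the same hook the pipeline uses. A small sketch, assuming a postgres_default connection ID:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

# Fetch the most recent reading written by the DAG.
hook = PostgresHook(postgres_conn_id="postgres_default")
row = hook.get_first(
    "SELECT temperature, windspeed, winddirection, weathercode, timestamp "
    "FROM weather_data ORDER BY timestamp DESC LIMIT 1"
)
print(row)
```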