https://github.com/rubnsbarbosa/airflow-data-pipeline
Airflow DAG pipeline that extracts data from S3 and loads it into Redshift
apache-airflow dag pipeline redshift s3-bucket
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/rubnsbarbosa/airflow-data-pipeline
- Owner: rubnsbarbosa
- License: mit
- Created: 2024-06-16T18:24:15.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T00:23:25.000Z (3 months ago)
- Last Synced: 2025-03-20T17:59:26.593Z (2 months ago)
- Topics: apache-airflow, dag, pipeline, redshift, s3-bucket
- Language: Python
- Homepage:
- Size: 16.6 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Airflow Data Pipeline: S3 to Redshift
This project demonstrates an end-to-end data pipeline using Apache Airflow, where data is
extracted from an S3 bucket, processed with Pandas, and then loaded into Amazon Redshift.

### Introduction
This repository provides an example of a data pipeline that:
1. Extracts a CSV file from an S3 bucket.
2. Loads the CSV content into a Pandas DataFrame.
3. Inserts the data into a table in Amazon Redshift.

### Prerequisites
- Python 3.8+
- Apache Airflow 2.0+
- LocalStack (for local development)

#### Virtual Environment
It's recommended to use a virtual environment for dependency management.
```bash
python3 -m venv .venv
source .venv/bin/activate
```

### Airflow DAG
The Airflow DAG is responsible for orchestrating the data pipeline. Here’s a simplified example:
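```python
from airflow.decorators import task
from airflow.exceptions import AirflowException

# Helpers and the init/done tasks are defined elsewhere in the DAG file.
@task
def run_data_pipeline():
    try:
        data = extract_data_from_s3()  # CSV in S3 -> pandas DataFrame
        load_data_into_redshift(data)  # DataFrame -> Redshift table
    except Exception as e:
        # Surface any failure as an AirflowException so the task run fails
        raise AirflowException(f"Airflow raised the exception: {e}") from e

init >> run_data_pipeline() >> done
```

### Usage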
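Step 2 below asks for environment variables, but the README does not name them. The following is a minimal sketch, not taken from the repository. It assumes hypothetical variable names (`USE_LOCALSTACK`, `LOCALSTACK_ENDPOINT`) and a hypothetical helper layout (`s3_client_kwargs`, `make_s3_client`) to show how the same S3 code could target LocalStack during local development or AWS otherwise. LocalStack listens on `http://localhost:4566` by default.

```python
import os
from typing import Any, Dict, Mapping

# LocalStack's default edge endpoint; can be overridden with the
# hypothetical LOCALSTACK_ENDPOINT variable.
LOCALSTACK_DEFAULT_ENDPOINT = "http://localhost:4566"


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for boto3.client("s3", ...) from environment variables.

    USE_LOCALSTACK and LOCALSTACK_ENDPOINT are hypothetical names chosen for
    this sketch; AWS_DEFAULT_REGION is the standard AWS variable.
    """
    kwargs: Dict[str, Any] = {"region_name": env.get("AWS_DEFAULT_REGION", "us-east-1")}
    if env.get("USE_LOCALSTACK") == "1":
        # LocalStack accepts dummy credentials; "test" is the conventional value.
        kwargs.update(
            endpoint_url=env.get("LOCALSTACK_ENDPOINT", LOCALSTACK_DEFAULT_ENDPOINT),
            aws_access_key_id="test",
            aws_secret_access_key="test",
        )
    # Without USE_LOCALSTACK, boto3 falls back to its default credential chain.
    return kwargs


def make_s3_client(env: Mapping[str, str] = os.environ):
    """Create an S3 client; boto3 is imported lazily so the helper above has no dependencies."""
    import boto3

    return boto3.client("s3", **s3_client_kwargs(env))
```

With `USE_LOCALSTACK=1` exported, `make_s3_client()` returns a client aimed at LocalStack. If the variable is unset, the regular AWS configuration applies. Because the settings live in a pure function, the switch can be checked without boto3 or a running LocalStack.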
1. Activate the virtual environment.
2. Set the environment variables.
3. Start Airflow (the webserver, plus a running scheduler so that triggered runs are picked up).
4. Trigger the DAG from the Airflow UI.

### License
This project is licensed under the MIT License.