Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kevinknights29/airflow_retail_pipeline
This project is inspired by the video: Data Engineer Project: An end-to-end Airflow data pipeline with BigQuery, dbt, Soda, and more!
- Host: GitHub
- URL: https://github.com/kevinknights29/airflow_retail_pipeline
- Owner: kevinknights29
- License: mit
- Created: 2023-08-12T00:36:23.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-09T05:01:27.000Z (9 months ago)
- Last Synced: 2024-05-08T00:23:57.482Z (6 months ago)
- Topics: airflow, data-pipeline, python
- Language: Python
- Homepage:
- Size: 7.01 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Airflow_Retail_Pipeline
This project is inspired by the video: Data Engineer Project: An end-to-end Airflow data pipeline with BigQuery, Dbt, Soda, and more!
## Prerequisites
- [ ] Have Docker installed
To install, check: [Docker Desktop Install](https://www.docker.com/products/docker-desktop/)
- [ ] Have Astro CLI installed
If you use brew, you can run: `brew install astro`
For other systems, please refer to: [Install Astro CLI](https://docs.astronomer.io/astro/cli/install-cli)
- [ ] Have a Soda account
You can get a 45-day free trial: [Soda](https://www.soda.io/)
- [ ] Have a Google Cloud account
You can create your account here: [Google Cloud](https://cloud.google.com)
## Getting Started
1. Run `astro dev init` to create the necessary files for your environment.
2. Run `astro dev start` to start the Airflow services with Docker.
3. Download the dataset from [Kaggle - Online Retail](https://www.kaggle.com/datasets/tunguz/online-retail?resource=download).
- Create a folder `dataset` inside the `include` directory and add your CSV file there (see the upload sketch after this list).
4. Create a Google Cloud Storage bucket.
- Create a folder called `input` inside it.
5. Create a Service Account.
- Grant access to Cloud Storage as "Storage Admin".
- Grant access to BigQuery as "BigQuery Admin".
6. Create a JSON key for the Service Account.
- Create a folder `gcp` inside the `include` directory and add your JSON key there.
7. Create a connection in the Airflow UI using the path of the JSON key.
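
The following is a minimal sketch tying together steps 3, 4, and 7: a DAG that uploads the local Kaggle CSV into the bucket's `input` folder. The file name (`online_retail.csv`), the bucket placeholder, and the connection id `gcp` are assumptions, not names fixed by this project; adjust them to match your setup. It also assumes the Google provider package is available in the Astro project image.

```python
# dags/upload_online_retail.py
# Minimal sketch: upload the local Kaggle CSV into the bucket's `input` folder.
# Bucket name, connection id, and file name below are assumptions; adjust them.
from datetime import datetime

from airflow.decorators import dag
from airflow.providers.google.cloud.transfers.local_to_gcs import (
    LocalFilesystemToGCSOperator,
)


@dag(start_date=datetime(2023, 8, 1), schedule=None, catchup=False, tags=["retail"])
def upload_online_retail():
    LocalFilesystemToGCSOperator(
        task_id="upload_csv_to_gcs",
        src="/usr/local/airflow/include/dataset/online_retail.csv",  # include/ is mounted in the Astro container
        dst="input/online_retail.csv",  # the `input` folder from step 4
        bucket="<your-bucket-name>",  # bucket created in step 4
        gcp_conn_id="gcp",  # connection created in step 7
        mime_type="text/csv",
    )


upload_online_retail()
```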
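Once the file is in the bucket, the service account from steps 5 and 6 can be used to load it into BigQuery. The sketch below uses the Google provider's `GCSToBigQueryOperator`; the dataset and table names are assumptions, and the original project may load the data differently (for example with the Astro SDK).

```python
# dags/load_online_retail.py
# Minimal sketch: load the uploaded CSV from GCS into a BigQuery table.
# Dataset, table, and bucket names are assumptions; adjust to your project.
from datetime import datetime

from airflow.decorators import dag
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryCreateEmptyDatasetOperator,
)
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)


@dag(start_date=datetime(2023, 8, 1), schedule=None, catchup=False, tags=["retail"])
def load_online_retail():
    create_retail_dataset = BigQueryCreateEmptyDatasetOperator(
        task_id="create_retail_dataset",
        dataset_id="retail",  # assumed BigQuery dataset name
        gcp_conn_id="gcp",
    )

    load_csv_to_bigquery = GCSToBigQueryOperator(
        task_id="load_csv_to_bigquery",
        bucket="<your-bucket-name>",
        source_objects=["input/online_retail.csv"],
        destination_project_dataset_table="retail.raw_invoices",  # assumed table name
        source_format="CSV",
        skip_leading_rows=1,  # skip the CSV header row
        autodetect=True,  # let BigQuery infer the schema
        write_disposition="WRITE_TRUNCATE",
        gcp_conn_id="gcp",
    )

    create_retail_dataset >> load_csv_to_bigquery


load_online_retail()
```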