Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ris-tlp/audiophile-e2e-pipeline
Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.
airflow aws data-engineering metabase python terraform
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/ris-tlp/audiophile-e2e-pipeline
- Owner: ris-tlp
- Created: 2022-09-30T09:51:57.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-01T12:08:22.000Z (about 2 years ago)
- Last Synced: 2024-08-02T17:38:31.087Z (5 months ago)
- Topics: airflow, aws, data-engineering, metabase, python, terraform
- Language: Python
- Homepage:
- Size: 847 KB
- Stars: 187
- Watchers: 3
- Forks: 36
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Audiophile End-To-End ELT Pipeline
Pipeline that extracts data from [Crinacle's](https://crinacle.com/) Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.
## Architecture
![Architecture](https://github.com/ris-tlp/audiophile-e2e-pipeline/blob/main/images/architecture.jpeg)
Infrastructure is provisioned through [Terraform](https://www.terraform.io/), and the pipeline is containerized with [Docker](https://www.docker.com/) and orchestrated through [Airflow](https://airflow.apache.org/). The dashboard is built with [Metabase](https://www.metabase.com/).
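To make the orchestration concrete, the sketch below models the DAG tasks in this section as a dependency graph and derives one valid run order. It uses Python's standard-library `graphlib` as a stand-in for Airflow. The task names and the dependency edges (for example, that the RDS load and the Redshift load both follow the silver upload to S3) are assumptions for illustration, not taken from the repository's DAG definition.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it is
# assumed to depend on; the real Airflow DAG may wire these differently.
DEPENDENCIES = {
    "scrape_bronze": set(),
    "load_bronze_to_s3": {"scrape_bronze"},
    "validate_to_silver": {"load_bronze_to_s3"},
    "load_silver_to_s3": {"validate_to_silver"},
    "load_silver_to_redshift": {"load_silver_to_s3"},
    "load_silver_to_rds": {"load_silver_to_s3"},
    "dbt_transform": {"load_silver_to_redshift"},
    "dbt_test": {"dbt_transform"},
}


def execution_order(dependencies):
    """Return one run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, task in enumerate(execution_order(DEPENDENCIES), start=1):
        print(position, task)
```

In Airflow itself the same edges would typically be declared between operators, and the scheduler would decide which independent tasks run in parallel.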
DAG Tasks:
1. Scrape data from [Crinacle's](https://crinacle.com/) website to generate bronze data.
2. Load bronze data to [AWS S3](https://aws.amazon.com/s3/).
3. Initial data parsing and validation through [Pydantic](https://github.com/pydantic/pydantic) to generate silver data.
4. Load silver data to [AWS S3](https://aws.amazon.com/s3/).
5. Load silver data to [AWS Redshift](https://aws.amazon.com/redshift/).
6. Load silver data to [AWS RDS](https://aws.amazon.com/rds/) for future projects.
7. Transform data through [dbt](https://docs.getdbt.com/) in the warehouse.
8. Test the transformed data through [dbt](https://docs.getdbt.com/) in the warehouse.
## Dashboard
![Dashboard](https://github.com/ris-tlp/audiophile-e2e-pipeline/blob/main/images/metabase_dashboard.jpeg)
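Returning to task 3 of the DAG (parsing and validating bronze data into silver data), the following is a minimal sketch of what that step could look like. It deliberately uses a standard-library `dataclass` as a stand-in for a Pydantic model so that it runs without extra dependencies. The field names (`model`, `rank`, `price`) and the cleaning rules are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SilverHeadphone:
    """Hypothetical silver-layer record; the real schema may differ."""
    model: str
    rank: str
    price_usd: Optional[float]


def parse_price(raw):
    """Convert a scraped price such as '$1,299' to a float, or None if unparseable."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def to_silver(bronze_row):
    """Validate one bronze row (a dict of strings); return None if it is unusable."""
    model = bronze_row.get("model", "").strip()
    if not model:
        # A row without a model name cannot be identified, so it is dropped.
        return None
    return SilverHeadphone(
        model=model,
        rank=bronze_row.get("rank", "").strip().upper(),
        price_usd=parse_price(bronze_row.get("price", "")),
    )
```

With Pydantic, the same idea would be expressed as a model class with typed fields, and invalid rows would surface as validation errors instead of `None`.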
## Requirements
1. Configure AWS account through [AWS CLI](https://aws.amazon.com/cli/). [Required for Terraform]
2. [Terraform](https://www.terraform.io/). [Required to provision AWS services]
3. [Docker / Docker-Compose](https://www.docker.com/). [Required to run Airflow DAG / pipeline]
## Run Pipeline
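Before running the pipeline, it can help to confirm that the tools from the Requirements list are on `PATH`. The sketch below is one way to do that, assuming the executables are named `aws`, `terraform`, and `docker`. It does not check Docker Compose (which may be installed as `docker-compose` or as a `docker compose` plugin), and it does not verify that AWS credentials are configured.

```python
import shutil

# Assumed executable names for the tools listed under Requirements.
REQUIRED_TOOLS = ("aws", "terraform", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it is `shutil.which`.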
1. `make infra`: create AWS services. You will be asked to enter a password for your Redshift and RDS clusters.
2. `make config`: generate configuration with Terraform outputs and AWS credentials.
3. `make base-build`: build the base Airflow image with project requirements.
4. `make build`: build the Docker images for Airflow.
5. `make up`: run the pipeline.
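The steps above are meant to run in order, and a later step is unlikely to succeed if an earlier one failed. As an illustration only, the sketch below runs the same `make` targets in sequence and stops at the first failure. The helper name `run_targets` and its `runner` parameter are hypothetical and not part of the repository.

```python
import subprocess

# The make targets from the list above, in the documented order.
MAKE_TARGETS = ("infra", "config", "base-build", "build", "up")


def run_targets(targets=MAKE_TARGETS, runner=subprocess.run):
    """Run each make target in order; return the targets that succeeded.

    Execution stops at the first target that exits with a non-zero code.
    """
    completed = []
    for target in targets:
        # stdin is inherited, so interactive prompts (such as the
        # password prompt in `make infra`) still reach the terminal.
        result = runner(["make", target])
        if result.returncode != 0:
            break
        completed.append(target)
    return completed


if __name__ == "__main__":
    done = run_targets()
    print("Completed targets: " + ", ".join(done))
```

Run it from the repository root, where the Makefile lives. Running the five `make` commands by hand achieves the same result.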