https://github.com/dde-labs/zrch-data-pipeline
ZRCH assignment for creating a data pipeline
- Host: GitHub
- URL: https://github.com/dde-labs/zrch-data-pipeline
- Owner: dde-labs
- License: MIT
- Created: 2024-07-05T09:20:19.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-07-08T14:01:12.000Z (11 months ago)
- Last Synced: 2024-07-09T16:35:42.649Z (11 months ago)
- Topics: airflow, assignment-problem, duckdb, py39
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ZRCH: Data Pipeline
**ZRCH** assignment for creating a data pipeline.
> You are tasked with designing a data pipeline for a fictional e-commerce company,
> "ShopSmart". The company wants to analyze customer behavior, sales performance,
> and inventory management. You need to create a solution that ingests, processes,
> and stores data from various sources.
>
> **Questions**:
> - What are the top 2 best-selling products?
> - What is the average order value per customer?
> - What is the total revenue generated per product category?
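
For illustration only, here is a small DuckDB sketch (DuckDB appears in the project topics) that answers the three questions against hypothetical `orders` and `order_items` tables filled with made-up sample rows; the real source data and schema in this repository may differ.

```python
import duckdb  # DuckDB is listed in the project topics

con = duckdb.connect()  # in-memory database with made-up sample rows

# Hypothetical schema -- the real ShopSmart source files may differ.
con.sql("""
    CREATE TABLE orders AS SELECT * FROM (VALUES
        (1, 'c1', 120.0), (2, 'c1', 80.0), (3, 'c2', 200.0)
    ) AS t(order_id, customer_id, total_amount)
""")
con.sql("""
    CREATE TABLE order_items AS SELECT * FROM (VALUES
        (1, 'p1', 'toys', 2, 10.0), (2, 'p2', 'books', 1, 80.0), (3, 'p1', 'toys', 5, 10.0)
    ) AS t(order_id, product_id, category, quantity, price)
""")

# Top 2 best-selling products by units sold.
print(con.sql("""
    SELECT product_id, SUM(quantity) AS units_sold
    FROM order_items GROUP BY product_id
    ORDER BY units_sold DESC LIMIT 2
"""))

# Average order value per customer.
print(con.sql("""
    SELECT customer_id, AVG(total_amount) AS avg_order_value
    FROM orders GROUP BY customer_id
"""))

# Total revenue per product category.
print(con.sql("""
    SELECT category, SUM(quantity * price) AS total_revenue
    FROM order_items GROUP BY category
"""))
```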
> [!WARNING]
> This project does not implement test cases because I did not have time to learn testing.

## Architecture
```text
LocalFile --> MySQL --> Log
```

> [!NOTE]
> This is the first assignment in which I create an end-to-end data pipeline from
> source to serving analytic reports, so I will try to orchestrate it with **Airflow**.
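
As a rough sketch of how the `LocalFile --> MySQL --> Log` flow could be expressed as an Airflow DAG, here is a minimal example; the DAG id, task ids, and callables are hypothetical placeholders, not the actual DAGs shipped in this repository.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_file_to_mysql(**_):
    """Hypothetical task: read the local files and load them into MySQL."""


def log_load_result(**_):
    """Hypothetical task: log row counts after the load finishes."""


with DAG(
    dag_id="shopsmart_pipeline",   # hypothetical DAG id, not the repo's actual one
    start_date=datetime(2024, 7, 1),
    schedule=None,                 # triggered manually for the assignment
    catchup=False,
):
    load = PythonOperator(task_id="load_file_to_mysql", python_callable=load_file_to_mysql)
    log = PythonOperator(task_id="log_load_result", python_callable=log_load_result)

    load >> log  # LocalFile --> MySQL --> Log
```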
## Getting Started

First, create your Python virtual environment and install `uv`:
```shell
python -m venv venv
./venv/Scripts/activate
(venv) python -m pip install -U pip
(venv) pip install uv
```

Then, set up the connection file at `./secrets/connections.yaml`:
```yaml
file_local:
  conn_type: file
  description: Local Data on Docker
  extra_dejson:
    path: "/opt/airflow/data"

warehouse:
  conn_type: mysql
  description: Local MySQL on Docker
  host: mysql
  schema: ...
  login: ...
  password: ...
  port: 3306
```
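
As an illustration, the `warehouse` entry could be turned into a SQLAlchemy-style connection URI like this. This is a minimal sketch that assumes PyYAML and the `pymysql` driver are available; the project itself may wire the connections into Airflow differently.

```python
from pathlib import Path

import yaml  # assumes PyYAML is installed


def warehouse_uri(path: str = "./secrets/connections.yaml") -> str:
    """Build a MySQL connection URI from the `warehouse` entry (sketch only)."""
    conn = yaml.safe_load(Path(path).read_text(encoding="utf-8"))["warehouse"]
    # Assumes the pymysql driver; swap for whichever MySQL driver the project uses.
    return (
        f"mysql+pymysql://{conn['login']}:{conn['password']}"
        f"@{conn['host']}:{conn['port']}/{conn['schema']}"
    )


if __name__ == "__main__":
    print(warehouse_uri())
```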
Next, start provisioning all the services with Docker Compose:

```shell
docker compose -f ./.container/docker-compose.warehouse.yml --env-file .env up -d
docker compose -f ./.container/docker-compose.yml --env-file .env up -d
```

> [!NOTE]
> To bring the Docker Compose services down:
> ```shell
> docker compose -f ./.container/docker-compose.yml --env-file .env down -v
> docker compose -f ./.container/docker-compose.warehouse.yml --env-file .env down
> ```

Log in to Airflow at `localhost:8080` with the root user and start observing the DAGs.
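
To sanity-check that a DAG run actually loaded data, a quick query against the warehouse can help. This sketch assumes the MySQL service is published on `localhost:3306`, that the credentials match `./secrets/connections.yaml` and the `.env` file, and that `orders` is one of the loaded tables; all of these are assumptions to adjust for your setup.

```python
import sqlalchemy as sa  # assumes SQLAlchemy and a MySQL driver (e.g. pymysql) are installed

# Credentials and database name must match ./secrets/connections.yaml and the .env file.
engine = sa.create_engine("mysql+pymysql://user:password@localhost:3306/warehouse")

with engine.connect() as conn:
    # `orders` is a hypothetical target table -- replace with a table the DAG loads.
    count = conn.execute(sa.text("SELECT COUNT(*) FROM orders")).scalar()
    print(f"rows loaded into orders: {count}")
```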