Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
An end-to-end ELT data pipeline of the Brazilian olist e-commerce dataset using the modern data stack
- Host: GitHub
- URL: https://github.com/lawal-hash/olistelt
- Owner: lawal-hash
- Created: 2024-08-14T08:36:04.000Z (3 months ago)
- Default Branch: dev
- Last Pushed: 2024-08-28T07:21:23.000Z (3 months ago)
- Last Synced: 2024-09-29T06:07:05.022Z (about 2 months ago)
- Topics: airflow, bigquery, dbt, dbt-core, docker, postgresql, sql
- Language: Python
- Homepage: https://lawal-hash-olistelt-dashboardapp-dev-7arv8r.streamlit.app/
- Size: 17.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
# OlistELT Pipeline
![pipeline](images/final_diagram.png)
## Screenshots
![airflow graph](images/airflow_graph.png)
![dashboard](images/dashboard.png)
## Environment Variables
To run this project, you will need two credentials and two `.env` files (one for PostgreSQL and one for Airflow):

- `KAGGLE API KEY`
- `GCP SERVICE ACCOUNT KEY`
| Postgres env | Airflow env|
| -------- | -------- |
| `POSTGRES_USER` | `_AIRFLOW_WWW_USER_USERNAME` |
| `POSTGRES_PASSWORD` | `_AIRFLOW_WWW_USER_PASSWORD` |
| `POSTGRES_DB` | |
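
A minimal sketch of the two `.env` files, with placeholder values (the variable names come from the table above; that the PostgreSQL file lives under `ingestion/` is an assumption based on the layout below):

```bash
# ingestion/.env (PostgreSQL; placeholder values)
POSTGRES_USER=olist
POSTGRES_PASSWORD=change-me
POSTGRES_DB=olist

# airflow/.env (Airflow web UI login; placeholder values)
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=change-me
```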
Place the credential files and `.env` files according to the following directory layout:

```bash
postgre_docker_init/
├── ingestion/
│   ├── kaggle.json
│   └── .env
├── airflow/
│   ├── .env
│   └── config/
│       └── service-account.json
├── dbt/
│   └── service-account.json
└── dashboard/
    └── .streamlit/
        └── secrets.toml
```
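
The `kaggle.json` token comes from your Kaggle account settings (Account -> Create New API Token). A sketch of putting it in place, assuming the layout above and placeholder values:

```bash
# kaggle.json uses Kaggle's standard token format: {"username": ..., "key": ...}.
# The values below are placeholders; use the file downloaded from kaggle.com.
cat > ingestion/kaggle.json <<'EOF'
{"username": "your-kaggle-username", "key": "your-kaggle-api-key"}
EOF
chmod 600 ingestion/kaggle.json   # Kaggle recommends restricting permissions
```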
## Run Locally

Clone the project:
```bash
git clone https://github.com/lawal-hash/OlistELT.git
```

Go to the project directory:
```bash
cd OlistELT
```

Install dependencies:
```bash
pip install -r requirements.txt
```

Start the project in the following order:
| Folder | Command one| Command two|
| -------- | -------- | --------|
| ingestion | `cd ingestion` | `docker compose up` |
| airflow | `cd airflow` | `docker compose up` |

Add the following connection IDs from within the Airflow UI: go to Admin -> Connections, then add the following.
`Postgres`
* Connection Id = `postgres_conn_id`
* Connection Type = `Postgres`
* Host = `xxx`
* Schema = `xxx`
* Login = `xxx`
* Password = `xxx`
* Port = `xxx`

`Google Cloud Platform`
* Connection Id = `gcp_conn_id`
* Connection Type = `Google Cloud`
* Project ID = `xxx`
* Keyfile Path = `/opt/airflow/config/service-account.json`

The `xxx` values are the same as those in the Postgres `.env` file.
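
If you prefer the CLI to the UI, the same connections can be created with `airflow connections add`. A sketch, assuming the webserver service is named `airflow-webserver` in the compose file; all credential values are placeholders, and the extra-field names for the GCP connection vary slightly across Airflow versions:

```bash
# Placeholder host and credentials; match them to your Postgres .env values.
docker compose exec airflow-webserver airflow connections add postgres_conn_id \
  --conn-type postgres \
  --conn-host postgres \
  --conn-schema olist \
  --conn-login olist \
  --conn-password change-me \
  --conn-port 5432

# Older Airflow releases expect the extra keys prefixed with
# "extra__google_cloud_platform__".
docker compose exec airflow-webserver airflow connections add gcp_conn_id \
  --conn-type google_cloud_platform \
  --conn-extra '{"key_path": "/opt/airflow/config/service-account.json", "project": "your-gcp-project"}'
```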
Change the DAG state from *paused* to *active*. Ensure the DAG run is successful before proceeding to the next step.
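
Unpausing can also be done from the CLI (the DAG id `olist_elt` below is a placeholder; use the id shown in the Airflow UI):

```bash
# Hypothetical DAG id; substitute the one listed in the Airflow UI.
docker compose exec airflow-webserver airflow dags unpause olist_elt
```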
| Folder | Command one| Command two|
| -------- | -------- | --------|
| dbt | `cd dbt/olist_dbt` | `chmod +x run.sh && ./run.sh` |
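
The contents of `run.sh` are not reproduced here; a typical dbt run script looks roughly like the sketch below (not the repository's actual file):

```bash
#!/usr/bin/env bash
# Sketch of a typical dbt run script; the project's run.sh may differ.
set -euo pipefail
dbt deps   # install packages declared in packages.yml
dbt run    # build the models in BigQuery
dbt test   # run schema and data tests
```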
Start the dashboard server using the commands below.

| Folder | Command one | Command two |
| -------- | -------- | -------- |
| dashboard | `cd dashboard` | `streamlit run app.py` |
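
The dashboard expects its credentials in `dashboard/.streamlit/secrets.toml` (per the layout above); a quick pre-flight sketch before launching:

```bash
# Fail early if the secrets file from the directory layout above is missing.
test -f .streamlit/secrets.toml || { echo "missing .streamlit/secrets.toml"; exit 1; }
streamlit run app.py   # serves on http://localhost:8501 by default
```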
## Authors

- [@lawal-hash](https://github.com/lawal-hash)