https://github.com/malbiruk/salesflow-data-pipeline
End-to-end data engineering pipeline using Azure Blob, Data Factory, dbt, Snowflake, and Streamlit for interactive business analytics. (WIP)
azure-data-factory cloud-data-engineering data-visualization dbt etl snowflake streamlit
- Host: GitHub
- URL: https://github.com/malbiruk/salesflow-data-pipeline
- Owner: malbiruk
- Created: 2025-04-08T04:47:27.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-05T06:29:18.000Z (about 1 year ago)
- Last Synced: 2025-05-05T07:32:59.236Z (about 1 year ago)
- Topics: azure-data-factory, cloud-data-engineering, data-visualization, dbt, etl, snowflake, streamlit
- Language: Python
- Homepage:
- Size: 238 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# SalesFlow: Azure-to-Snowflake Data Pipeline with Interactive Dashboard
This project demonstrates a modern data engineering workflow using Azure services, Snowflake, and a Streamlit dashboard.
It simulates a full pipeline from raw CSV to business-ready analytics, a practical showcase of my hands-on work with cloud-based data tools.
---
## Stack Overview
- **Azure Blob Storage**: stores raw CSV data
- **Azure Data Factory**: loads data from Blob to Snowflake
- **Snowflake**: cloud data warehouse for structured data
- **dbt**: transforms the initial table in the raw schema into normalized-schema tables, then builds analytics-schema tables from the normalized ones
- **Streamlit + Plotly**: interactive dashboard for visual insights
- **Python**: data transformation, scripting, dashboard backend
---
## Pipeline Flow
1. Upload [raw CSV of sales data](https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/) to Azure Blob Storage
2. Use Azure Data Factory to:
   - Extract the data
   - Apply basic cleaning/transformation
   - Load into Snowflake tables
3. Query Snowflake with Python
4. Visualize data in a Streamlit dashboard
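
Step 3 could look roughly like the sketch below. It is an illustration rather than the repo's actual code: the table name `analytics.sales`, the column names, and the `SNOWFLAKE_*` environment variables are all assumptions. The query builder is kept separate from the connection code so it can be checked without a Snowflake account.

```python
import os
from datetime import date
from typing import Optional

# Hypothetical table name; the real one is defined by the project's schema files.
SALES_TABLE = "analytics.sales"


def build_sales_query(start: date, end: date, state: Optional[str] = None):
    """Return (sql, params) for a date-bounded, optionally state-filtered query.

    Uses pyformat placeholders, the default parameter style of
    snowflake-connector-python. Column names are assumed.
    """
    sql = (
        "SELECT order_date, state, product_category, revenue "
        f"FROM {SALES_TABLE} "
        "WHERE order_date BETWEEN %(start)s AND %(end)s"
    )
    params = {"start": start, "end": end}
    if state is not None:
        sql += " AND state = %(state)s"
        params["state"] = state
    return sql, params


def fetch_sales(start: date, end: date, state: Optional[str] = None):
    """Run the query against Snowflake and return all rows as tuples."""
    # Imported lazily so build_sales_query works without the connector installed.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],      # assumed variable names
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
        database=os.environ["SNOWFLAKE_DATABASE"],
    )
    try:
        sql, params = build_sales_query(start, end, state)
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        conn.close()
```

Passing filter values as bound parameters instead of formatting them into the SQL string keeps dashboard inputs from being interpreted as SQL.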
---
## Schema
**normalized**
[ERD of the normalized schema](https://liambx.com/erd/p/github.com/malbiruk/salesflow-data-pipeline/blob/main/db_schema/normalized_schema.sql?showMode=ALL_FIELDS)
---
## Streamlit Dashboard Features (Planned)
**Filters:**
- Date range
- State
- Product Category
**Charts:**
- Revenue trend over time (line chart)
- Top 5 customers by spend (bar chart)
- Revenue by state (choropleth map)
**Metrics:**
- Total revenue
- Orders count
- Avg. order value
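
As a rough sketch of how the planned metrics and the top-customers ranking might be computed once rows are loaded, the example below uses plain Python. The `Order` row shape and its field names are hypothetical. The dashboard is still planned, so this is not the project's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical row shape; field names are assumptions for illustration.
    order_id: str
    customer: str
    revenue: float


def summarize(orders):
    """Compute total revenue, distinct order count, and average order value."""
    total = sum(o.revenue for o in orders)
    count = len({o.order_id for o in orders})
    avg = total / count if count else 0.0  # avoid dividing by zero on empty input
    return {"total_revenue": total, "orders_count": count, "avg_order_value": avg}


def top_customers(orders, n=5):
    """Return the n customers with the highest total spend, largest first."""
    spend = defaultdict(float)
    for o in orders:
        spend[o.customer] += o.revenue
    return sorted(spend.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

In a Streamlit app, each value from `summarize` could be shown with `st.metric`, and the `top_customers` output could feed the bar chart.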
---
## Status
- [x] Repo initialized
- [x] Sample data upload
- [x] Snowflake schema
- [x] ADF pipeline setup
- [ ] dbt transformations
- [ ] Dashboard MVP
---
## Why I'm Building This
I'm transitioning from a bioinformatics background into cloud data engineering. This project helps me deepen my skills in data pipelines and cloud analytics while showcasing tools used in production-level DE workflows.