https://github.com/malbiruk/salesflow-data-pipeline

End-to-end data engineering pipeline using Azure Blob, Data Factory, dbt, Snowflake, and Streamlit for interactive business analytics. (WIP)

azure-data-factory cloud-data-engineering data-visualization dbt etl snowflake streamlit


# SalesFlow: Azure-to-Snowflake Data Pipeline with Interactive Dashboard

This project demonstrates a modern data engineering workflow using Azure services, Snowflake, and a Streamlit dashboard.

It simulates a full pipeline from raw CSV to business-ready analytics — a practical showcase of my hands-on work with cloud-based data tools.

---

## šŸ—‚ļø Stack Overview

- **Azure Blob Storage**: stores the raw CSV data
- **Azure Data Factory**: loads data from Blob Storage into Snowflake
- **Snowflake**: cloud data warehouse for structured data
- **dbt**: transforms the initial table in the raw schema into normalized schema tables, then builds analytics schema tables from the normalized ones
- **Streamlit + Plotly**: interactive dashboard for visual insights
- **Python**: data transformation, scripting, dashboard backend

---

## 🔄 Pipeline Flow

1. Upload [raw CSV of sales data](https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/) to Azure Blob Storage
2. Use Azure Data Factory to:
   - Extract the data
   - Apply basic cleaning/transformation
   - Load into Snowflake tables
3. Query Snowflake with Python
4. Visualize data in a Streamlit dashboard
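
To make the "basic cleaning/transformation" step more concrete, here is a minimal local sketch in plain Python. It is not the Data Factory pipeline itself, only a stand-in for the kind of work that step does. The column names (`Order ID`, `Order Date`, `State`, `Total Revenue`), the sample rows, and the `clean_rows` helper are all assumptions for illustration; the real CSV headers may differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw data: column names and values are assumptions.
RAW_CSV = """Order ID,Order Date,State,Total Revenue
1001,5/28/2010, texas ,2533.65
1002,8/22/2012,Ohio,
1001,5/28/2010, texas ,2533.65
"""


def clean_rows(raw_text):
    """Parse raw CSV text, drop incomplete and duplicate orders, normalize types."""
    seen_ids = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        order_id = row["Order ID"].strip()
        revenue = row["Total Revenue"].strip()
        # Skip rows without revenue and rows whose order ID was already seen.
        if not revenue or order_id in seen_ids:
            continue
        seen_ids.add(order_id)
        order_date = datetime.strptime(row["Order Date"].strip(), "%m/%d/%Y")
        cleaned.append({
            "order_id": int(order_id),
            # Store dates as ISO strings so they load cleanly into a DATE column.
            "order_date": order_date.date().isoformat(),
            "state": row["State"].strip().title(),
            "total_revenue": float(revenue),
        })
    return cleaned


rows = clean_rows(RAW_CSV)
print(rows)
```

With the sample above, only the first order survives: the second row has no revenue and the third repeats an order ID.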

---

## šŸ“ Schema

**normalized**
[![ERD schema](db_schema/ERD.png)](https://liambx.com/erd/p/github.com/malbiruk/salesflow-data-pipeline/blob/main/db_schema/normalized_schema.sql?showMode=ALL_FIELDS)

---

## 📊 Streamlit Dashboard Features (Planned)

**Filters:**
- Date range
- State
- Product Category
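
The three filters could be applied to the queried rows roughly as follows. This is a sketch under assumptions, not the planned dashboard code: the record fields (`order_date`, `state`, `category`, `revenue`), the sample data, and the `filter_orders` helper are hypothetical, and in the dashboard the selected values would come from Streamlit widgets rather than hard-coded arguments.

```python
from datetime import date

# Hypothetical order records; field names are assumptions.
ORDERS = [
    {"order_date": date(2024, 1, 15), "state": "Texas", "category": "Office Supplies", "revenue": 120.0},
    {"order_date": date(2024, 2, 3), "state": "Ohio", "category": "Furniture", "revenue": 450.0},
    {"order_date": date(2024, 4, 20), "state": "Texas", "category": "Furniture", "revenue": 300.0},
    {"order_date": date(2024, 3, 9), "state": "Texas", "category": "Furniture", "revenue": 80.0},
]


def filter_orders(orders, start, end, states=None, categories=None):
    """Keep orders dated within [start, end] that match the selections.

    Passing None for states or categories means "no restriction".
    """
    return [
        o for o in orders
        if start <= o["order_date"] <= end
        and (states is None or o["state"] in states)
        and (categories is None or o["category"] in categories)
    ]


q1_texas = filter_orders(ORDERS, date(2024, 1, 1), date(2024, 3, 31), states={"Texas"})
print(len(q1_texas))
```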

**Charts:**
- 📈 Revenue trend over time (line chart)
- 🧑‍🤝‍🧑 Top 5 customers by spend (bar chart)
- 🌍 Revenue by state (choropleth map)

**Metrics:**
- Total revenue
- Orders count
- Avg. order value
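
A minimal sketch of how the three metrics could be computed from order line items. The field names (`order_id`, `revenue`) and the `summarize` helper are hypothetical, and the definition of average order value used here (total revenue divided by the number of distinct orders) is an assumption about what the dashboard will report.

```python
# Hypothetical line items: one order may span several rows.
LINE_ITEMS = [
    {"order_id": 1, "revenue": 100.0},
    {"order_id": 1, "revenue": 50.0},
    {"order_id": 2, "revenue": 250.0},
]


def summarize(line_items):
    """Compute total revenue, distinct order count, and average order value."""
    total = sum(item["revenue"] for item in line_items)
    # Count distinct order IDs so multi-line orders are not double counted.
    count = len({item["order_id"] for item in line_items})
    avg = total / count if count else 0.0
    return {"total_revenue": total, "orders_count": count, "avg_order_value": avg}


metrics = summarize(LINE_ITEMS)
print(metrics)
```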

---

## 🚧 Status

- [x] Repo initialized
- [x] Sample data upload
- [x] Snowflake schema
- [x] ADF pipeline setup
- [ ] dbt transformations
- [ ] Dashboard MVP

---

## ✨ Why I’m Building This

I’m transitioning from a bioinformatics background into cloud data engineering. This project helps me deepen my skills in data pipelines and cloud analytics — while showcasing tools used in production-level DE workflows.