{"id":27450117,"url":"https://github.com/prakashpandey16/sql_data_warehouse_project","last_synced_at":"2026-05-03T05:44:36.657Z","repository":{"id":287385915,"uuid":"964532776","full_name":"prakashpandey16/sql_data_warehouse_project","owner":"prakashpandey16","description":"Building a modern data warehouse with SQL Server, including ETL Processes, data modeling, and analytics.","archived":false,"fork":false,"pushed_at":"2025-04-11T12:07:53.000Z","size":10128,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-15T09:11:26.488Z","etag":null,"topics":["cleaning-data","data","data-engineering","data-science","database","etl-pipeline","sqlserver"],"latest_commit_sha":null,"homepage":"","language":"TSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prakashpandey16.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-11T11:11:07.000Z","updated_at":"2025-04-11T12:34:11.000Z","dependencies_parsed_at":"2025-04-11T13:49:05.879Z","dependency_job_id":"5f799fb0-4d1c-4856-adbc-d1c3d68a81c4","html_url":"https://github.com/prakashpandey16/sql_data_warehouse_project","commit_stats":null,"previous_names":["prakashpandey16/sql_data_warehouse_project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/prakashpandey16/sql_data_warehouse_project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakashpandey16%2Fsql_data_warehouse_project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/prakashpandey16%2Fsql_data_warehouse_project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakashpandey16%2Fsql_data_warehouse_project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakashpandey16%2Fsql_data_warehouse_project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/prakashpandey16","download_url":"https://codeload.github.com/prakashpandey16/sql_data_warehouse_project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prakashpandey16%2Fsql_data_warehouse_project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32559716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T03:21:47.309Z","status":"ssl_error","status_checked_at":"2026-05-03T03:21:43.884Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cleaning-data","data","data-engineering","data-science","database","etl-pipeline","sqlserver"],"created_at":"2025-04-15T09:11:24.927Z","updated_at":"2026-05-03T05:44:36.640Z","avatar_url":"https://github.com/prakashpandey16.png","language":"TSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Warehouse and Analytics Project\n\nWelcome to the **Data Warehouse and Analytics 
Project** repository! 🚀  \nThis project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.\n\n---\n\n## 🏗️ Data Architecture\n\nThe data architecture for this project follows the Medallion Architecture with **Bronze**, **Silver**, and **Gold** layers:  \n![High Level Architecture](docs/High_level_architecture.png)\n\n1. **Bronze Layer**: Stores raw data as-is from the source systems. Data is ingested from CSV files into a SQL Server database.  \n2. **Silver Layer**: Applies data cleansing, standardization, and normalization to prepare data for analysis.  \n3. **Gold Layer**: Houses business-ready data modeled as a star schema for reporting and analytics.\n\n---\n\n## 📖 Project Overview\n\nThis project involves:\n\n1. **Data Architecture**: Designing a modern data warehouse using the Medallion Architecture with **Bronze**, **Silver**, and **Gold** layers.\n2. **ETL Pipelines**: Extracting, transforming, and loading data from source systems into the warehouse.\n3. **Data Modeling**: Developing fact and dimension tables optimized for analytical queries.\n4. 
**Analytics \u0026 Reporting**: Creating SQL-based reports and dashboards for actionable insights.\n\n---\n\n## 🛠️ Important Links \u0026 Tools:\n\nEverything is free!\n- **[Datasets](datasets/):** Access to the project dataset (CSV files).\n- **[SQL Server Express](https://www.microsoft.com/en-us/sql-server/sql-server-downloads):** Lightweight server for hosting your SQL database.\n- **[SQL Server Management Studio (SSMS)](https://learn.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-ver16):** GUI for managing and interacting with databases.\n- **[Git Repository](https://github.com/):** Set up a GitHub account and repository to manage, version, and collaborate on your code efficiently.\n- **[DrawIO](https://www.drawio.com/):** Design data architecture, models, flows, and diagrams.\n- **[Notion Template](https://www.notion.com/templates/sql-data-warehouse-project):** Get a customizable project planning template.\n- **[Notion Project Workspace](https://www.notion.so/SQL-Data-Warehouse-Project-1cc8955f9e4380928e7adb64f38d3c85):** Full access to project tasks, progress tracking, and documentation in Notion.\n\n---\n\n## 🚀 Project Requirements\n\n### Building the Data Warehouse (Data Engineering)\n\n#### Objective  \nDevelop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.\n\n#### Specifications\n- **Data Sources**: Import data from two source systems (ERP and CRM) provided as CSV files.\n- **Data Quality**: Cleanse and resolve data quality issues prior to analysis.\n- **Integration**: Combine both sources into a single, user-friendly data model designed for analytical queries.\n- **Scope**: Focus on the latest dataset only; historization of data is not required.\n- **Documentation**: Provide clear documentation of the data model to support both business stakeholders and analytics teams.\n\n---\n\n### BI: Analytics \u0026 Reporting (Data 
Analysis)\n\n#### Objective  \nDevelop SQL-based analytics to deliver detailed insights into:\n- **Customer Behavior**\n- **Product Performance**\n- **Sales Trends**\n\nThese insights empower stakeholders with key business metrics, enabling strategic decision-making.\n\n---\n\n## 📂 Repository Structure\n\n```plaintext\ndata-warehouse-project/\n│\n├── datasets/                           # Raw datasets used for the project (ERP and CRM data)\n│\n├── docs/                               # Project documentation and architecture visuals\n│   ├── Data_Flow.png                   # Visual representation of data flow\n│   ├── Data_integrations.png           # Diagram of different data integrations\n│   ├── Data_mart.png                   # Schema or design of data marts\n│   ├── ETL_modal.png                   # Visual explanation of ETL processes\n│   ├── High_level_architecture.png     # Overview of the system architecture\n│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata\n│   ├── naming_conventions.md           # Guidelines for consistent naming of tables, columns, and files\n│\n├── scripts/                            # SQL scripts for ETL and transformations\n│   ├── bronze/                         # Scripts for extracting and loading raw data\n│   ├── silver/                         # Scripts for cleaning and transforming data\n│   ├── gold/                           # Scripts for creating analytical models\n│\n├── tests/                              # Test scripts and data quality validation\n│\n├── README.md                           # Project overview, setup instructions, and usage guide\n├── LICENSE                             # License information for the repository\n├── .gitignore                          # Files and directories to be ignored by Git\n```\n\n---\n\n## 🛡️ License\n\nThis project is licensed under the [MIT License](LICENSE). 
You are free to use, modify, and share this project with proper attribution.\n\n## 🌟 About Me\n\nI'm Prakash Pandey, a BCA student who is passionate about Data Engineering, building real-world data solutions, and solving business problems with technology.\n\n- 🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/prakash-pandey-884590263/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprakashpandey16%2Fsql_data_warehouse_project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprakashpandey16%2Fsql_data_warehouse_project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprakashpandey16%2Fsql_data_warehouse_project/lists"}