https://github.com/madhurimarawat/data-warehousing
This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.
- Host: GitHub
- URL: https://github.com/madhurimarawat/data-warehousing
- Owner: madhurimarawat
- License: MIT
- Created: 2025-01-23T09:07:09.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-03-10T07:39:35.000Z (8 months ago)
- Last Synced: 2025-03-10T08:28:52.329Z (8 months ago)
- Topics: data-aggregation, data-cleaning, data-cleaning-and-preprocessing, data-warehousing, detailed-documentation, etl, etl-pipeline, mysql, normalization, olap-cube, olap-data, olap-database, query-optimization, snowflake-schema, star-schema
- Language: Jupyter Notebook
- Size: 7.21 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Data-Warehousing
This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.

---
## Tools and Technologies
1. [MySQL](https://dev.mysql.com/doc/): An open-source relational database management system for managing and organizing structured data using SQL.
2. [Python](https://www.python.org/doc/): A high-level, interpreted programming language known for its readability and versatility. It supports multiple programming paradigms and is widely used for web development, data analysis, automation, and scientific computing.
3. [Pandas](https://pandas.pydata.org/docs/): An open-source data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, enabling efficient handling and analysis of structured data.
4. [NumPy](https://numpy.org/doc/): A fundamental package for numerical computing in Python. It offers support for multi-dimensional arrays and matrices, along with a collection of mathematical functions for performing efficient operations on these data structures.
5. [MySQL Connector](https://dev.mysql.com/doc/connector-python/en/): A Python library that enables connecting to a MySQL database server. It allows developers to execute SQL queries, manage database connections, and interact with MySQL databases directly from Python applications.
---
## Directory Structure
```
Data-Warehousing/
│
├── Experiment 1/
│   └── Documentation/
│       └── Explanation of methods and key observations from Experiment 1.
│
├── Experiment 2/
│   ├── Codes/
│   │   └── Contains the MySQL script for input and output in Experiment 2.
│   ├── Documentation/
│   │   └── Detailed documentation explaining the methodology and analysis for Experiment 2.
│   └── Output/
│       └── Contains the results and analysis of Experiment 2.
├── Experiment 3/
│   ├── Codes/
│   │   └── Contains the MySQL script for input and output in Experiment 3.
│   ├── Documentation/
│   │   └── Detailed documentation explaining the methodology and analysis for Experiment 3.
│   └── Output/
│       └── Contains the results and analysis of Experiment 3.
.....
```
### **Project Folder Structure**
- **Codes** (If applicable)
  Contains the source code files used for data processing and analysis in each experiment. These scripts are essential for executing tasks within the experiment. Additionally, the following files are included:
  - **MySQL Commands and Output (TXT)**: This text file contains the specific MySQL command-line operations used in the experiment, documenting both the input commands and their corresponding outputs. A detailed explanation of these commands and their results can be found in the **Documentation** folder, available in both **MD** and **PDF** formats.
- **Dataset** (If applicable)
  Stores datasets used in experiments, ensuring easy access and organization.
  - e.g., `data.csv`, `stream_data.json`
- **Output**
  Stores results generated from experiments, including visualizations, processed data, logs, and analysis reports. Each experiment's output is stored separately with a relevant name.
  - e.g., `Experiment_X_Output` (where "X" refers to the relevant experiment number)
- **Documentation**
  Contains detailed documentation for each experiment, covering methodology, analysis, and insights. Documentation is provided in both Markdown (`.md`) and PDF formats for easy reference.
  - `documentation.md` (Markdown version)
  - `documentation.pdf` (PDF version, converted from Markdown)
- **Commands File**
  A text file stored in the **Codes** folder, documenting specific commands, steps, and MySQL output used in the experiment. This is especially useful for tracking command-line operations and database interactions.
  - `MySQL_Commands_Output.txt`
---
## Table of Contents
### 1. [Introduction to Data Warehousing Concepts](Experiment%201)
This experiment introduces the fundamental concepts and architecture of data warehousing, including ETL processes, data modeling techniques, and OLAP functionalities.
### 2. [Creating Star Schema in Data Warehouse](Experiment%202)
This experiment focuses on designing and implementing a star schema data model for a specified business scenario, emphasizing the creation of fact and dimension tables.
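
As a rough illustration of the fact/dimension split described here, the sketch below builds a tiny star schema. It is not the repository's script: the table and column names (`dim_product`, `dim_date`, `fact_sales`) are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id   INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    year      INTEGER NOT NULL
);
-- The fact table stores measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    quantity   INTEGER NOT NULL,
    amount     REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (1, '2025-01-15', 2025), (2, '2025-02-10', 2025);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 2, 2000.0),
    (2, 2, 1, 1, 300.0),
    (3, 1, 2, 1, 1000.0);
""")

# Typical star-schema query: join the fact table to one dimension and aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.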
### 3. [Implementing Snowflake Schema in Data Warehouse](Experiment%203)
In this experiment, the Snowflake Schema was implemented to achieve a more
normalized data structure than the Star Schema.
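
To make the extra normalization concrete, here is a minimal sketch in which the product dimension is split into a product table and a separate category table. The names are hypothetical and `sqlite3` is used as a stand-in for MySQL; the repository's actual schema may differ.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute is moved out of dim_product into its own table.
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    amount     REAL NOT NULL
);
INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
INSERT INTO fact_sales VALUES (1, 1, 1000.0), (2, 2, 500.0), (3, 3, 300.0);
""")

# Reaching the category name now takes two joins instead of one.
snowflake_revenue = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_revenue)
```

The trade-off is less redundancy in the dimensions at the cost of longer join chains in queries.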
### 4. [Designing ETL Process for Data Warehousing](Experiment%204)
In this experiment, an ETL process was designed and implemented to migrate
data from operational databases to a data warehouse.
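
The extract, transform, and load stages can be sketched as three small functions. This is only an illustration under stated assumptions: two in-memory SQLite databases play the roles of the operational source and the warehouse, and the `orders` and `fact_orders` tables are invented for the example.

```python
import sqlite3

# Assumption: SQLite stands in for both the operational database and the
# MySQL warehouse; table and column names are hypothetical.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer TEXT, quantity INTEGER, unit_price REAL
);
INSERT INTO orders VALUES (1, '  alice ', 2, 10.0), (2, 'BOB', 1, 25.5), (3, 'alice', 3, 4.0);
""")

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total_amount REAL NOT NULL)"
)

def extract(conn):
    """Read raw rows from the operational table."""
    return conn.execute(
        "SELECT order_id, customer, quantity, unit_price FROM orders ORDER BY order_id"
    ).fetchall()

def transform(rows):
    """Normalize customer names and derive the total amount per order."""
    return [(oid, name.strip().title(), qty * price) for oid, name, qty, price in rows]

def load(conn, rows):
    """Insert the transformed rows in a single transaction."""
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)

load(warehouse, transform(extract(source)))
loaded_rows = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded_rows)
```

Keeping the three stages as separate functions makes each one easy to test or replace independently.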
### 5. [OLAP Operations in Data Warehousing](Experiment%205)
In this experiment, OLAP operations such as **slicing, dicing, drill-down, drill-up, and pivoting** were applied to analyze predefined data in a data warehouse.
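
Each of these operations can be expressed as an ordinary SQL query over a fact table. The sketch below uses a small hypothetical `sales` table and `sqlite3` in place of MySQL; the queries themselves use only standard SQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sales table is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year INTEGER, quarter TEXT, region TEXT, product TEXT, amount INTEGER);
INSERT INTO sales VALUES
    (2024, 'Q1', 'North', 'Laptop', 100),
    (2024, 'Q2', 'North', 'Phone',  150),
    (2024, 'Q1', 'South', 'Laptop', 200),
    (2025, 'Q1', 'North', 'Laptop', 120),
    (2025, 'Q1', 'South', 'Phone',   80);
""")

def q(sql):
    return conn.execute(sql).fetchall()

# Slice: fix one dimension (region) to a single value.
slice_north = q("SELECT year, SUM(amount) FROM sales WHERE region = 'North' "
                "GROUP BY year ORDER BY year")

# Dice: restrict two dimensions (region and year) at once.
dice = q("SELECT product, SUM(amount) FROM sales "
         "WHERE region = 'North' AND year = 2024 GROUP BY product ORDER BY product")

# Drill-up (roll-up): aggregate to the coarser year level.
drill_up = q("SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year")

# Drill-down: move to the finer year + quarter level.
drill_down = q("SELECT year, quarter, SUM(amount) FROM sales "
               "GROUP BY year, quarter ORDER BY year, quarter")

# Pivot: turn region values into columns with conditional aggregation.
pivot = q("""
    SELECT year,
           SUM(CASE WHEN region = 'North' THEN amount ELSE 0 END) AS north,
           SUM(CASE WHEN region = 'South' THEN amount ELSE 0 END) AS south
    FROM sales GROUP BY year ORDER BY year
""")
print(pivot)
```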
### 6. [Data Cleansing and Transformation](Experiment%206)
This experiment involved **cleaning and transforming raw data** before loading it into the data warehouse, ensuring **consistency, accuracy, and completeness**.
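
A minimal sketch of such a cleaning step is shown below, using only the Python standard library. The field names, the two accepted date formats, and the defaults for missing values (`"Unknown"` and `0.0`) are assumptions made for the example, not rules taken from the experiment.

```python
from datetime import datetime

# Hypothetical raw input with stray whitespace, mixed case, mixed date
# formats, a missing city, and a missing amount.
raw_rows = [
    {"customer": "  Alice ", "city": "delhi", "order_date": "15/01/2025", "amount": "250.50"},
    {"customer": "alice", "city": "Delhi", "order_date": "15/01/2025", "amount": "250.50"},
    {"customer": "Bob", "city": None, "order_date": "2025-02-10", "amount": ""},
]

def parse_date(value):
    """Return an ISO date string, trying each accepted input format in turn."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable dates are kept as missing

def clean(rows):
    """Standardize text and dates, fill missing values, and drop duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        record = (
            row["customer"].strip().title(),
            (row["city"] or "Unknown").strip().title(),  # assumed default
            parse_date(row["order_date"]),
            float(row["amount"]) if row["amount"] else 0.0,  # assumed default
        )
        if record not in seen:  # rows identical after cleaning are duplicates
            seen.add(record)
            cleaned.append(record)
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Note that the first two input rows only become duplicates after standardization, which is why deduplication runs last.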
### 7. [Query Optimization in Data Warehousing](Experiment%207)
SQL queries were **optimized for large-scale data warehouse applications** using techniques like **indexing, partitioning, and query tuning** to improve performance.
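
The effect of indexing can be observed by comparing a query plan before and after an index is created. The sketch below does this with `sqlite3` and its `EXPLAIN QUERY PLAN` statement as a stand-in; in MySQL the equivalent check would use `EXPLAIN`, and the output format differs. The table and index names are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_sales (sale_date, amount) VALUES (?, ?)",
    [(f"2025-01-{day:02d}", float(day)) for day in range(1, 29)],
)

query = "SELECT SUM(amount) FROM fact_sales WHERE sale_date = '2025-01-15'"

def plan(sql):
    """Join the detail column (last field) of each EXPLAIN QUERY PLAN row."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # without an index the whole table is scanned
conn.execute("CREATE INDEX idx_sales_date ON fact_sales (sale_date)")
plan_after = plan(query)   # with the index the planner can search by sale_date

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies between SQLite versions, so the useful signal is whether the index name appears in the plan at all.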
### 8. [Data Aggregation for Reporting](Experiment%208)
This experiment implemented **data aggregation techniques** to generate **summarized views of large datasets**, enhancing **reporting and analytical efficiency**.
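
One common way to provide such summarized views is to precompute a summary table with `GROUP BY`. The following sketch assumes a hypothetical `fact_sales` table and uses `sqlite3` in place of MySQL; `substr` extracts the month here, where MySQL code would more likely use `DATE_FORMAT`.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount INTEGER);
INSERT INTO fact_sales VALUES
    ('2025-01-05', 'North', 100),
    ('2025-01-20', 'North',  50),
    ('2025-01-11', 'South',  70),
    ('2025-02-02', 'North',  30),
    ('2025-02-14', 'South',  90),
    ('2025-02-28', 'South',  10);

-- Precompute one row per month and region so reports avoid rescanning the facts.
CREATE TABLE monthly_sales_summary AS
SELECT substr(sale_date, 1, 7) AS month,
       region,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM fact_sales
GROUP BY substr(sale_date, 1, 7), region;
""")

summary = conn.execute("""
    SELECT month, region, order_count, total_amount
    FROM monthly_sales_summary
    ORDER BY month, region
""").fetchall()
print(summary)
```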
### 9. [Designing and Implementing a Data Warehouse Report](Experiment%209)
This experiment involves generating business reports from a **MySQL data warehouse** using **SQL queries** and **Python** for data extraction and processing.
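
As a hedged sketch of the SQL-plus-Python flow described here, the example below runs a reporting query and renders the result as CSV text. The `fact_sales` table and the `build_report` helper are hypothetical, and `sqlite3` replaces the MySQL connection so the code is self-contained.

```python
import csv
import io
import sqlite3

# Assumption: SQLite stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product TEXT, amount INTEGER);
INSERT INTO fact_sales VALUES ('Laptop', 1000), ('Desk', 300), ('Laptop', 500);
""")

def build_report(connection):
    """Run the report query and return the result as CSV text with a header row."""
    cursor = connection.execute("""
        SELECT product, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM fact_sales
        GROUP BY product
        ORDER BY revenue DESC
    """)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])  # column names
    writer.writerows(cursor)
    return buffer.getvalue()

report_csv = build_report(conn)
print(report_csv)
```

Writing to an in-memory buffer keeps the sketch free of file paths; the same writer could target an open file instead.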
### 10. [Real-time Data Warehousing using Streaming Data](Experiment%2010)
A **real-time data pipeline** is implemented with **Python**, continuously ingesting streaming data into a **MySQL data warehouse** for immediate analysis.
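
Continuous ingestion is often implemented as micro-batches: events are read from the stream in small groups and each group is committed in one transaction. The sketch below simulates this with a generator as the event source. The `event_stream` generator, the `fact_readings` table, and the batch size are assumptions for illustration, and `sqlite3` stands in for MySQL.

```python
import sqlite3
from itertools import islice

def event_stream():
    """Hypothetical source; in practice this might be a queue, socket, or log tail."""
    for i in range(1, 8):
        yield (i, f"sensor-{i % 2}", 20.0 + i)

def ingest(conn, stream, batch_size=3):
    """Write events in micro-batches, one transaction per batch."""
    iterator = iter(stream)
    batches = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break  # the stream is exhausted
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO fact_readings VALUES (?, ?, ?)", batch)
        batches += 1
    return batches

# Assumption: SQLite stands in for the MySQL warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_readings (event_id INTEGER PRIMARY KEY, sensor TEXT, reading REAL)"
)

batches_written = ingest(conn, event_stream())
row_count = conn.execute("SELECT COUNT(*) FROM fact_readings").fetchone()[0]
print(batches_written, row_count)
```

Smaller batches make new data visible sooner, while larger ones reduce per-transaction overhead.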
### 11. [Implementing Slowly Changing Dimensions (SCD) in Data Warehousing](Experiment%2011)
This experiment applies **Slowly Changing Dimensions (SCD)** techniques in a **MySQL data warehouse**, developed using **Python** to maintain historical data accuracy.
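
The sketch below illustrates one widely used variant, SCD Type 2, in which a changed attribute closes the current dimension row and inserts a new one, so history is preserved. Whether the experiment uses Type 2 specifically is not stated here; the `dim_customer` layout and the `apply_scd2` helper are hypothetical, and `sqlite3` stands in for MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id   INTEGER NOT NULL,   -- business key from the source system
    city          TEXT NOT NULL,      -- the tracked attribute
    valid_from    TEXT NOT NULL,
    valid_to      TEXT,               -- NULL while the row is current
    is_current    INTEGER NOT NULL
)
""")

def apply_scd2(conn, customer_id, city, change_date):
    """Close the current row and insert a new version if the city changed."""
    current = conn.execute(
        "SELECT surrogate_key, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current and current[1] == city:
        return  # nothing changed, so no new version is needed
    with conn:
        if current:
            conn.execute(
                "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
                "WHERE surrogate_key = ?",
                (change_date, current[0]),
            )
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_id, city, valid_from, valid_to, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (customer_id, city, change_date),
        )

apply_scd2(conn, 1, "Delhi", "2025-01-01")
apply_scd2(conn, 1, "Delhi", "2025-02-01")   # unchanged, so ignored
apply_scd2(conn, 1, "Mumbai", "2025-03-01")  # creates a second version

history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY surrogate_key"
).fetchall()
print(history)
```

Fact rows would reference `surrogate_key` rather than `customer_id`, so each fact stays linked to the version of the customer that was valid at the time.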
---
## Thanks for Visiting
- Drop a star if you find this repository useful.
- If you have any doubts or suggestions, feel free to reach out.
  How to reach me: [LinkedIn](https://www.linkedin.com/in/madhurima-rawat/)
- **Contribute and Discuss:** Feel free to open issues, submit pull requests, or start discussions to help improve this repository!