https://github.com/chanupadeshan/basic-data-analytics
basic data analysis project using sql
https://github.com/chanupadeshan/basic-data-analytics
data-analysis eda exploratory-data-analysis sql window-function
Last synced: 1 day ago
JSON representation
basic data analysis project using sql
- Host: GitHub
- URL: https://github.com/chanupadeshan/basic-data-analytics
- Owner: chanupadeshan
- Created: 2025-10-05T18:51:00.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-10-23T10:44:10.000Z (9 months ago)
- Last Synced: 2025-10-23T12:31:47.244Z (9 months ago)
- Topics: data-analysis, eda, exploratory-data-analysis, sql, window-function
- Language: TSQL
- Homepage:
- Size: 792 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# basic-data-analytics
This repository contains simple example SQL scripts and CSV datasets to explore a small data warehouse-style project using SQL Server (mssql).
Contents
- `datasets/` - sample CSV files (gold.dim_customers.csv, gold.dim_products.csv, gold.fact_sales.csv)
- `scripts/` - useful SQL scripts:
- `initialize_database.sql` — creates the `DataWarehouseAnalytics` database, `gold` schema, tables, and bulk-loads data from CSV files (this will DROP the database if it exists)
- `database_exploration.sql` — lists tables and columns
- `date_exploration.sql` — date range and age queries
- `dimentsions_exploration.sql` — (typo: "dimentions") explore dimension tables
- `magnitute_analysis.sql` — aggregation and revenue queries
- `measures_exploration.sql` — measures and summary report
- `ranking_analysis.sql` — top/bottom ranking queries
## EDA primer — Dimensions, Measures and common explorations
🧩 **Dimensions vs. Measures**
EDA starts by distinguishing between two types of data fields:
- **Dimensions** – Qualitative or categorical fields that describe data.
Examples: Category, Product, Region, Gender, Customer_ID
- **Measures** – Quantitative or numerical fields that can be aggregated.
Examples: Sales, Quantity, Profit, Age
➡️ Rule of thumb: If a column is numeric and makes sense to sum or average, it’s a measure. Otherwise, it’s a dimension.
📏 **Dimensions Exploration**
Dimension exploration helps identify unique values and distribution across categorical fields.
Common SQL operations:
- `SELECT DISTINCT column_name`
- `COUNT(*)` / `COUNT(DISTINCT column_name)`
- `GROUP BY column_name` with `COUNT()`, `AVG()`, `SUM()`
- Percentiles / frequency distributions (e.g., COUNT() / total)
📆 **Date Exploration**
Dates are crucial for understanding time trends and activity periods.
Common SQL operations:
- MIN(order_date), MAX(order_date) — find dataset span
- DATEDIFF / DATEPART — compute durations or extract year/month/week
- GROUP BY YEAR(order_date), MONTH(order_date) — time series aggregations
- Rolling/window functions for moving averages (e.g., OVER(ORDER BY order_date ROWS BETWEEN ...))
📈 **Measures Exploration**
Measure exploration summarizes numerical columns using aggregations.
Common SQL operations:
- SUM(sales_amount), AVG(price), MIN(), MAX()
- COUNT(DISTINCT order_number) — orders vs items
- Distribution checks (histograms, percentiles) using NTILE or PERCENTILE_CONT
📊 **Magnitude Analysis**
Magnitude analysis connects measures with dimensions to show how metrics vary across categories.
Common SQL operations:
- GROUP BY category -> SUM(sales_amount) to see category revenue
- ORDER BY SUM(sales_amount) DESC to find largest contributors
- JOIN fact -> dimension tables to attribute measures to descriptive fields
🏆 **Ranking Analysis**
Ranking is used to find top or bottom performers in a dataset.
Common SQL operations:
- ROW_NUMBER(), RANK(), DENSE_RANK() with ORDER BY SUM(...) DESC/ASC
- Use a CTE/window function then filter by rank (e.g., WHERE rank <= 10)
## Part 2: Advanced Data Analytics
For more advanced SQL warehouse exploration, see [advanced-data-analytics](https://github.com/chanupadeshan/advanced-data-analytics).