{"id":51403659,"url":"https://github.com/chanupadeshan/basic-data-analytics","last_synced_at":"2026-07-04T09:04:55.519Z","repository":{"id":320366625,"uuid":"1070352343","full_name":"chanupadeshan/basic-data-analytics","owner":"chanupadeshan","description":"basic data analysis project using sql","archived":false,"fork":false,"pushed_at":"2025-10-23T10:44:10.000Z","size":811,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-23T12:31:47.244Z","etag":null,"topics":["data-analysis","eda","exploratory-data-analysis","sql","window-function"],"latest_commit_sha":null,"homepage":"","language":"TSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chanupadeshan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-05T18:51:00.000Z","updated_at":"2025-10-23T10:44:13.000Z","dependencies_parsed_at":"2025-10-23T12:31:49.865Z","dependency_job_id":null,"html_url":"https://github.com/chanupadeshan/basic-data-analytics","commit_stats":null,"previous_names":["chanupadeshan/basic-data-analytics"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/chanupadeshan/basic-data-analytics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanupadeshan%2Fbasic-data-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanupadeshan%2Fbasic-data-analytics/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/chanupadeshan%2Fbasic-data-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanupadeshan%2Fbasic-data-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chanupadeshan","download_url":"https://codeload.github.com/chanupadeshan/basic-data-analytics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanupadeshan%2Fbasic-data-analytics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35115770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","eda","exploratory-data-analysis","sql","window-function"],"created_at":"2026-07-04T09:04:54.737Z","updated_at":"2026-07-04T09:04:55.512Z","avatar_url":"https://github.com/chanupadeshan.png","language":"TSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# basic-data-analytics\n\nThis repository contains simple example SQL scripts and CSV datasets to explore a small data warehouse-style project using SQL Server (mssql).\n\nContents\n\n- `datasets/` - sample CSV files (gold.dim_customers.csv, gold.dim_products.csv, gold.fact_sales.csv)\n\n- `scripts/` - useful SQL scripts:\n  - 
`initialize_database.sql` — creates the `DataWarehouseAnalytics` database, `gold` schema, tables, and bulk-loads data from CSV files (this will DROP the database if it exists)\n  - `database_exploration.sql` — lists tables and columns\n  - `date_exploration.sql` — date range and age queries\n  - `dimentsions_exploration.sql` — explores dimension tables (the filename misspells \"dimensions\")\n  - `magnitute_analysis.sql` — aggregation and revenue queries\n  - `measures_exploration.sql` — measures and summary report\n  - `ranking_analysis.sql` — top/bottom ranking queries\n\n  ## EDA primer — Dimensions, Measures and common explorations\n\n  🧩 **Dimensions vs. Measures**\n\n  EDA starts by distinguishing between two types of data fields:\n\n  - **Dimensions** – Qualitative or categorical fields that describe data.\n    Examples: Category, Product, Region, Gender, Customer_ID\n\n  - **Measures** – Quantitative or numerical fields that can be aggregated.\n    Examples: Sales, Quantity, Profit, Age\n\n  ➡️ Rule of thumb: If a column is numeric and makes sense to sum or average, it’s a measure. 
Otherwise, it’s a dimension.\n\n\n  📏 **Dimensions Exploration**\n\n  Dimension exploration helps identify the unique values of categorical fields and how rows are distributed across them.\n\n  Common SQL operations:\n\n  - `SELECT DISTINCT column_name`\n  - `COUNT(*)` / `COUNT(DISTINCT column_name)`\n  - `GROUP BY column_name` with `COUNT()`, `AVG()`, `SUM()`\n  - Percentage shares / frequency distributions (e.g., COUNT(*) / total row count)\n\n\n  📆 **Date Exploration**\n\n  Dates are crucial for understanding time trends and activity periods.\n\n  Common SQL operations:\n\n  - MIN(order_date), MAX(order_date) — find dataset span\n  - DATEDIFF / DATEPART — compute durations or extract year/month/week\n  - GROUP BY YEAR(order_date), MONTH(order_date) — time series aggregations\n  - Rolling/window functions for moving averages (e.g., OVER(ORDER BY order_date ROWS BETWEEN ...))\n\n\n  📈 **Measures Exploration**\n\n  Measure exploration summarizes numerical columns using aggregations.\n\n  Common SQL operations:\n\n  - SUM(sales_amount), AVG(price), MIN(), MAX()\n  - COUNT(DISTINCT order_number) — orders vs items\n  - Distribution checks (histograms, percentiles) using NTILE or PERCENTILE_CONT\n\n\n  📊 **Magnitude Analysis**\n\n  Magnitude analysis connects measures with dimensions to show how metrics vary across categories.\n\n  Common SQL operations:\n\n  - GROUP BY category -\u003e SUM(sales_amount) to see category revenue\n  - ORDER BY SUM(sales_amount) DESC to find largest contributors\n  - JOIN fact -\u003e dimension tables to attribute measures to descriptive fields\n\n\n  🏆 **Ranking Analysis**\n\n  Ranking is used to find top or bottom performers in a dataset.\n\n  Common SQL operations:\n\n  - ROW_NUMBER(), RANK(), DENSE_RANK() with ORDER BY SUM(...) 
DESC/ASC\n  - Use a CTE/window function then filter by rank (e.g., WHERE rank \u003c= 10)\n\n\n## Part 2: Advanced Data Analytics\n\nFor more advanced SQL warehouse exploration, see [advanced-data-analytics](https://github.com/chanupadeshan/advanced-data-analytics).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchanupadeshan%2Fbasic-data-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchanupadeshan%2Fbasic-data-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchanupadeshan%2Fbasic-data-analytics/lists"}