{"id":24407641,"url":"https://github.com/lock747/sales-database-analysis-sql-project","last_synced_at":"2025-03-13T08:25:00.109Z","repository":{"id":271503716,"uuid":"913670389","full_name":"Lock747/Sales-Database-Analysis-SQL-Project","owner":"Lock747","description":"This repository contains a comprehensive SQL-based sales database designed for analytics, reporting, and business insights.   The project involves setting up a retail sales database, performing exploratory data analysis (EDA), and answering specific business questions through SQL queries.","archived":false,"fork":false,"pushed_at":"2025-01-08T10:47:48.000Z","size":121,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-20T05:17:42.392Z","etag":null,"topics":["sql-server"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Lock747.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-08T06:10:35.000Z","updated_at":"2025-01-12T12:48:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"11393e3b-d43d-4034-9630-351d3ced331f","html_url":"https://github.com/Lock747/Sales-Database-Analysis-SQL-Project","commit_stats":null,"previous_names":["lock747/sales-database-analysis-sql-project"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2FSales-Database-Analysis-SQL-Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2F
Sales-Database-Analysis-SQL-Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2FSales-Database-Analysis-SQL-Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2FSales-Database-Analysis-SQL-Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Lock747","download_url":"https://codeload.github.com/Lock747/Sales-Database-Analysis-SQL-Project/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243367299,"owners_count":20279521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["sql-server"],"created_at":"2025-01-20T05:17:36.393Z","updated_at":"2025-03-13T08:25:00.029Z","avatar_url":"https://github.com/Lock747.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sales Database Analysis SQL Project\n\n## Project Overview\n\n**Project Title**: Retail Sales Analysis  \n\nThe project involves setting up a retail sales database, performing exploratory data analysis (EDA), and answering specific business questions through SQL queries.\n\n## Objectives\n\n1. **Schema**: Includes normalized tables for Customers, Products, Orders, Sales.\n2. **Data**: Sample sales data representing transactions, customers, and product details.\n3. **Exploratory Data Analysis (EDA)**: Perform basic exploratory data analysis to understand the dataset.\n4. 
**Queries**: Predefined SQL queries for exploring, cleaning, and analyzing retail sales performance, sales trends, and customer behavior.\n\n## Project Structure\n\n### 1. Database Setup\n\n- **Database Creation**: The project starts by creating a database named `p1_retail_db`.\n- **Table Creation**: A table named `sales_pr` is created to store the sales data. The table structure includes columns for transaction ID, sale date, sale time, customer ID, gender, age, product category, quantity sold, price per unit, cost of goods sold (COGS), and total sale amount.\n\n```sql\nCREATE DATABASE p1_retail_db;\n\nCREATE TABLE sales_pr\n(\n    transactions_id INT PRIMARY KEY,\n    sale_date DATE,\t\n    sale_time TIME,\n    customer_id INT,\t\n    gender VARCHAR(10),\n    age INT,\n    category VARCHAR(35),\n    quantity INT,\n    price_per_unit FLOAT,\t\n    cogs FLOAT,\n    total_sale FLOAT\n);\n```\n\n### 2. Data Exploration \u0026 Cleaning\n\n- **Record Count**: Determine the total number of records in the dataset.\n- **Customer Count**: Find out how many unique customers are in the dataset.\n- **Category Count**: Identify all unique product categories in the dataset.\n- **Null Value Check**: Check for any null values in the dataset and delete records with missing data.\n\n```sql\nSELECT COUNT(*) FROM sales_pr;\nSELECT COUNT(DISTINCT customer_id) FROM sales_pr;\nSELECT DISTINCT category FROM sales_pr;\n\n\nSELECT * FROM sales_pr\nWHERE \n    sale_date IS NULL OR sale_time IS NULL OR customer_id IS NULL OR \n    gender IS NULL OR age IS NULL OR category IS NULL OR \n    quantity IS NULL OR price_per_unit IS NULL OR cogs IS NULL;\n\nDELETE FROM sales_pr\nWHERE \n    sale_date IS NULL OR sale_time IS NULL OR customer_id IS NULL OR \n    gender IS NULL OR age IS NULL OR category IS NULL OR \n    quantity IS NULL OR price_per_unit IS NULL OR cogs IS NULL;\n```\n\n### 3. 
Data Analysis \u0026 Findings\n\nThe following SQL queries were developed to answer specific business questions:\n\n1. **Write a SQL query to understand Age group and respective purchasing power and profit.**:\n```sql\nwith age_class as (\n\tselect *,\n\tcase \n\t\twhen age \u003c25 then 'Young'\n\t\twhen age between 25 and 40 then 'Middle Age'\n\t\telse 'Senior'\n\tend as age_group\n\tfrom sales_pr\n)\n\nselect \n\tage_group, \n\tsum(total_sale)[Spending_Power],\n\tsum((Total_sale-quantity*cogs))[Total_Profit]\nfrom age_class\ngroup by age_group\n\n\n\nOUTPUT\n\nAge_group\tSpending_Power\tTotal_Profit\n---------\t--------------\t-------------\nSenior\t         ₹450,745.00 \t ₹210,447.40 \nMiddle Age\t ₹308,325.00 \t ₹137,062.30 \nYoung\t         ₹149,160.00 \t ₹73,859.25\n\n```\n\n2. **Write a SQL query to understand all  Gender wise purchasing power.**:\n```sql\nselect \ngender,\nsum(total_sale)[Spending_Power],\nsum((Total_sale-quantity*cogs))[Total_Profit]\nfrom sales_pr\ngroup by gender;\n\nOUTPUT\n\nGender\tSpending_Power\tTotal_Profit\nMale\t₹445,120.00\t₹223,965.45\nFemale\t₹463,110.00\t₹197,403.50\n```\n\n3. **Write a SQL query to calculate the total sales (total_sale) for each category.**:\n```sql\nSELECT \n    category,\n    SUM(total_sale) as net_sale,\n    COUNT(*) as total_orders\nFROM sales_pr\ngroup by category\n\nOUTPUT\n\ncategory\tnet_sale\ttotal_orders\n---------       --------\t------------\nClothing\t₹309,995.00\t698\nElectronics\t₹311,445.00\t678\nBeauty\t        ₹286,790.00\t611\n```\n\n4. 
**Write a SQL query to find the peak season in 2 years and the most profitable season.**:\n```sql\nwith sale_season as (\n\tselect *,\n\tcase\n\t\twhen month(sale_date) between 7 and 10 then 'Monsoon'\n\t\twhen month(sale_date) between 3 and 6 then 'Summer'\n\t\telse 'Winter'\n\tend as Season\t\n\tfrom sales_pr\n\t)\n\nselect \n\tSeason, category,\n\tsum(total_sale)[Total Sale],\n\tsum((Total_sale-quantity*cogs))[Profit]\nfrom sale_season\ngroup by category, Season\norder by Season, sum(total_sale) desc\n\nOUTPUT\n\nSeason\tcategory\t Total Sale\t Profit\n-----    ------           ---------\t--------\nMonsoon\tElectronics\t ₹133,385.00 \t ₹52,533.10 \nMonsoon\tClothing\t ₹116,160.00 \t ₹49,637.80 \nMonsoon\tBeauty\t         ₹112,715.00 \t ₹47,397.40 \nSummer\tBeauty\t         ₹69,470.00 \t ₹42,057.80 \nSummer\tClothing\t ₹68,925.00 \t ₹41,508.85 \nSummer\tElectronics\t ₹52,795.00 \t ₹30,155.30 \nWinter\tElectronics\t ₹125,265.00 \t ₹58,002.45 \nWinter\tClothing\t ₹124,910.00 \t ₹52,089.65 \nWinter\tBeauty\t         ₹104,605.00 \t ₹47,986.60\n\n\nwith sale_season as (\n\tselect *,\n\tcase\n\t\twhen month(sale_date) between 7 and 10 then 'Monsoon'\n\t\twhen month(sale_date) between 3 and 6 then 'Summer'\n\t\telse 'Winter'\n\tend as Season\t\n\tfrom sales_pr\n\t)\n\nselect \t\n\tSeason,\n\tsum(total_sale)[Total Sale]\nfrom sale_season\ngroup by Season\norder by sum(total_sale) desc\n\nOUTPUT\n\nSeason\tTotal Sale\n------  ----------\nMonsoon\t₹362,260.00\nWinter\t₹354,780.00\nSummer\t₹191,190.00\n```\n\n5. 
**Write a SQL query to find the best category for each year.**:\n```sql\n\nselect * from (\nselect \n\tyear(sale_date) as [Year],\n\tcategory,\n\tSum(total_sale) [Total_Sale],\n\trank()over (partition by year(sale_date) order by Sum(total_sale) desc) as top_category\nfrom sales_pr\ngroup by year(sale_date), category\n) as l\nwhere top_category = 1\n\nOUTPUT\n\nYear\tcategory\tTotal_Sale\ttop_category\n----     ------         ----------      ------------\n2022\tBeauty\t         ₹151,460.00 \t  1\n2023\tElectronics\t ₹162,350.00 \t  1\n```\n\n6. **Write a SQL query to find the total number of transactions (transactions_id) made by each gender in each category.**:\n```sql\nselect \n\tgender,\n\tcategory,\n\tcount(transactions_id) as number_of_orders\nfrom sales_pr\ngroup by gender, category\norder by 1\n\nOUTPUT\n\ngender\tcategory\tnumber_of_orders\n-----\t-------\t\t----------------\nFemale\tClothing\t347\nFemale\tBeauty\t        330\nFemale\tElectronics\t335\nMale\tBeauty\t        281\nMale\tElectronics\t343\nMale\tClothing\t351\n\n```\n\n7. **Write a SQL query to calculate the average sale for each month and find the best-selling month in each year.**:\n```sql\nSELECT \n       Year,\n       Month,\n    avg_sale\nFROM \n(    \nSELECT \n    YEAR(sale_date) as Year,\n    MONTH(sale_date) as Month,\n    AVG(total_sale) as avg_sale,\n    RANK()OVER(\t\t\n\t\tPARTITION BY YEAR(sale_date) \n\t\tORDER BY AVG(total_sale) DESC) \n\tas rank\nfrom sales_pr\ngroup by YEAR(sale_date),\n    MONTH(sale_date)\n) as t1\nWHERE rank = 1\n\nOUTPUT\n\nYear\tMonth\tavg_sale\n----\t-----\t-------\n2022\t7\t541\n2023\t2\t535\n\n```\n\n8. 
**Write a SQL query to find the top 5 customers based on the highest total sales**:\n```sql\nselect top 5 * from \n(select customer_id, gender, age, category, quantity, total_sale,cogs,\ndense_rank()over(\n\tpartition by customer_id\n\torder by total_sale desc) as rank\nfrom sales_pr) as ts1\nwhere rank = 1 and total_sale = ( select max(total_sale) from sales_pr)\norder by cogs\n\nOUTPUT\n\ncustomer_id\tgender\tage\tcategory\tquantity\ttotal_sale\tcogs\trank\n-----------\t------\t--- \t--------\t--------\t----------\t----\t-----\n111\t         Male\t53\tElectronics\t4\t         2000\t         125\t1\n134\t         Female\t51\tElectronics\t4\t         2000\t         135\t1\n131\t         Male\t44\tClothing\t4\t         2000\t         140\t1\n148\t         Female\t35\tBeauty\t        4\t         2000\t         140\t1\n71\t         Male\t25\tClothing\t4\t         2000            145\t1\n\n```\n\n9. **Write a SQL query to create each shift and number of orders (Example Morning \u003c12, Afternoon Between 12 \u0026 17, Evening \u003e17)**:\n```sql\n\nwith hourly_sale as (\n\tselect *,\n\tcase\n\t\twhen cast(sale_time as time) \u003c '12:00:00' then 'Morning'\n\t\twhen cast(sale_time as time) \u003e= '12:00:00' and cast(sale_time as time) \u003c '17:00:00' then 'Afternoon'\n\t\telse 'Evening'\n\tend as shift \n\tfrom sales_pr\n\t)\n\nselect shift, count(*)[Orders] from hourly_sale \ngroup by shift\norder by \ncase \n\twhen shift = 'Morning' then 1\n\twhen shift = 'Afternoon' then 2\n\telse 3\nend\n\nOUTPUT\n\nshift\t\tOrders\n-----\t\t-------\nMorning\t\t548\nAfternoon\t164\nEvening\t\t1275\n\n```\n\n## Findings\n\n- **Customer Demographics**: The dataset includes customers from various age groups, with sales distributed across different categories such as Clothing, Electronics and Beauty.\n- **High-Value Transactions**: Women and customers above the age of 40 spend more on goods.\n- **Sales Trends**: Monthly analysis shows variations in sales, helping identify peak 
seasons.\n- **Customer Insights**: The analysis identifies the top-spending customers and the most popular product categories.\n\n## Reports\n\n- **Sales Summary**: A detailed report summarizing total sales, customer demographics, and category performance.\n- **Trend Analysis**: Insights into sales trends across different months and shifts.\n- **Customer Insights**: Reports on top customers and unique customer counts per category.\n\n## Conclusion\n\nThis project serves as a comprehensive introduction to SQL for data analysts, covering database setup, data cleaning, exploratory data analysis, and business-driven SQL queries. The findings from this project can help drive business decisions by understanding sales patterns, customer behavior, and product performance.\n\n\nThank you!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flock747%2Fsales-database-analysis-sql-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flock747%2Fsales-database-analysis-sql-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flock747%2Fsales-database-analysis-sql-project/lists"}