{"id":24157697,"url":"https://github.com/lock747/customer-segmentation-----analysis--sql","last_synced_at":"2025-08-31T12:34:09.704Z","repository":{"id":272141597,"uuid":"915545421","full_name":"Lock747/Customer-Segmentation-----Analysis--SQL","owner":"Lock747","description":"This project demonstrates the use of SQL to perform customer segmentation based on purchasing behavior, demographics, and other relevant data points. Customer segmentation is essential for businesses to personalize marketing, improve customer retention, and optimize product offerings and Power BI to Visualise the Analysis.","archived":false,"fork":false,"pushed_at":"2025-01-13T06:15:15.000Z","size":3105,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-02T01:29:21.472Z","etag":null,"topics":["powerbi","sql-server"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Lock747.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-12T06:06:53.000Z","updated_at":"2025-01-13T06:15:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"5df565f0-dbbb-439a-911d-67a692807093","html_url":"https://github.com/Lock747/Customer-Segmentation-----Analysis--SQL","commit_stats":null,"previous_names":["lock747/customer-segmentation-----analysis--sql"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Lock747/Customer-Segmentation-----Analysis--SQL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/Lock747%2FCustomer-Segmentation-----Analysis--SQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2FCustomer-Segmentation-----Analysis--SQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2FCustomer-Segmentation-----Analysis--SQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2FCustomer-Segmentation-----Analysis--SQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Lock747","download_url":"https://codeload.github.com/Lock747/Customer-Segmentation-----Analysis--SQL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lock747%2FCustomer-Segmentation-----Analysis--SQL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272982390,"owners_count":25025982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["powerbi","sql-server"],"created_at":"2025-01-12T14:17:21.660Z","updated_at":"2025-08-31T12:34:09.693Z","avatar_url":"https://github.com/Lock747.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"### Customer-Segmentation - Analysis--SQL\n\n## Project Overview\n\nThis project demonstrates the use of SQL to perform customer segmentation 
based on purchasing behavior, demographics, and other relevant data points. Customer segmentation is essential for businesses to personalize marketing, improve customer retention, and optimize product offerings.\n\n\n## Goals\n\t•\tSegment customers into distinct groups for tailored marketing strategies.\n\t•\tAnalyze customer behavior and identify key patterns.\n\t•\tEnable businesses to make data-driven decisions.\n\n## Key Features\n \n\t1.\tData Cleaning and Preprocessing:\n\t  •\tHandled missing values and duplicates.\n\t  •\tStandardized formats for consistency.\n   \n\t2.\tSegmentation Metrics:\n\t  •\tRFM Analysis (Recency, Frequency, Monetary Value).\n\t  •\tDemographic segmentation (age, location, gender).\n\t  •\tBehavioral segmentation (purchase history, product preferences).\n\n\t3.\tSQL Queries:\n\t  •\tComplex joins to merge customer, transactions, and product tables.\n\t  •\tAggregation for RFM scoring.\n\t  •\tCase statements to assign customer categories.\n\n\t4.\tInsights:\n\t  •\tIdentified top-spending customers.\n\t  •\tSegmented customers into tiers (e.g., VIP, Regular, Infrequent).\n\t  •\tHighlighted retention strategies based on data.\n   \n\n# Applications\n\n\t•\tTargeted marketing campaigns.\n\t•\tImproved customer experience through tailored recommendations.\n\t•\tRevenue growth by focusing on high-value segments.\n\nRFM stands for Recency, Frequency, and Monetary Value. RFM analysis uses data on existing customer behaviour to predict how a new customer is likely to act in the future. An RFM model is built using three factors:\n\n\t1. How recently a customer has transacted with a brand.\n\t2. How frequently they have engaged with the brand.\n\t3. 
How much money they have spent on the brand's products and services.\n\n```sql\n-- Cleaning the data\n\n-- Inspect the raw transactions\nselect * from Transactions\n\n-- Work on a copy (a_t) so the source table stays untouched\nselect * into a_t from Transactions\n\n-- Check which orders were not approved\nselect * from Transactions\nwhere order_status not like 'Approved'\n\n-- Remove cancelled orders from the working copy\ndelete from a_t\nwhere order_status = 'Cancelled'\n\n```\n\n# 1. RFM Analysis\n\n```sql\n-- Reference date used to compute recency\ndeclare @today_date as date = '2018-01-01';\n\n-- Per-customer recency, frequency, and monetary aggregates\nwith base as(\n\tselect  \n\t\t customer_id\n\t\t, max(transaction_date) as most_recent_purchase\n\t\t, datediff(day, max(transaction_date), @today_date) as recency_score\n\t\t, count(transaction_id) as frequency_score\n\t\t, sum(list_price) as monetary_value\n\t\t, sum(Profit) as monerary_profit\n\tfrom a_t\n\tGroup by customer_id\n),\n\n-- Score each metric from 1 to 5 by quintile; a more recent purchase earns a higher R\nrfm_score as (\n\tselect \n\t\tcustomer_id\n\t\t, recency_score\n\t\t, frequency_score\n\t\t, monetary_value\n\t\t, NTILE(5)over(order by recency_score desc) as R \n\t\t, NTILE(5)over(order by frequency_score) as F\n\t\t, NTILE(5)over(order by monetary_value) as M\n\tfrom base\n)\n\n-- Average the three scores (integer division) and summarise each group\nselect \n\t(R +F+M) / 3 as rfm_group\n\t, count(rfm.customer_id) as Customer_count\n\t, cast(sum(monerary_profit) as decimal(16,2)) as rfm_profit\nfrom rfm_score as rfm\ninner join base on base.customer_id  = rfm.customer_id\ngroup by (R +F+M) / 3\norder by (R +F+M) / 3\n\n-- Output\n\nrfm_group   Customer_count   rfm_profit\n---------   --------------   ---------------\n1           658              ₹ 9,24,710.79\n2           862              ₹ 19,56,406.08\n3           1049             ₹ 35,03,591.53\n4           765              ₹ 35,47,285.03\n5           159              ₹ 8,99,510.66\n\n\n```\n\n# 2. 
Data Analysis and Exploration\n\n# 2.1 New Customer vs Old Customer Age Distributions\n\n```sql\nselect \n\tage\n\t, tenure\nfrom CustomerDemographic\nwhere tenure \u003e= 5 -- old customers; use tenure \u003c 5 for new customers\n\n-- Copying the output into Power BI gives the visualisations below\n\n```\n**Old Customers (tenure of 5 years or more)**\n\n- Among customers who have been with us for 5 years or more, the least represented age groups are under 20 and 80+.\n- Among loyal customers, the most populated bracket is the 40-50 age group.\n- Customers aged 40-70 stay most loyal to the brand.\n\n![image](https://github.com/user-attachments/assets/8d5bc476-a754-4e21-bbf3-46f60fa9f1ef)\n\n\n**New Customers (tenure below 5 years)**\n\n- Among new customers, the most populated age brackets are 20-35 and 55-60.\n- There is a steep drop in the number of new customers in the 30-39 age group.\n- Younger people are more likely to become new customers.\n\n![image](https://github.com/user-attachments/assets/8faeac47-3eb4-4a2c-933c-830754c865fb)\n\n**2.2 Age and Monetary Analysis with RFM Comparison**\n\n```sql\ndeclare @today_date as date = '2018-01-01';\n\nwith base as(\n\tselect  \n\t\t customer_id\n\t\t, max(transaction_date) as most_recent_purchase\n\t\t, datediff(day, max(transaction_date), @today_date) as recency_score\n\t\t, count(transaction_id) as frequency_score\n\t\t, sum(list_price) as monetary_value\n\t\t, sum(Profit) as monerary_profit\n\tfrom a_t\n\tGroup by customer_id\n),\n\nrfm_score as (\n\tselect \n\t\tcustomer_id\n\t\t, recency_score\n\t\t, frequency_score\n\t\t, monetary_value\n\t\t, NTILE(5)over(order by recency_score desc) as R \n\t\t, NTILE(5)over(order by frequency_score) as F\n\t\t, NTILE(5)over(order by monetary_value) as M\n\tfrom base\n)\n\nselect \n\trfm_score.M\n\t, Age\n\t, monetary_value\n\t, rfm_score.customer_id\nfrom rfm_score inner join CustomerDemographic on rfm_score.customer_id = 
CustomerDemographic.customer_id\n\n\n```\n\n![image](https://github.com/user-attachments/assets/650f2298-ee72-4c71-ba72-5290731c821c)\n\n- Spending is highest in the 40-50 age bracket.\n- It is followed by the 30-40 and 50-60 age groups.\n- Younger and older customers tend to spend less.\n\n**2.3 Male/Female Distribution by RFM Group**\n\n```sql\n\ndeclare @today_date as date = '2018-01-01';\n\n-- base and rfm_score are the same CTEs as in section 2.2 (elided here);\n-- the third CTE continues the same WITH list rather than starting a new one\nwith base as(....\n\nrfm_score as (....\n\nCTE as (\n\tselect \n\t\t(R+F+M)/3 as RFM_Score\n\t\t, gender\n\tfrom rfm_score join CustomerDemographic on rfm_score.customer_id = CustomerDemographic.customer_id\n)\n\nselect \n\tRFM_Score\n\t, gender\n\t, count(gender) as number_gender\nfrom CTE\ngroup by  RFM_Score,gender\norder by RFM_Score\n\n-- Output\n\nRFM_Score   gender   number_gender\n---------   ------   -------------\n1           Male     304\n1           Female   340\n2           Female   443\n2           Male     405\n3           Male     506\n3           Female   522\n4           Male     369\n4           Female   377\n5           Female   77\n5           Male     72\n\n```\n\n![image](https://github.com/user-attachments/assets/89c1c477-596a-40ba-b6ad-8ab531fd502f)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flock747%2Fcustomer-segmentation-----analysis--sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flock747%2Fcustomer-segmentation-----analysis--sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flock747%2Fcustomer-segmentation-----analysis--sql/lists"}