{"id":17458526,"url":"https://github.com/cagandemirmr/flo_data_exploration","last_synced_at":"2025-10-12T01:22:21.116Z","repository":{"id":257987375,"uuid":"873173140","full_name":"cagandemirmr/FLO_Data_Exploration","owner":"cagandemirmr","description":"You can find data exploration of FLO","archived":false,"fork":false,"pushed_at":"2024-10-15T19:32:41.000Z","size":26,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-17T07:32:59.449Z","etag":null,"topics":["dataanalysis","dataexploration","queries","sql","sql-server"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cagandemirmr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-15T18:08:52.000Z","updated_at":"2024-10-15T19:54:36.000Z","dependencies_parsed_at":"2024-10-17T07:33:05.046Z","dependency_job_id":null,"html_url":"https://github.com/cagandemirmr/FLO_Data_Exploration","commit_stats":null,"previous_names":["cagandemirmr/flo_data_exploration"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cagandemirmr/FLO_Data_Exploration","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cagandemirmr%2FFLO_Data_Exploration","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cagandemirmr%2FFLO_Data_Exploration/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cagandemirmr%2FFLO_Data_Exploration/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/cagandemirmr%2FFLO_Data_Exploration/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cagandemirmr","download_url":"https://codeload.github.com/cagandemirmr/FLO_Data_Exploration/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cagandemirmr%2FFLO_Data_Exploration/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279009737,"owners_count":26084645,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataanalysis","dataexploration","queries","sql","sql-server"],"created_at":"2024-10-18T04:05:38.693Z","updated_at":"2025-10-12T01:22:21.092Z","avatar_url":"https://github.com/cagandemirmr.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# FLO Customer Data Exploration\n\n![image](https://github.com/user-attachments/assets/f1f05852-2803-4aa4-bc22-934a68e191ca)\n\n\nThis project involves the exploration and analysis of customer data from **FLO**, a leading producer and retailer in the Turkish shoe industry. 
The dataset contains information on customer purchasing behavior across various channels, including both online and offline transactions.\n\n## Dataset Information\n\n- **Total Rows**: 20,000\n- **Total Columns**: 13\n\n### Column Descriptions\n\n1. **Master_ID**: \n   - A unique 36-character identifier composed of numbers and letters that anonymously represents each customer.\n\n2. **Order_Channel**: \n   - The channel used by the customer when placing an order. It consists of the following 4 variables:\n     - iOS App\n     - Android App\n     - Desktop\n     - Mobile\n\n3. **Last_order_channel**: \n   - The channel used by the customer for their most recent order. It includes 5 variables:\n     - iOS App\n     - Android App\n     - Desktop\n     - Mobile\n     - Offline (In-store purchases)\n\n4. **First_order_date**: \n   - The date when the customer placed their first order. The minimum date is **2013-01-14** and the maximum date is **2021-05-27**.\n\n5. **Last_order_date**: \n   - The date when the customer placed their most recent order. The minimum date is **2020-05-30** and the maximum date is **2021-05-30**.\n\n6. **Last_order_date_online**: \n   - The date when the customer made their most recent online order.\n\n7. **Last_order_date_offline**: \n   - The date when the customer made their most recent offline (in-store) order.\n\n8. **Order_num_total_ever_offline**: \n   - The total number of orders placed by the customer through offline channels.\n\n9. **Order_num_total_ever_online**: \n   - The total number of orders placed by the customer through online channels.\n\n10. **Customer_value_total_ever_offline**: \n    - The total revenue generated by the customer from offline purchases.\n\n11. **Customer_value_total_ever_online**: \n    - The total revenue generated by the customer from online purchases.\n\n12. 
**Interested_in_categories_12**: \n    - This column contains 32 unique values representing categories assigned based on the customer's interests.\n\n13. **Store_type**: \n    - Consists of 3 main store types, with 4 combinations based on customer purchasing habits.\n\n## Steps\n\n### Data Preparation\n\nBefore importing the data into SQL, I made several adjustments in Excel:\n- Formatted date columns to ensure consistency.\n- Removed unnecessary characters and cleaned up the data.\n\nAfter these modifications, I copied the Excel tables into SQL. This approach is efficient for:\n- Familiarizing myself with the data.\n- Data normalization, which reduces the size of the data.\n  \n![image](https://github.com/user-attachments/assets/78473aa7-d728-414b-9fb6-5bf3ec334217)\n\nHowever, while this method is convenient for checking and exploring data, it does not scale well to large datasets. Fortunately, this dataset is moderate in size (20,000 rows).\n\n### SQL Queries\n\nTo gain a basic understanding of the data, I wrote the following queries in SQL:\n\n```sql\n-- General overview of the dataset\nSELECT * FROM CUSTOMERS;\n\n-- Minimum and maximum character lengths of Master_ID across all rows\nSELECT MIN(LEN(Master_ID)) AS MINIMUM_LENGTH, \n       MAX(LEN(Master_ID)) AS MAXIMUM_LENGTH \nFROM CUSTOMERS;\n\n-- Distinct values of the Order Channel\nSELECT DISTINCT Order_channel FROM CUSTOMERS;\n\n-- Distinct values of the Last Order Channel\nSELECT DISTINCT Last_order_channel FROM CUSTOMERS;\n\n-- Minimum and maximum Last Order Dates\nSELECT MIN(Last_order_date) AS MIN_LAST_DATE, \n       MAX(Last_order_date) AS MAX_LAST_DATE \nFROM CUSTOMERS;\n\n-- Minimum and maximum First Order Dates\nSELECT MIN(First_order_date) AS MIN_FIRST_DATE, \n       MAX(First_order_date) AS MAX_FIRST_DATE \nFROM CUSTOMERS;\n\n-- Minimum and maximum Last Online Order Dates\nSELECT MIN(Last_order_date_online) AS MIN_LAST_OR_ONLINE_DATE, \n 
      MAX(Last_order_date_online) AS MAX_LAST_OR_ONLINE_DATE \nFROM CUSTOMERS;\n\n-- Minimum and maximum Last Offline Order Dates\nSELECT MIN(Last_order_date_offline) AS MIN_LAST_OR_OFFLINE_DATE, \n       MAX(Last_order_date_offline) AS MAX_LAST_OR_OFFLINE_DATE \nFROM CUSTOMERS;\n\n-- Minimum and maximum order counts (offline)\nSELECT MIN(Order_num_total_ever_offline), \n       MAX(Order_num_total_ever_offline) \nFROM CUSTOMERS;\n\n-- Minimum and maximum order counts (online)\nSELECT MIN(Order_num_total_ever_online), \n       MAX(Order_num_total_ever_online) \nFROM CUSTOMERS;\n\n-- Minimum and maximum customer values (offline)\nSELECT MIN(Customer_value_total_ever_offline), \n       MAX(Customer_value_total_ever_offline) \nFROM CUSTOMERS;\n\n-- Minimum and maximum customer values (online)\nSELECT MIN(Customer_value_total_ever_online), \n       MAX(Customer_value_total_ever_online) \nFROM CUSTOMERS;\n\n-- Distinct values of Interested Categories\nSELECT DISTINCT Interested_in_categories_12 FROM CUSTOMERS;\n\n-- Distinct values of Store Type\nSELECT DISTINCT Store_type FROM CUSTOMERS;\n\n```\n\nThese queries provided a foundational understanding of the data and helped identify value ranges and outliers in customer behavior, order channels, and purchasing patterns.\n\n### Analysis Questions\n\nAfter this variability analysis, several questions came to mind, and I wrote SQL queries to answer them.\n\n1. **How many unique customers made a purchase?**\n\n```sql\n  select count(distinct Master_ID) TOTALCUSTOMERS from CUSTOMERS\n\n```\n\n  ![image](https://github.com/user-attachments/assets/320ea439-169b-4982-a425-93cf7da85d95)\n\n\n2. 
**What is the total number of purchases and the total revenue?**\n\n```sql\nselect sum(Order_num_total_ever_offline+Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) REVENUE\nfrom CUSTOMERS\n```\n\n![image](https://github.com/user-attachments/assets/220819c8-e03d-4bc2-b6f6-7cfa9e9ab1a1)\n\n\n3. **What is the average revenue per purchase?**\n\n```sql\nselect (sum(Customer_value_total_ever_offline+ Customer_value_total_ever_online)/ sum(Order_num_total_ever_offline+Order_num_total_ever_online) )AVG_REVENUE \nfrom CUSTOMERS\n```\n\n![image](https://github.com/user-attachments/assets/c52d6bb1-06a3-43a2-8b72-024807e9569a)\n\n5. **What are the total revenue and the number of purchases by last order channel (last_order_channel)?**\n\n```sql\nselect Last_order_channel, sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_REVENUE, sum(Order_num_total_ever_offline+Order_num_total_ever_online) TOTAL_PURCHASES \nfrom CUSTOMERS group by Last_order_channel\n```\n\n![image](https://github.com/user-attachments/assets/10d53509-e590-4d91-8aef-0db219e96ad9)\n\n\n7. **What is the total revenue by store type?**\n\n```sql\nselect  Store_type,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_REVENUE \nfrom CUSTOMERS group by Store_type\n```\n\n![image](https://github.com/user-attachments/assets/de61cff0-639d-4c99-b036-9dc6e70d295f)\n\n\n9. **What is the number of purchases per year based on the customer's first order date (first_order_date)?**\n\n```sql\nselect year(First_order_date) YEAR_,sum(Order_num_total_ever_offline+Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES\nfrom CUSTOMERS group by year(First_order_date) order by 2 desc\n```\n\n![image](https://github.com/user-attachments/assets/7f60a01c-06f5-407c-ba27-d2baf4ccfef3)\n\n\n11. 
**What is the average revenue per purchase,  by the last order channel (last_order_channel)?**\n\n```sql  \nselect Last_order_channel, sum(Customer_value_total_ever_offline+Customer_value_total_ever_online)/sum(Order_num_total_ever_offline+Order_num_total_ever_online) PRODUCTIVITY\nfrom CUSTOMERS  \ngroup by Last_order_channel\n```\n\n![image](https://github.com/user-attachments/assets/0e7b8b3f-f20c-4b3a-9360-e31bcf973b7e)\n\n\n12. **What is the most popular category in the last 12 months?**\n\n```sql  \nselect Interested_in_categories_12,count(*) Frequency\nfrom CUSTOMERS\ngroup by Interested_in_categories_12\norder by 2 desc\n```\n\n![image](https://github.com/user-attachments/assets/84103679-070a-442d-adda-9257b885f9d8)\n\n\n14. **What is the most preferred store type?**\n\n```sql  \nselect top 1 Store_type,count(*) Store_count \nfrom CUSTOMERS group by Store_type\norder by 2 desc\n```\n\n![image](https://github.com/user-attachments/assets/05c803f1-3877-4f02-8460-5938f92d50bd)\n\n\n16. **What is the most popular category and the total amount spent in that category, broken down by the last order channel (last_order_channel)?**\n\n```sql \nSELECT DISTINCT \n    C.Last_order_channel,\n    (SELECT TOP 1 Interested_in_categories_12 \n     FROM CUSTOMERS \n     WHERE Last_order_channel = C.Last_order_channel \n     GROUP BY Interested_in_categories_12\n     ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Top_category,\n    (SELECT TOP 1 SUM(Order_num_total_ever_offline + Order_num_total_ever_online) \n     FROM CUSTOMERS \n     WHERE Last_order_channel = C.Last_order_channel \n     GROUP BY Interested_in_categories_12\n     ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Total_order\nFROM CUSTOMERS C\n```\n\n![image](https://github.com/user-attachments/assets/769d8dc0-8592-4e69-838c-0256aec30613)\n\n\n18. 
**What is the ID of the customer who made the most purchases?**\n\n```sql \nselect top 1  Master_ID,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_SALES\nfrom CUSTOMERS group by Master_ID\norder by sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) desc\n```\n\n![image](https://github.com/user-attachments/assets/40c6edf7-e679-4e89-8e75-9c3063b31afd)\n\n\n20. **What is the average revenue per purchase and the average number of days between purchases (purchase frequency) for the customer who made the most purchases?**\n\n```sql \nselect D.Master_ID,D.TOTAL_REVENUE/D.FREQUENCY AVERAGE_REVENUE_BY_FREQ, D.TOTAL_REVENUE / round(DATEDIFF(day,First_order_date,Last_order_date) ,1) REVENUE_BY_DAY from (select top 1 Master_ID,First_order_date,Last_order_date,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_REVENUE,\nsum(Order_num_total_ever_offline+Order_num_total_ever_online) FREQUENCY  from CUSTOMERS group by Master_ID,First_order_date,Last_order_date\norder by TOTAL_REVENUE desc) D\n```\n\n![image](https://github.com/user-attachments/assets/103150e4-6681-4dd6-b39a-a274dde72530)\n\n\n22. **What is the average number of days between purchases (purchase frequency) for the top 100 customers by total revenue?**\n\n```sql \nselect D.Master_ID,D.First_order_date,D.Last_order_date,D.REVENUE/D.FREQUENCY AVG_REVENUE,DATEDIFF(day,D.First_order_date,D.Last_order_date) DATE_DIF,round(D.REVENUE/ DATEDIFF(day,D.First_order_date,D.Last_order_date),1) DATE_BY_REVENUE \nfrom (select top 100 Master_ID,First_order_date,Last_order_date,\nsum(Order_num_total_ever_offline+Order_num_total_ever_online) FREQUENCY,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) \nREVENUE from CUSTOMERS group by Master_ID,First_order_date,Last_order_date order by REVENUE desc) D\n```\n\n![image](https://github.com/user-attachments/assets/e23f31be-65d3-461c-86e2-514e0d300bcf)\n\n24. 
**Who is the customer who made the most purchases, broken down by the last order channel (last_order_channel)?**\n\n```sql \nselect distinct Last_order_channel,\n(select top 1 Master_ID from CUSTOMERS where Last_order_channel=C.Last_order_channel\ngroup by Master_ID\norder by sum(Customer_value_total_ever_offline+Customer_value_total_ever_online)  desc) CUSTOMER, (select top 1 sum(Customer_value_total_ever_offline+Customer_value_total_ever_online ) \nfrom CUSTOMERS  where Last_order_channel=C.Last_order_channel group by Master_ID order by sum(Customer_value_total_ever_offline+Customer_value_total_ever_online ) desc) REVENUE  from CUSTOMERS C\n```\n\n![image](https://github.com/user-attachments/assets/b63dd722-5baa-42f8-9457-5e45d1f82a6c)\n\n\n25. **What is the ID of the last customer who made a purchase (including multiple IDs if there were multiple purchases on the latest date)?**\n\n```sql\nselect Master_ID,Last_order_date from CUSTOMERS \ngroup by Master_ID,Last_order_date\nhaving Last_order_date = (select max(Last_order_date) from CUSTOMERS)\n```\n\n![image](https://github.com/user-attachments/assets/296de4be-e635-406c-9372-592501dd3279)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcagandemirmr%2Fflo_data_exploration","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcagandemirmr%2Fflo_data_exploration","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcagandemirmr%2Fflo_data_exploration/lists"}