{"id":19450742,"url":"https://github.com/revogati/ecommerce_consumer_behaviour","last_synced_at":"2026-04-16T00:31:16.340Z","repository":{"id":206489637,"uuid":"715980858","full_name":"REVOgati/Ecommerce_Consumer_Behaviour","owner":"REVOgati","description":"This is a Full Data Analytics project From data cleaning, preparation, exploration, Interpretation of insights up to Presentation of findings and recommendations..","archived":false,"fork":false,"pushed_at":"2024-02-23T13:58:11.000Z","size":810,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-25T09:24:55.333Z","etag":null,"topics":["data-analysis","data-exploration","ecommerce","jupyter-notebook","python","sql","tableau-public","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/REVOgati.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-08T08:23:15.000Z","updated_at":"2024-02-23T14:16:48.000Z","dependencies_parsed_at":"2024-11-10T16:39:49.391Z","dependency_job_id":"648eed65-682a-4788-88de-1ce5a8e4e945","html_url":"https://github.com/REVOgati/Ecommerce_Consumer_Behaviour","commit_stats":null,"previous_names":["revogati/ecommerce_consumer_behaviour"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/REVOgati/Ecommerce_Consumer_Behaviour","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/REVOgati%2FEcommerce_Consumer_Behaviour","tags_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/REVOgati%2FEcommerce_Consumer_Behaviour/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/REVOgati%2FEcommerce_Consumer_Behaviour/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/REVOgati%2FEcommerce_Consumer_Behaviour/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/REVOgati","download_url":"https://codeload.github.com/REVOgati/Ecommerce_Consumer_Behaviour/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/REVOgati%2FEcommerce_Consumer_Behaviour/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31866218,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"ssl_error","status_checked_at":"2026-04-15T15:24:39.138Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-exploration","ecommerce","jupyter-notebook","python","sql","tableau-public","visualization"],"created_at":"2024-11-10T16:38:46.882Z","updated_at":"2026-04-16T00:31:16.303Z","avatar_url":"https://github.com/REVOgati.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **Analysis of Consumer Behaviour in an Ecommerce Grocery Business**\n\n## Table of Contents\n\n1. [Project Overview](#project-overview)\n2. 
[Data Sources](#data-sources)\n3. [Tools Used](#tools-used)\n4. [Data Cleaning and Preparation](#data-cleaning-and-preparation)\n5. [Exploratory Data Analysis](#exploratory-data-analysis)\n    - [Customer Segmentation](#1-customer-segmentation)\n    - [Product Purchase Insights](#2-product-purchase-insights)\n    - [Time-related Patterns](#3-time-related-patterns)\n    - [Cart Analysis](#4-cart-analysis)\n6. [Data Visualization](#data-visualization)\n7. [Findings](#findings)\n8. [Limitations and Assumptions](#limitations-and-assumptions)\n9. [Recommendations to Ecommerce Store and Conclusion](#recommendations-to-ecommerce-store-and-conclusion)\n10. [References](#references)\n\n---\n\n\n### **PROJECT OVERVIEW**\n\n**Business Problem:** Improving customer retention and determining product purchase patterns.\n\n**Description:**\n- The dataset contains information about customer orders, including the day of the week, hour of the day, and days since the prior order.\n\n- We can use this data to analyze customer behaviour and identify patterns in the purchase of different products.\n\n- The goal is to determine product purchase and reordering patterns in relation to consumer behaviour.\n\n- The analysis therefore covers the following target areas:\n\n\t**1. Customer Segmentation:**\\\n\t\t\t- What are the distinct customer segments?\\\n\t\t\t- How can we categorize customers as new or existing?\\\n\t\t\t- Purpose: To determine the general nature of the online store's users.\n\n\t**2. Product Purchase Insights:**\\\n\t\t\t- Identify which products have the highest and lowest sales and reorder rates.\\\n\t\t\t- Purpose: To identify the best- and worst-performing products and the reasons behind their performance, to help improve sales.\n\n\t**3. Time-related Patterns:**\\\n\t\t\t- Explore the hour of the day and day of the week of orders to identify peak shopping times.\\\n\t\t\t- Determine the hours of the 24-hour day in which the most and the fewest orders are made.\\\n\t\t\t- Purpose: To help with human resource scheduling and inventory planning.\n\n\t**4. Cart Analysis:**\\\n\t\t\t- Items per Cart: Explore the number of products added to a cart per order.\\\n\t\t\t- Cart Abandonment: Identify instances where products are added to the cart but not purchased.\\\n\t\t\t- Purpose: To determine the rate of product abandonment and how to reduce it.\n\n\n### **Data Sources**\n\n- The dataset was sourced from a Kaggle account: [Click here to view account](https://www.kaggle.com/hunter0007)\n\n- The dataset contains the orders of a real e-commerce grocery store, Hunter's e-grocery, over a given period in 2023, as explained [here](https://www.kaggle.com/datasets/hunter0007/ecommerce-dataset-for-predictive-marketing-2023/data).\n\n### **Tools Used**\n\n- **Cleaning and Preparation:** Python was used for data cleaning and preparation, as it is well suited to handling large datasets.\n\n- **Exploratory Data Analysis:** Python and SQL were used for data analysis and extraction of insights.\n\n- **Visualizations:** Tableau was used to display the findings.\n\n- **Presentation:** Google Slides was used to create a simplified explanation of the entire process, so that it is easy to follow even for those unfamiliar with data analysis.\n\n\n\n### **Data Cleaning and Preparation**\n\n**Overview of the data:**\n\nThe dataset contains the following columns:\n\n- order_id - unique identifier of the order\n\n- user_id - unique identifier of the user/customer\n\n- order_number - number of the order\n\n- order_dow - day of the week the order was made (0 to 6), where '0' represents Monday and '6' represents Sunday\n\n- order_hour_of_day - hour of the day the order was made\n\n- days_since_prior_order - days since the customer's prior order; 0 for new customers, otherwise the number of days since the last order\n\n- product_id - unique identifier of a product that is part of an order\n\n- add_to_cart_order - number of specific products added to the cart as part of the order\n\n- reordered - whether the product was reordered (binary: 0 or 1)\n\n- department_id - identifier of the department that an ordered product belongs to\n\n- department - name of the department\n\n- product_name - name of the product\n\n**Raw dataset**\n\n- For detailed information: [Click here to view and download original dataset](https://drive.google.com/file/d/1-6CzS3g7AOjxRwwndnUUHWhF5_Bec5Xn/view?usp=sharing)\n\nThe following are the steps in my data cleaning process:\n\n\n**Clean dataset**\n\n- After the cleaning and preparation process, the cleaned dataset was uploaded to my Google Drive: [Click here to view and download clean dataset](https://docs.google.com/spreadsheets/d/1Fyxl9P_ApXpDl7CixTMg-Z0m8FMBve5D/edit?usp=drive_link\u0026ouid=102527141955837924247\u0026rtpof=true\u0026sd=true)\n\n\n### **Exploratory Data Analysis**\n\n- #### **1. Customer Segmentation:**\n\t- In the original dataset, the order_id column has duplicate values.\n\t- This is because every order was broken down into the individual products it contained, so each order_id is repeated once per product.\n\t- I therefore combined the products into their respective orders to handle the duplicate values.\n\t- This eliminated the duplication and allowed me to analyze the different groups of customers.\n\t\t- Click to view the SQL file: [SQL_Consumer_Segmentation](https://github.com/REVOgati/Ecommerce_Consumer_Behaviour/blob/155c3faa481e21f0e32490816ee21f0323d13587/SQL_Consumer_Segmentation.sql)\n\n- #### **2. Product Purchase Insights:**\n\t1. Which products are ordered most in each department? (To determine the most and least purchased products in every department.)\n\t\t- I used SQL to group products by department and return the total number of orders for each product.\n\t\t- SQL code: [Product purchase insights](https://github.com/REVOgati/Ecommerce_Consumer_Behaviour/blob/699fffc3648a992227998da47087a37e1e340930/sql_exploration_files/products_insights.sql)\n\t2. Which products are reordered most and least, and why? (To determine which products attract the most and the fewest repeat purchases.)\n\n- #### **3. Time-related Patterns:**\n\t- At what hour of the day are the most orders made? (To help with human resource allocation.)\n\t- On which day of the week are the most orders made?\n\t- At what time of day is the most purchased product usually ordered?\n\n\t1. I grouped the items by hour of the 24-hour day to find the hours with the most orders.\n\t2. Since 0 represents Monday and 6 represents Sunday, I used SQL to return the number of orders for each day.\n\t3. I used an SQL query that returns the most ordered product and the times of day at which it is purchased.\n\n- #### **4. Cart Analysis:**\n\t- For every ordered item, I subtracted the order number from the number in the cart to estimate the number of abandoned items, and multiplied the result by 100 to express it as a percentage rate of cart abandonment.\n\t- I then stored the result in a new column, 'abandoned_items', and counted the total per product.\n\n### **Data Visualization**\n\n- Visual representations are available on [Tableau Public](https://public.tableau.com/app/profile/gareth.tirop/viz/EcommerceStoreAnalysis_17080716552270/EcommerceStoreOrdersAnalysis).\n\n### **Findings**\n\n- **Consumer Segmentation**\n\t- There were 103,761 unique orders.\n\t- Approximately 8,405 entries have the value 0 in days_since_prior_order, indicating new users.\n\t- The majority of customers are returning users.\n\n- **Product purchase patterns**\n\t- The six most ordered products are 'fresh fruits', 'fresh vegetables', 'packaged vegetables fruits', 'yogurt', 'milk' and 'packaged cheese', in that order.\n\t- These six fall under either the 'produce' or the 'dairy eggs' department, three in each.\n\t- The least ordered products are 'frozen juice', 'shave needs', 'beauty', 'first aid', 'eye ear care' and 'kitchen supplies', in that order.\n\t- Four of these belong to the 'personal care' department, while 'frozen' and 'household' account for one each.\n\n- **Time-related patterns**\n\t1. The hours 10, 11, 14, 15, 13 and 12 have the highest number of orders, and the hours 3, 4, 2, 5, 1 and 0 the lowest. These figures inform the human resource recommendations below.\n\t2. Most orders are made on Friday and the fewest on Monday. This may reflect factors specific to this grocery store, as it is not readily explained by general shopping patterns.\n\t3. The most ordered product is fresh fruits. Fresh fruit orders peak between 10am and 3pm and are lowest between 12am and 5am.\n\n- **Cart Analysis**\n\t- All computed cart-abandonment rates were either below 1% or negative.\n\t- This suggests that very few users abandon the items they add to their cart, so cart abandonment does not appear to be a problem for this e-commerce store.\n\t- One possible explanation is that most users are returning customers who already know the quality of the products they order; another is the essential nature of the grocery items being sold.\n\n### **Limitations and Assumptions**\n\n1. Many unique user_id values had more than one occurrence of the value 0 in the days_since_prior_order column.\n\t- I assumed, as the only plausible explanation, that these users made more than one order on their first day.\n\t- I used filtering and grouping to handle this and obtain the correct number of new users in the period.\n\n2. The data was collected over a long period, but days_since_prior_order is capped at 30 days.\n\t- Users whose previous order was 30 or more days earlier are therefore given the value 30 in the 'days_since_prior_order' column.\n\t- Users with the value 30 in this column are thus not new users, but established users who have not ordered for a long time.\n\n### **Recommendations to Ecommerce Store and Conclusion**\n\n1. **On product insights**\n\t- Given that the 'produce' and 'dairy eggs' departments have the highest orders, I would recommend measures to ensure the consistent quality and availability of these products.\n\t- Given that the 'personal care', 'frozen' and 'household' departments have the lowest orders, measures should be put in place to promote their products.\n\t- These might include improving product quality, user research, advertising or discounts.\n\n2. **Time-related**\n\t- I would recommend allocating significantly more staff during the 10am to 3pm period, when order volumes are highest.\n\t- Conversely, a smaller team could cover 12am to 5am, when order volumes are lowest.\n\t- This balances staffing against traffic and helps maintain performance.\n\n3. **Consumer Segmentation**\n\t- Approximately 8% of customers are new users, while the rest are existing users.\n\t- In other words, the vast majority of customers are returning.\n\t- I would recommend surveys to understand why customers return.\n\n4. **Cart Analysis**\n\t- The store does not have a major problem with cart abandonment, as the average rate is below 1%.\n\t- This might be due to the essential nature of most of its products, since groceries and household items are needed regularly.\n\n\n\n### **References**\n\n- **Visualizations:** Tableau dashboard for this project: [Ecommerce Store Viz](https://public.tableau.com/app/profile/gareth.tirop/viz/EcommerceStoreAnalysis_17080716552270/EcommerceStoreOrdersAnalysis)\n\n- **Data Source:**\n\t- The dataset was sourced from a Kaggle account: [Click here to view account](https://www.kaggle.com/hunter0007)\n\t- The dataset contains the orders of a real e-commerce grocery store, Hunter's e-grocery, over a given period in 2023, as explained [here](https://www.kaggle.com/datasets/hunter0007/ecommerce-dataset-for-predictive-marketing-2023/data).\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frevogati%2Fecommerce_consumer_behaviour","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frevogati%2Fecommerce_consumer_behaviour","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frevogati%2Fecommerce_consumer_behaviour/lists"}