{"id":30360477,"url":"https://github.com/pavelgrigoryevds/olist-deep-dive","last_synced_at":"2026-05-07T10:36:18.604Z","repository":{"id":310465431,"uuid":"1039834335","full_name":"PavelGrigoryevDS/olist-deep-dive","owner":"PavelGrigoryevDS","description":"🌊 Deep Sales Analysis of Olist E-Commerce:  EDA | Time Series| Viz | RFM | NLP | Geospatial | Segmentation \u0026 Actionable Business Recommendations.","archived":false,"fork":false,"pushed_at":"2025-08-18T10:01:52.000Z","size":75794,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-18T10:20:35.406Z","etag":null,"topics":["business-recommendations","clusterization","data-analysis","data-analytics","data-science","deep-analysis","e-commerce","eda","feature-engineering","geospatial","jupyter-notebook","nlp","pandas","plotly","preprocessing","python","rfm","statistics","time-series","visualization"],"latest_commit_sha":null,"homepage":"https://pavelgrigoryevds.github.io/olist-deep-dive/","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PavelGrigoryevDS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-18T03:54:31.000Z","updated_at":"2025-08-18T10:01:55.000Z","dependencies_parsed_at":"2025-08-18T10:32:17.726Z","dependency_job_id":null,"html_url":"https://github.com/PavelGrigoryevDS/olist-deep-dive","commit_stats":null,"previous_names":["pavelgrigoryevds/olist-deep-dive"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/PavelGrigoryevDS/olist-deep-dive","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PavelGrigoryevDS%2Folist-deep-dive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PavelGrigoryevDS%2Folist-deep-dive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PavelGrigoryevDS%2Folist-deep-dive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PavelGrigoryevDS%2Folist-deep-dive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PavelGrigoryevDS","download_url":"https://codeload.github.com/PavelGrigoryevDS/olist-deep-dive/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PavelGrigoryevDS%2Folist-deep-dive/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271166845,"owners_count":24710580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-0
4T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business-recommendations","clusterization","data-analysis","data-analytics","data-science","deep-analysis","e-commerce","eda","feature-engineering","geospatial","jupyter-notebook","nlp","pandas","plotly","preprocessing","python","rfm","statistics","time-series","visualization"],"created_at":"2025-08-19T14:22:50.360Z","updated_at":"2026-05-07T10:36:18.599Z","avatar_url":"https://github.com/PavelGrigoryevDS.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🌊 Deep Sales Analysis of Olist Marketplace\n\n[![Python](https://img.shields.io/badge/Python-3.11+-blue?logo=python)](https://github.com/PavelGrigoryevDS/olist-deep-dive/blob/main/pyproject.toml)\n[![Web Report](https://img.shields.io/badge/🌐_Web_Report-blue?logoColor=white)](https://pavelgrigoryevds.github.io/olist-deep-dive/)\n[![Tableau](https://img.shields.io/badge/📊_Tableau_Dashboard-254E6B?logo=tableau\u0026logoColor=white)](https://public.tableau.com/app/profile/pavel.grigoryev/viz/Inwork/PageSales)\n[![Open In Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/code/pavelgrigoryev/deep-sales-analysis-eda-viz-rfm-nlp-geo)\n[![Presentation](https://img.shields.io/badge/Slides-Google%20Slides-red)](https://docs.google.com/presentation/d/1sOYi3MWXedIEnuSn41H8lBeZ9aGnnTi5iV-DEMbfCvc/present)\n[![Jupyter 
Notebook](https://img.shields.io/badge/-Notebook-F37626?logo=jupyter\u0026logoColor=white)](https://github.com/PavelGrigoryevDS/olist-deep-dive/blob/main/olist_deep_dive/olist_deep_dive.ipynb)\n[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n\nComprehensive analysis of Brazilian e-commerce data, uncovering key insights and actionable business recommendations.\n\n**📖 For comfortable reading:** [Web version](https://pavelgrigoryevds.github.io/olist-deep-dive/)\n\n\u003ca id=\"contents\"\u003e\u003c/a\u003e\n\n## 📑 Contents\n\n- [🔗 Project Resources](#project-resources)\n- [🛠️ Tech Stack \\\u0026 Methods](#tech-stack-methods)\n- [📌 Project Overview](#project-overview)\n- [🗃️ Data Source](#data-source)\n- [🎯 Main Conclusions](#main-conclusions)\n- [✨ Key Recommendations](#key-recommendations)\n- [🚀 How to Run This Project](#how-to-run-this-project)\n\n---\n\n\u003ca id=\"project-resources\"\u003e\u003c/a\u003e\n\n## 🔗 Project Resources\n\n- **[Tableau Dashboard](https://public.tableau.com/app/profile/pavel.grigoryev/viz/Inwork/PageSales)** - Olist Performance dashboard\n- **[Presentation Slides](https://docs.google.com/presentation/d/1sOYi3MWXedIEnuSn41H8lBeZ9aGnnTi5iV-DEMbfCvc/present)** - Key findings summary  \n- **[Kaggle Notebook](https://www.kaggle.com/code/pavelgrigoryev/deep-sales-analysis-eda-viz-rfm-nlp-geo)** - Kaggle-integrated version\n- **[Source Notebook](https://github.com/PavelGrigoryevDS/olist-deep-dive/blob/main/olist_deep_dive/olist_deep_dive.ipynb)** - Raw Jupyter notebook *(code only, no outputs)*  \n\n[⬆ back to top](#contents)\n\n---\n\n\u003ca id=\"tech-stack-methods\"\u003e\u003c/a\u003e\n\n## 🛠️ Tech Stack \u0026 Methods\n\n**Stack:**\n\n- **Data Analysis:** `Python` `Pandas` `NumPy`\n- **Visualization:** `Plotly` `Tableau`\n- **Statistics \u0026 ML:** `StatsModels` `SciPy` `Sklearn` `Pingouin`\n- **NLP \u0026 Text Processing:** `NLTK` `TextBlob`\n\n**Methods**:\n\n- **Exploratory 
Data Analysis (EDA):** \n  - Statistical summaries, missing value analysis, and outlier detection\n- **Data Preprocessing:** \n  - Feature engineering, missing value handling, and creation of new metrics and dimensions\n- **Time Series Analysis:** \n  - Revenue/order trends, seasonality decomposition\n- **RFM Segmentation:** \n  - Customer value clustering (Recency, Frequency, Monetary)\n- **Clustering:** \n  - sklearn-based customer behavior segmentation  \n- **Geospatial Analysis:** \n  - Sales heatmaps and delivery performance by region\n- **NLP Sentiment Analysis:** \n  - Review text processing with NLTK and TextBlob\n- **Statistical Testing:** \n  - correlation analysis and hypothesis testing\n  \n[⬆ back to top](#contents)\n\n---\n\n\u003ca id=\"project-overview\"\u003e\u003c/a\u003e\n\n## 📌 Project Overview\n\nOlist is a Brazilian e-commerce platform that connects sellers and buyers, offering a wide range of products and convenient conditions for online sales. Olist also acts as an intermediary, allowing sellers to connect to multiple marketplaces simultaneously, thereby increasing their reach.\n\nThis analysis aims to:  \n\n- **Evaluate sales performance**  \n   - Identify geographical trends and seasonal patterns  \n   - Analyze revenue fluctuations and growth metrics  \n\n- **Understand customer behavior**  \n   - Examine purchasing frequency and retention drivers  \n   - Segment buyers by value and payment preferences  \n\n- **Assess operational efficiency**  \n   - Map delivery timelines and bottleneck correlations  \n   - Evaluate carrier performance across regions  \n\n- **Optimize payment systems**  \n   - Compare payment method success rates  \n   - Identify risk factors for cancellations  \n\n- **Generate actionable insights**  \n   - Develop data-backed recommendations for business growth  \n   - Propose customer experience improvements  \n\n[⬆ back to top](#contents)\n\n---\n\n\u003ca id=\"data-source\"\u003e\u003c/a\u003e\n\n## 🗃️ Data 
Source\n\nThe analysis uses the **Olist Brazilian E-Commerce Dataset** ([Kaggle](https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce)).\n\n[⬆ back to top](#contents)\n\n---\n\n\u003ca id=\"main-conclusions\"\u003e\u003c/a\u003e\n\n## 🎯 Main Conclusions\n\n### Sales Trends:\n\n  - **Growth \u0026 Stabilization**: Sales volume and revenue grew until 2018, then stabilized at 6–7K orders and 1–1.2M R$ per month.\n  - **Black Friday (11/24/2017)**: Record spikes in orders, revenue, and buyers.\n  - **Geography**: São Paulo dominates (42% of sales), with steady growth in 2018, unlike other regions.\n\n### Customer Behavior:\n\n  - **Low Retention**: 97% of buyers made only one purchase; repeat buyers are rare.\n  - **High-Value Buyers**: Clients using installment plans (50%) spend 2x more (higher average order value/weight).\n  - **Loyalty**: Promoters (58% of buyers) leave positive reviews but rarely return. Critics (13%) spend more but churn faster due to delivery delays.\n\n### Operational Insights:\n\n  - Delayed orders correlate with lower ratings (avg. rating: 1–2 vs. 4–5 for on-time).\n  - Heavy/expensive orders take longer to deliver and are more likely to be delayed.\n  - Orders with installments process faster, have higher AOV, and show better retention.\n\n### Payment \u0026 Risk:\n\n  - **Credit cards dominate**: 74% of transactions, with 35% higher AOV vs. other methods.\n  - **Installments boost value**: Orders with installments have 2x higher AOV (premium/heavy items).\n  - **Voucher Payments**: Orders paid with vouchers have 3x higher cancellation rates (16% vs. 
5% for credit cards).\n\n### Product \u0026 Logistics:\n\n  - **Top Categories**: Electronics (27% of sales) and furniture (18%) drive revenue.\n  - Deliveries to northern states take 2x longer.\n  - Heavy orders (+40% delivery time) and premium items face delays.\n  - **Delivery Bottlenecks**: 70% of total delivery time is spent with carriers, notably slower in Rio de Janeiro and Salvador.\n\n### Critical Challenges:\n\n  - **Declining Ratings**: Average review scores dropped from 4.5 (2017) to 3.9 (2018), linked to delivery delays.\n  - **Peak Season Failures**: Black Friday 2017 caused a surge in delayed deliveries, with complaints tied to carrier handoff bottlenecks.\n  - **Abandoned Carts**: Canceled orders spike in February/August 2018, often for high-value items paid via vouchers.\n\n### Customer Feedback \u0026 Ratings:\n\n  - **Majority of reviews are positive**: 58% of reviews received a rating of 5. Only 12% of reviews received a rating of 1, while a mere 3% received a rating of 2.\n  - **Negative Reviews**: 15% of review texts mention \"slow delivery\" or \"missing items,\" heavily impacting NPS.\n\n### Data Highlights:\n\n  - **Negative Feedback Drivers**: Low ratings correlate with longer delivery times, higher order value, and heavier items.\n  - **Success Factors**: Fast carrier handoff (≤3 days) and installment options boost ratings and repeat purchases.\n\n[⬆ back to top](#contents)\n\n---\n\n\u003ca id=\"key-recommendations\"\u003e\u003c/a\u003e\n\n## ✨ Key Recommendations\n\n### Boost Customer Retention \u0026 Repeat Purchases:\n\n  - Launch a loyalty program targeting one-time buyers (97% of customers), offering discounts on second purchases or bonus points.\n  - Personalized win-back campaigns for high-value clients (top 1% driving 15% of revenue) with exclusive offers.\n  - Reduce time between purchases (currently 29+ days for 50% of repeat buyers) via time-bound promotions (e.g., \"7-day discount\").\n  \n### Improve Product \u0026 Pricing 
Strategy:\n\n  - Expand \"Beauty \u0026 Health\" and \"Home \u0026 Garden\" (18% YoY growth categories) with curated bundles or subscriptions.\n  - Reprice problem categories (e.g., \"Watches \u0026 Gifts\") to offset delivery costs or offer guaranteed faster shipping.\n\n### Enhance High-Value Segments:\n\n  - Premium installment plans for big spenders (avg. 3+ orders) with perks like free shipping or priority support.\n  - Target voucher users (3x higher cancellation risk) with limited-time combo deals to convert abandoned carts.\n\n### Fix Delivery Pain Points:\n\n  - Prioritize carrier performance in critical regions (e.g., Rio de Janeiro, Salvador), where delays are 30% longer than average.\n  - Expedite high-value/heavy orders (\u003e500 R$ or \u003e10kg), which face 2x more 1-star ratings due to delays.\n  - Optimize Black Friday logistics to prevent repeat of 2017’s 4x surge in delays (pre-stock inventory, add temporary carriers).\n\n### Regional Growth Tactics:\n\n  - Hyper-local campaigns in São Paulo (42% of sales): Leverage its 20% faster delivery and 30% higher retention to test scalable models.\n  - Fix underperformers (e.g., Maranhão, Ceará) with subsidized shipping or partner pickup points.\n\n### Mitigate Negative Reviews:\n\n  - Automate compensation for delayed orders (e.g., 10% off next purchase if delivery exceeds 15 days).\n  - Sunday support surge: Add staff to cut response times, reducing low weekend ratings.\n\n[⬆ back to top](#contents)\n\n---\n\n\u003ca id=\"how-to-run-this-project\"\u003e\u003c/a\u003e\n\n## 🚀 How to Run This Project\n\n### Prerequisites\n\n- Python 3.11+ installed\n- Git (for cloning)\n\n### Clone the Repository\n\n```bash\ngit clone https://github.com/PavelGrigoryevDS/olist-deep-dive.git\ncd olist-deep-dive\n```\n\n### Install and Run\n\n- If Poetry is NOT installed on your system, use Option 1\n- If Poetry IS installed, use Option 2\n\n#### Option 1: Using pip + virtualenv\n\n```bash\npython -m venv .venv\nsource 
.venv/bin/activate  \npip install poetry\npoetry config virtualenvs.in-project true --local\npoetry install\njupyter lab olist_deep_dive/olist_deep_dive.ipynb\n```\n\n#### Option 2: Poetry\n\n```bash\npoetry config virtualenvs.in-project true --local\npoetry install\npoetry run jupyter lab olist_deep_dive/olist_deep_dive.ipynb\n```\n\n[⬆ back to top](#contents)\n\n## 📜 License  \n\nThis analysis is shared under [MIT License](LICENSE).  \nOriginal data from Olist remains under its [Kaggle license](https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpavelgrigoryevds%2Folist-deep-dive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpavelgrigoryevds%2Folist-deep-dive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpavelgrigoryevds%2Folist-deep-dive/lists"}