{"id":24695366,"url":"https://github.com/saisrivatsat/end-to-end-data-analytics-retail-orders-analysis-python-sql","last_synced_at":"2026-05-07T13:32:39.449Z","repository":{"id":269896003,"uuid":"908787486","full_name":"saisrivatsat/End-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL","owner":"saisrivatsat","description":"This project showcases a comprehensive end-to-end data analytics workflow, encompassing data extraction, cleaning, transformation, and analysis of retail orders. Utilizing Python, SQL, and dynamic visualizations, it uncovers actionable business insights by highlighting critical metrics such as revenue, profit, and regional trends Empowering insight","archived":false,"fork":false,"pushed_at":"2024-12-27T01:33:46.000Z","size":1406,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-22T03:22:36.051Z","etag":null,"topics":["python","sql","visualization"],"latest_commit_sha":null,"homepage":"https://www.kaggle.com/code/sanjusrivatsa9/retail-orders-analysis","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saisrivatsat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-27T01:10:38.000Z","updated_at":"2024-12-27T01:33:49.000Z","dependencies_parsed_at":"2024-12-27T02:23:50.670Z","dependency_job_id":"27900196-8c45-4ba2-8909-18ceb3368b6d","html_url":"https://github.com/saisrivatsat/End-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL","commit_stats":null,"previous_names":["sanju-srivatsa/end-to-end-data-analytics-re
tail-orders-analysis-python-sql-","saisrivatsat/end-to-end-data-analytics-retail-orders-analysis-python-sql"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/saisrivatsat/End-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saisrivatsat%2FEnd-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saisrivatsat%2FEnd-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saisrivatsat%2FEnd-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saisrivatsat%2FEnd-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saisrivatsat","download_url":"https://codeload.github.com/saisrivatsat/End-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saisrivatsat%2FEnd-to-End-Data-Analytics-Retail-Orders-Analysis-Python-SQL/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269518494,"owners_count":24430629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-08T02:00:09.200Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","sql","visualization"],"created_at":"2025-01-27T00:34:20.024Z","updated_at":"2026-05-07T13:32:39.417Z","avatar_url":"https://github.com/saisrivatsat.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Retail Orders Analysis: End-to-End Data Analytics Project\n\n### Kaggle Notebook:\nhttps://www.kaggle.com/code/sanjusrivatsa9/retail-orders-analysis\n\n## **Introduction**\nThis project is an end-to-end data analytics workflow designed to mimic a real-world business scenario. It demonstrates the Extract, Transform, Load (ETL) process and data analysis to uncover actionable insights from a retail orders dataset.\n\nThe dataset contains attributes such as product details, pricing, regional information, and sales data. The objective is to clean and preprocess the data, perform structured querying using SQL, and create visualizations to answer key business questions.\n\n---\n\n## **Project Objectives**\n### **Data Extraction**\n- Utilize the Kaggle API to download the dataset programmatically.\n- Extract the dataset from its compressed format for further processing.\n\n### **Data Transformation**\n- Clean and preprocess the dataset, handling missing and duplicate values.\n- Derive additional metrics such as profit, discount, and sale price.\n- Normalize column names to a consistent format.\n\n### **Data Loading**\n- Load the transformed dataset into SQLite and MySQL databases for efficient querying.\n- Implement proper database schema design with constraints and indexing for optimized performance.\n\n### **Data Analysis**\n- Use SQL queries to address business questions such as:\n  - Top-performing products by revenue.\n  - Regional sales and profit trends.\n  - Month-over-month sales growth.\n  - High-growth subcategories based on profit.\n- Generate detailed visualizations for insights.\n\n### **Insights and Recommendations**\n- 
Extract actionable insights and provide business recommendations.\n- Visualize key metrics to support decision-making.\n\n---\n\n## **Technologies Used**\n- **Python**: For data cleaning, preprocessing, and visualization.\n  - Libraries: `pandas`, `sqlalchemy`, `mysql.connector`, `seaborn`, `matplotlib`\n- **SQL**: For querying and analyzing data.\n  - Databases: SQLite and MySQL\n- **Kaggle API**: For automated dataset extraction.\n- **Visualization Tools**: Seaborn and Matplotlib for creating insightful charts.\n\n---\n\n## **Dataset Description**\nThe dataset includes retail orders with the following key attributes:\n- `order_id`: Unique identifier for each order.\n- `order_date`: Date when the order was placed.\n- `ship_mode`: Shipping method used for the order.\n- `segment`: Customer segment (e.g., Consumer, Corporate).\n- `region`: Regional classification of the sales.\n- `category` and `sub_category`: Product categories and subcategories.\n- `sale_price`, `quantity`, `discount`, and `profit`: Metrics for financial analysis.\n\n\n---\n\n## **Workflow Steps**\n\n### **1. Data Extraction**\n- Automated dataset download using the Kaggle API.\n- Decompression of the dataset into a Pandas DataFrame for processing.\n\n### **2. Data Cleaning**\n- **Missing Data Handling:** Filled null `ship_mode` values with \"Unknown.\"\n- **Duplicate Removal:** Dropped duplicate `order_id` entries.\n- **Column Standardization:** Normalized column names for consistency.\n\n### **3. Data Transformation**\n- Computed new metrics for analysis:\n  - **Discount**: Derived from list price and discount percentage.\n  - **Sale Price**: Net price after discount.\n  - **Profit**: Sale price minus cost price.\n- Reformatted `order_date` for ease of querying.\n\n### **4. Data Loading**\n- **SQLite Integration:** Enabled local storage and quick querying.\n- **MySQL Integration:** Facilitated scalable data analysis with optimized schemas.\n\n### **5. 
Data Analysis**\nUsed SQL to address key business objectives, such as:\n- Identifying high-revenue products and profitable regions.\n- Evaluating the impact of discounts on sales.\n- Tracking sales trends and profitability by month and category.\n\n### **6. Visualization**\n- Generated visualizations to complement SQL insights:\n  - Bar charts for top-performing products and regions.\n  - Line charts for trends in sales growth and discounts.\n\n---\n\n## **Visualizations**\n1. **Top 10 Products by Revenue**:\n   - Bar chart visualizing the products with the highest revenue.\n![image](https://github.com/user-attachments/assets/3608d969-0e1f-42eb-bf50-e27e6e1ee229)\n\n2. **Regional Sales Trends**:\n   - Bar chart showing total sales for each region.\n![image](https://github.com/user-attachments/assets/e10cdbe4-322d-4892-89e5-3d3b61f7cf04)\n\n3. **Month-over-Month Sales Growth**:\n   - Line chart tracking sales trends month by month.\n![image](https://github.com/user-attachments/assets/00ba4443-ef32-4acf-af1a-23b5c458f4bd)\n\n4. **High-Growth Subcategories by Profit**:\n   - Horizontal bar chart showcasing subcategories with the highest profits.\n![image](https://github.com/user-attachments/assets/42f21b02-5ee6-4cd0-b17a-f9b6ee097fc5)\n\n5. **Impact of Discount on Revenue**:\n   - Line chart illustrating the relationship between discount percentages and total revenue.\n![image](https://github.com/user-attachments/assets/84be412e-37bc-44b5-aabb-b6142991b4d4)\n\n6. **Profitability by Region**:\n   - Bar chart highlighting profits generated in each region.\n![image](https://github.com/user-attachments/assets/539fa3c3-639e-4ecb-8c4f-263bbffac637)\n\n\n---\n\n## **SQL File Explanation**\n### **Purpose**\nThe included SQL file is pivotal to this project as it:\n1. 
**Defines the Database Schema**:\n   - The `retail_orders` table is created with constraints for data integrity, such as:\n     - `order_id` as the primary key.\n     - Default values for specific columns (e.g., `country`, `quantity`).\n   - Indexes are added for performance optimization.\n2. **Answers Business Questions**:\n   - Contains 11 business queries addressing key performance indicators, such as:\n     - Top-performing products by revenue.\n     - Regional profitability.\n     - Month-over-month sales growth.\n3. **Provides Scalability**:\n   - The SQL file can be adapted to analyze other datasets with similar structures.\n\n### **Schema Design**\n- **Primary Key:** Ensures unique `order_id`.\n- **Indexes:** Improve query performance for `order_date`, `region`, and `category`.\n- **Constraints:** Enforce data quality with `NOT NULL` and default values.\n\n## **SQL Queries Used**\n### Example Queries:\n- **Top 10 Products by Revenue**:\n  ```sql\n  SELECT product_id, SUM(sale_price * quantity) AS total_revenue\n  FROM retail_orders\n  GROUP BY product_id\n  ORDER BY total_revenue DESC\n  LIMIT 10;\n  ```\n\u003cimg width=\"1356\" alt=\"image\" src=\"https://github.com/user-attachments/assets/6bb47cf9-fd11-4288-a0b0-0f8f77fbeb76\" /\u003e\n\n- **Regional Sales Trends**:\n  ```sql\n  SELECT region, SUM(sale_price * quantity) AS total_sales\n  FROM retail_orders\n  GROUP BY region\n  ORDER BY total_sales DESC;\n  ```\n\u003cimg width=\"851\" alt=\"image\" src=\"https://github.com/user-attachments/assets/a17150e1-3b99-4c27-9003-1f05eb4d8688\" /\u003e\n\n- **Month-over-Month Sales Growth**:\n  ```sql\n  SELECT DATE_FORMAT(order_date, '%Y-%m') AS month, SUM(sale_price * quantity) AS total_sales\n  FROM retail_orders\n  GROUP BY month\n  ORDER BY month;\n  ```\n\u003cimg width=\"1170\" alt=\"image\" src=\"https://github.com/user-attachments/assets/447849d8-b3a1-43a2-8884-f010bcf42051\" /\u003e\n\n---\n\n## **Dataset Attributes**\n- **Order Details:** 
`order_id`, `order_date`, `ship_mode`, `segment`\n- **Location Information:** `country`, `city`, `state`, `region`\n- **Product Information:** `category`, `sub_category`, `product_id`\n- **Financial Metrics:** `quantity`, `discount`, `sale_price`, `profit`\n\n---\n\n## **Insights and Recommendations**\n\n### **Key Insights**\n1. **Product Performance:**\n   - Specific products consistently generate the highest revenue.\n2. **Regional Trends:**\n   - Regions with strong profitability warrant increased investment.\n3. **Discount Optimization:**\n   - Discounts influence revenue positively but require strategic planning.\n4. **Category Focus:**\n   - Subcategories with high margins offer opportunities for upselling.\n\n### **Recommendations**\n- Prioritize marketing efforts on top-performing products and regions.\n- Implement dynamic discounting strategies to maximize profitability.\n- Focus inventory management on high-demand and high-margin subcategories.\n\n---\n\n## **Future Enhancements**\n1. **Automation**:\n   - Use tools like Apache Airflow to automate ETL workflows.\n2. **Predictive Analysis**:\n   - Incorporate machine learning models for sales forecasting.\n3. **Interactive Dashboards**:\n   - Build dashboards with Tableau or Streamlit for real-time insights.\n4. **Cloud Integration**:\n   - Migrate workflows to cloud platforms for scalability and accessibility.\n\n---\n\n## **Conclusion**\nThis project serves as a robust example of leveraging Python, SQL, and data visualization to solve real-world business problems. It highlights the practical application of data engineering and analytics, making it a valuable resource for aspiring data professionals. 
By extending the analysis to predictive modeling and cloud integration, this workflow can unlock even greater business value.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaisrivatsat%2Fend-to-end-data-analytics-retail-orders-analysis-python-sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaisrivatsat%2Fend-to-end-data-analytics-retail-orders-analysis-python-sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaisrivatsat%2Fend-to-end-data-analytics-retail-orders-analysis-python-sql/lists"}