{"id":28276818,"url":"https://github.com/ashwinwilson/walmart_sales_data_analysis","last_synced_at":"2026-01-27T02:35:00.968Z","repository":{"id":286254936,"uuid":"960871174","full_name":"AshwinWilson/Walmart_Sales_Data_Analysis","owner":"AshwinWilson","description":"This end-to-end data analytics project focuses on uncovering key business insights from Walmart sales data using Python, Pandas, SQLite, and SQL. It simulates a real-world data analysis pipeline, ideal for aspiring or professional data analysts.","archived":false,"fork":false,"pushed_at":"2025-04-05T12:31:01.000Z","size":1239,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-18T09:49:18.343Z","etag":null,"topics":["python","sql","sqlite3"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AshwinWilson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-05T08:41:23.000Z","updated_at":"2025-04-11T14:34:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"3c21a663-6d21-4f57-ade2-6b70f7ed1232","html_url":"https://github.com/AshwinWilson/Walmart_Sales_Data_Analysis","commit_stats":null,"previous_names":["ashwinwilson/walmart_sales_data_analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AshwinWilson/Walmart_Sales_Data_Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshwinWilson%2FWalmart_Sales_Data_Analysis","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/AshwinWilson%2FWalmart_Sales_Data_Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshwinWilson%2FWalmart_Sales_Data_Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshwinWilson%2FWalmart_Sales_Data_Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AshwinWilson","download_url":"https://codeload.github.com/AshwinWilson/Walmart_Sales_Data_Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshwinWilson%2FWalmart_Sales_Data_Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28796977,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T01:07:07.743Z","status":"online","status_checked_at":"2026-01-27T02:00:07.755Z","response_time":168,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","sql","sqlite3"],"created_at":"2025-05-21T05:11:31.409Z","updated_at":"2026-01-27T02:35:00.963Z","avatar_url":"https://github.com/AshwinWilson.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# 🏬 Walmart Sales Data Analysis\n\nThis end-to-end data analytics project focuses on uncovering **key business insights** from Walmart sales data using **Python**, **Pandas**, **SQLite**, and **SQL**. 
It simulates a real-world data analysis pipeline, ideal for aspiring or professional data analysts.\n\n---\n\n## 🔧 1. Project Environment Setup\n\n**Tools Used:**  \nVisual Studio Code (VS Code), Python 3.8+, SQLite (or optional MySQL/PostgreSQL)\n\n**Goal:**  \nSet up a clean and structured workspace for smooth development and collaboration.\n\n\u003cpre\u003e\nproject/\n│-- data/             # Raw and processed datasets\n│-- notebooks/        # Jupyter Notebooks for EDA and analysis\n│-- sql_queries/      # All SQL scripts used in the project\n│-- README.md         # Project documentation\n│-- requirements.txt  # Required Python libraries\n\u003c/pre\u003e\n\n---\n\n## 📥 2. Dataset Acquisition\n\n- **Data Source:** [Walmart Sales Dataset on Kaggle](https://www.kaggle.com/)\n- **Storage:** Save the dataset in the `data/` folder for easy access and versioning.\n\n---\n\n## 📦 3. Install Required Libraries\n\nInstall the necessary Python packages:\n\n```bash\npip install pandas numpy sqlalchemy\n```\n\nThe `sqlite3` module ships with Python's standard library, so it is not installed through pip.\n\nFor optional MySQL or PostgreSQL support:\n\n```bash\npip install mysql-connector-python psycopg2\n```\n\n---\n\n## 🔍 4. Exploratory Data Analysis (EDA)\n\nInspect the dataset's structure and initial quality:\n\n```python\ndf.info()\ndf.describe()\ndf.head()\n```\n\n---\n\n## 🧹 5. Data Cleaning\n\n- **Remove Duplicates:** Eliminated duplicate rows to ensure accurate analysis.\n- **Handle Missing Values:** Dropped or imputed missing values based on significance.\n- **Fix Data Types:** Converted fields like `date` to `datetime`, and `unit_price` to `float`.\n- **Currency Formatting:** Removed `$` signs and thousands separators using regex:\n  ```python\n  df['unit_price'] = df['unit_price'].replace(r'[\\$,]', '', regex=True).astype(float)\n  ```\n- **Validation:** Confirmed data consistency and fixed formatting issues.\n\n---\n\n## ⚙️ 6. 
Feature Engineering\n\nCreated a `total_amount` column for easier revenue and profit analysis:\n\n```python\ndf['total_amount'] = df['unit_price'] * df['quantity']\n```\n\n---\n\n## 🗃️ 7. Load Cleaned Data into SQLite\n\n```python\nimport sqlite3\nconn = sqlite3.connect(\"walmart_sales.db\")\ndf.to_sql(\"walmart\", conn, if_exists=\"replace\", index=False)\n```\n\n---\n\n## 📊 8. SQL-Based Business Analysis\n\n### 1️⃣ Total Revenue by Branch\n```sql\nSELECT Branch, ROUND(SUM(unit_price * quantity), 2) AS total_revenue\nFROM walmart\nGROUP BY Branch\nORDER BY total_revenue DESC;\n```\n![image](https://github.com/user-attachments/assets/a705b7e7-398b-4bff-be3c-a246bb1308d7)\n\n\n### 2️⃣ Monthly Sales Trend by Category\n```sql\nSELECT strftime('%Y-%m', date) AS month, category,\n       ROUND(SUM(unit_price * quantity), 2) AS total_sales\nFROM walmart\nGROUP BY month, category\nORDER BY month, total_sales DESC;\n```\n\n### 3️⃣ Revenue by Payment Method\n```sql\nSELECT payment_method, COUNT(*) AS total_transactions,\n       ROUND(SUM(unit_price * quantity), 2) AS total_sales\nFROM walmart\nGROUP BY payment_method\nORDER BY total_sales DESC;\n```\n\n### 4️⃣ Average Transaction Value per Branch\n```sql\nSELECT Branch, ROUND(AVG(unit_price * quantity), 2) AS avg_transaction_value\nFROM walmart\nGROUP BY Branch\nORDER BY avg_transaction_value DESC;\n```\n\n### 5️⃣ Estimated Profit by Category\n```sql\nSELECT category,\n       ROUND(SUM(unit_price * quantity * profit_margin), 2) AS estimated_profit\nFROM walmart\nGROUP BY category\nORDER BY estimated_profit DESC;\n```\n![image](https://github.com/user-attachments/assets/f3d1699c-dd85-42e4-a144-89a58f63c135)\n\n\n### 6️⃣ Peak Hours by Quantity Sold\n```sql\nSELECT strftime('%H', time) AS hour,\n       SUM(quantity) AS total_quantity\nFROM walmart\nGROUP BY hour\nORDER BY total_quantity DESC;\n```\n![image](https://github.com/user-attachments/assets/374fcf18-2765-4aea-91ab-91c68e3f4a65)\n\n\n### 7️⃣ Top-Selling 
Cities\n```sql\nSELECT City,\n       ROUND(SUM(unit_price * quantity), 2) AS total_sales\nFROM walmart\nGROUP BY City\nORDER BY total_sales DESC;\n```\n![image](https://github.com/user-attachments/assets/d4e82466-7257-4e00-a917-f5490b87ca74)\n\n\n### 8️⃣ High-Rated Categories\n```sql\nSELECT category,\n       ROUND(AVG(rating), 2) AS avg_rating\nFROM walmart\nGROUP BY category\nORDER BY avg_rating DESC;\n```\n\n---\n\n## 📈 9. Results and Insights\n\n### 🔹 **Sales Insights**\n- Identified **top-performing product categories** by total revenue.\n- Found branches with the **highest overall sales**, indicating key performing regions.\n- Analyzed **preferred payment methods** across customer demographics.\n\n### 🔹 **Profitability**\n- Estimated **profit margins by category and location**.\n- Pinpointed the **most profitable store branches** for strategic investment.\n\n### 🔹 **Customer Behavior**\n- Tracked **customer satisfaction** through average ratings.\n- Highlighted **peak shopping hours** to optimize staffing and inventory.\n- Identified **payment preferences** across branches and customer segments.\n\n---\n\n## 🚀 10. Future Enhancements\n\n- **Interactive Dashboard**: Integrate Power BI or Tableau for real-time visual insights.\n- **Data Enrichment**: Include customer profiles, marketing campaign data, and seasonal trends.\n- **Automation**: Build an **automated data pipeline (ETL)** to support real-time analysis.\n\n---\n\n## ✅ Outcome\n\nThis project simulates the work of a real-world **data analyst** — from raw data ingestion and cleaning to SQL-based business intelligence reporting. 
It's ideal for showcasing analytical thinking, SQL querying, data preprocessing, and insight generation in a business context.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashwinwilson%2Fwalmart_sales_data_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashwinwilson%2Fwalmart_sales_data_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashwinwilson%2Fwalmart_sales_data_analysis/lists"}