https://github.com/amlanmohanty1/zepto-sql-data-analysis-project
Complete Data Analysis on Zepto Inventory data using SQL
https://github.com/amlanmohanty1/zepto-sql-data-analysis-project
data-analysis database inventory-management postgresql sql zepto
Last synced: about 2 months ago
JSON representation
Complete Data Analysis on Zepto Inventory data using SQL
- Host: GitHub
- URL: https://github.com/amlanmohanty1/zepto-sql-data-analysis-project
- Owner: amlanmohanty1
- License: mit
- Created: 2025-06-14T19:49:04.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-14T19:54:06.000Z (about 1 year ago)
- Last Synced: 2025-06-14T20:42:57.011Z (about 1 year ago)
- Topics: data-analysis, database, inventory-management, postgresql, sql, zepto
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# π Zepto E-commerce SQL Data Analyst Portfolio Project
This is a complete, real-world data analyst portfolio project based on an e-commerce inventory dataset scraped from [Zepto](https://www.zeptonow.com/) β one of Indiaβs fastest-growing quick-commerce startups. This project simulates real analyst workflows, from raw data exploration to business-focused data analysis.
This project is perfect for:
- π Data Analyst aspirants who want to build a strong **Portfolio Project** for interviews and LinkedIn
- π Anyone learning SQL hands-on
- πΌ Preparing for interviews in retail, e-commerce, or product analytics
# **π₯ Watch this [YouTube video](https://www.youtube.com/watch?v=x8dfQkKTyP0&list=PLAx-M6Di0SisFJ1rv5M_FRHUlGA5rtUf_&index=2) to implement the full project from scratch:**
[](https://www.youtube.com/watch?v=x8dfQkKTyP0&list=PLAx-M6Di0SisFJ1rv5M_FRHUlGA5rtUf_&index=2)
π *Link to Video:* [Watch on Youtube](https://www.youtube.com/watch?v=x8dfQkKTyP0&list=PLAx-M6Di0SisFJ1rv5M_FRHUlGA5rtUf_&index=2)
## π Project Overview
The goal is to simulate how actual data analysts in the e-commerce or retail industries work behind the scenes to use SQL to:
β
Set up a messy, real-world e-commerce inventory **database**
β
Perform **Exploratory Data Analysis (EDA)** to explore product categories, availability, and pricing inconsistencies
β
Implement **Data Cleaning** to handle null values, remove invalid entries, and convert pricing from paise to rupees
β
Write **business-driven SQL queries** to derive insights around **pricing, inventory, stock availability, revenue** and more
## π Dataset Overview
The dataset was sourced from [Kaggle](https://www.kaggle.com/datasets/palvinder2006/zepto-inventory-dataset/data?select=zepto_v2.csv) and was originally scraped from Zeptoβs official product listings. It mimics what youβd typically encounter in a real-world e-commerce inventory system.
Each row represents a unique SKU (Stock Keeping Unit) for a product. Duplicate product names exist because the same product may appear multiple times in different package sizes, weights, discounts, or categories to improve visibility β exactly how real catalog data looks.
π§Ύ Columns:
- **sku_id:** Unique identifier for each product entry (Synthetic Primary Key)
- **name:** Product name as it appears on the app
- **category:** Product category like Fruits, Snacks, Beverages, etc.
- **mrp:** Maximum Retail Price (originally in paise, converted to βΉ)
- **discountPercent:** Discount applied on MRP
- **discountedSellingPrice:** Final price after discount (also converted to βΉ)
- **availableQuantity:** Units available in inventory
- **weightInGms:** Product weight in grams
- **outOfStock:** Boolean flag indicating stock availability
- **quantity:** Number of units per package (mixed with grams for loose produce)
## π§ Project Workflow
Hereβs a step-by-step breakdown of what we do in this project:
### 1. Database & Table Creation
We start by creating a SQL table with appropriate data types:
```sql
CREATE TABLE zepto (
sku_id SERIAL PRIMARY KEY,
category VARCHAR(120),
name VARCHAR(150) NOT NULL,
mrp NUMERIC(8,2),
discountPercent NUMERIC(5,2),
availableQuantity INTEGER,
discountedSellingPrice NUMERIC(8,2),
weightInGms INTEGER,
outOfStock BOOLEAN,
quantity INTEGER
);
```
### 2. Data Import
- Loaded CSV using pgAdmin's import feature.
- If you're not able to use the import feature, write this code instead:
```sql
\copy zepto(category,name,mrp,discountPercent,availableQuantity,
discountedSellingPrice,weightInGms,outOfStock,quantity)
FROM 'data/zepto_v2.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ENCODING 'UTF8');
```
- Faced encoding issues (UTF-8 error), which were fixed by saving the CSV file using CSV UTF-8 format.
### 3. π Data Exploration
- Counted the total number of records in the dataset
- Viewed a sample of the dataset to understand structure and content
- Checked for null values across all columns
- Identified distinct product categories available in the dataset
- Compared in-stock vs out-of-stock product counts
- Detected products present multiple times, representing different SKUs
### 4. π§Ή Data Cleaning
- Identified and removed rows where MRP or discounted selling price was zero
- Converted mrp and discountedSellingPrice from paise to rupees for consistency and readability
### 5. π Business Insights
- Found top 10 best-value products based on discount percentage
- Identified high-MRP products that are currently out of stock
- Estimated potential revenue for each product category
- Filtered expensive products (MRP > βΉ500) with minimal discount
- Ranked top 5 categories offering highest average discounts
- Calculated price per gram to identify value-for-money products
- Grouped products based on weight into Low, Medium, and Bulk categories
- Measured total inventory weight per product category
## π οΈ How to Use This Project
1. **Clone the repository**
```bash
git clone https://github.com/amlanmohanty/zepto-SQL-data-analysis-project.git
cd zepto-SQL-data-analysis-project
```
2. **Open zepto_SQL_data_analysis.sql**
This file contains:
- Table creation
- Data exploration
- Data cleaning
- SQL Business analysis
3. **Load the dataset into pgAdmin or any other PostgreSQL client**
- Create a database and run the SQL file
- Import the dataset (convert to UTF-8 if necessary)
4. **Follow along with the YouTube video for full walkthrough. π¨βπΌ**
## π License
MIT β feel free to fork, star, and use in your portfolio.
## π¨βπ» About the Author
Hey, Iβm Amlan Mohanty β a Data Analyst & Content Creator.
I break down complex data topics into simple, practical content that actually helps you land a job.
### π Stay Connected & Join the Data Drool Community
If you enjoyed this project and want to keep learning and growing as a data analyst, letβs stay in touch! I regularly share content around SQL, data analytics, portfolio projects, job tips, and more.
π₯ YouTube: [Amlan Mohanty](https://www.youtube.com/@amlanmohanty1)
- Beginner-friendly tutorials, real-world projects, job and career advice
πΊ Instagram: [data.drool](https://www.instagram.com/data.drool/)
- Quick SQL tips, data memes, and behind-the-scenes content
πΌ LinkedIn: [Amlan Mohanty](https://www.linkedin.com/in/amlanmohanty1/)
- Letβs connect professionally and grow your data career
## π‘ Thanks for checking out the project! Your support means a lot β feel free to star β this repo or share it with someone learning SQL.π