https://github.com/rohinimaity/sales_data_analysis
data-analysis postgresql sql
- Host: GitHub
- URL: https://github.com/rohinimaity/sales_data_analysis
- Owner: RohiniMaity
- Created: 2025-05-04T23:02:54.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-05T00:52:58.000Z (about 1 year ago)
- Last Synced: 2025-05-07T13:51:00.307Z (about 1 year ago)
- Topics: data-analysis, postgresql, sql
- Language: SQL
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📊 Walmart Sales Data Analysis
This project performs data cleaning in **Python** and then conducts a comprehensive **SQL analysis** using **PostgreSQL**. It uncovers insights into sales performance, customer behavior, and store-level metrics from Walmart’s transactional data.
---
## 📁 Project Structure
- `project.py`: Python file with:
  - Data loading and cleaning in **Pandas**
  - Data export to PostgreSQL (optional)
  - Analytical SQL queries using `psycopg2` or `sqlalchemy`
- `Walmart.csv`: Raw transactional dataset (must be present in the same directory)
- `README.md`: Documentation and explanation
---
## Get Dataset
### Setup Kaggle API
- API Setup: Obtain your Kaggle API token by navigating to your Kaggle profile settings and downloading the `kaggle.json` file.
- Configure Kaggle:
  - Place the downloaded `kaggle.json` file in your local `.kaggle` folder.
  - Use the command `kaggle datasets download -d <owner>/<dataset-name>` to pull datasets directly into your project.
### Download Walmart Sales Data
- Data Source: Use the Kaggle API to download the Walmart sales datasets from Kaggle.
- Dataset Link: [Walmart Sales Dataset](https://www.kaggle.com/datasets/najir0123/walmart-10k-sales-datasets)
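- Storage: Save the data in the `data/` folder for easy reference and access. Because `project.py` expects `Walmart.csv` in its own directory, either copy the file there or adjust the path passed to `pd.read_csv`.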
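The download step can also be scripted. The following is a minimal sketch, assuming the `kaggle` CLI is installed and `kaggle.json` is already configured; the dataset slug is taken from the link above, and the helper names `build_download_command` and `download` are hypothetical.

```python
import shutil
import subprocess

# Slug taken from the dataset link in this README.
DATASET = "najir0123/walmart-10k-sales-datasets"


def build_download_command(dataset=DATASET, target_dir="data"):
    """Return the kaggle CLI call that downloads and unzips the dataset."""
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", target_dir, "--unzip"]


def download(dataset=DATASET, target_dir="data"):
    """Run the download; raises if the kaggle CLI is missing or the call fails."""
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found; install it with `pip install kaggle`")
    subprocess.run(build_download_command(dataset, target_dir), check=True)
```

Calling `download()` should place the unzipped files under `data/`; nothing is downloaded until that function is called explicitly.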
## 🧼 Data Cleaning Steps (Python)
Using **Pandas**, the dataset is prepared by:
- Loading `Walmart.csv` with `pd.read_csv`
- Checking for duplicates and nulls
- Removing duplicates and missing entries
- Performing initial data profiling with `describe()`, `info()`
Example:
```python
import pandas as pd

df = pd.read_csv("Walmart.csv")
df.drop_duplicates(inplace=True)  # remove exact duplicate rows
df.dropna(inplace=True)           # drop rows with any missing value
```
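It can help to record how many rows each cleaning step removes. The sketch below is illustrative only: it uses a small hypothetical in-memory sample instead of `Walmart.csv`, the column names are assumptions, and `profile_and_clean` is a hypothetical helper rather than a function from `project.py`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Walmart.csv; column names are assumptions.
SAMPLE_CSV = """invoice_id,Branch,quantity,rating
1,A,2,9.1
1,A,2,9.1
2,B,,7.4
3,B,5,6.0
"""


def profile_and_clean(df):
    """Count duplicates and nulls, then return the cleaned frame and a small report."""
    report = {
        "rows_before": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "nulls_per_column": df.isnull().sum().to_dict(),
    }
    # Same cleaning as above, but without mutating the input frame.
    cleaned = df.drop_duplicates().dropna()
    report["rows_after"] = len(cleaned)
    return cleaned, report


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
cleaned, report = profile_and_clean(df)
print(report)
```

Returning a report alongside the cleaned frame makes it easy to check that the cleaning removed only what was expected.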
---
## 🛠️ Database Setup (PostgreSQL)
1. Create a database (e.g., `walmart_db`) in PostgreSQL.
2. Import the cleaned data into a table called `walmart`.
3. If using Python to connect:
```python
import os

from dotenv import load_dotenv
from sqlalchemy import create_engine

# Read the POSTGRES_* settings from a local .env file
load_dotenv()

engine = create_engine(
    f"postgresql+psycopg2://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}@localhost:{os.getenv('POSTGRES_PORT')}/{os.getenv('POSTGRES_DB')}"
)
```
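One caveat with building the URL by hand: a password containing characters such as `@` or `/` breaks the URL unless it is percent-encoded, and a missing variable is silently rendered as `None`. A minimal sketch of a safer builder follows. It uses the same `POSTGRES_*` variables; `build_postgres_url` is a hypothetical helper, and falling back to port `5432` (PostgreSQL's default) is an assumption.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Build a SQLAlchemy URL, failing loudly on missing settings."""
    # env[...] raises KeyError for a missing variable instead of inserting "None".
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])  # escape @, /, : and similar
    port = env.get("POSTGRES_PORT", "5432")  # assumed fallback: PostgreSQL default port
    database = env["POSTGRES_DB"]
    return f"postgresql+psycopg2://{user}:{password}@localhost:{port}/{database}"
```

The returned string can be passed to `create_engine` in place of the inline f-string.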
---
## 🔍 Key Analysis Questions (SQL)
1. **Payment Methods**
➤ How many transactions and items were sold via each method?
2. **Top-Rated Categories**
➤ Which category had the highest average rating per branch?
3. **Busiest Day per Branch**
➤ What’s the most active day of the week for each branch?
4. **Quantity Sold by Payment Type**
➤ How many items were sold using each payment method?
5. **Category Ratings by City**
➤ What are the average, min, and max ratings for each category by city?
6. **Category Profitability**
➤ Which categories bring in the most profit?
7. **Popular Payment Methods by Branch**
➤ What’s the most frequently used payment method at each branch?
8. **Sales by Time of Day**
➤ How does transaction volume vary across shifts (Morning, Afternoon, Evening)?
9. **Revenue Decline YoY**
➤ Which branches had the steepest revenue decline year-over-year?
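As an illustration of how questions 1 and 7 might be expressed, here is a minimal sketch. So that it runs without a database server, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the queries stick to standard SQL that should also run on PostgreSQL. The sample rows and the lowercase column names (`branch`, `payment_method`, `quantity`) are assumptions, not the project's actual queries or data.

```python
import sqlite3

# Hypothetical sample rows: (branch, payment_method, quantity).
ROWS = [
    ("A", "Ewallet", 2),
    ("A", "Ewallet", 1),
    ("A", "Cash", 5),
    ("B", "Cash", 3),
    ("B", "Cash", 4),
    ("B", "Credit card", 1),
]

# Question 1: transactions and items sold per payment method.
PAYMENT_SUMMARY = """
SELECT payment_method,
       COUNT(*)      AS transactions,
       SUM(quantity) AS items_sold
FROM walmart
GROUP BY payment_method
ORDER BY payment_method
"""

# Question 7: most frequently used payment method at each branch.
TOP_METHOD_PER_BRANCH = """
WITH counts AS (
    SELECT branch, payment_method, COUNT(*) AS transactions
    FROM walmart
    GROUP BY branch, payment_method
),
ranked AS (
    SELECT branch, payment_method, transactions,
           RANK() OVER (PARTITION BY branch ORDER BY transactions DESC) AS rnk
    FROM counts
)
SELECT branch, payment_method, transactions
FROM ranked
WHERE rnk = 1
ORDER BY branch
"""


def run_demo():
    """Load the sample rows into an in-memory table and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE walmart (branch TEXT, payment_method TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO walmart VALUES (?, ?, ?)", ROWS)
    summary = conn.execute(PAYMENT_SUMMARY).fetchall()
    top = conn.execute(TOP_METHOD_PER_BRANCH).fetchall()
    conn.close()
    return summary, top
```

Aggregating in one CTE and ranking in the next keeps the window function simple, and `RANK()` returns every tied method when two are equally popular at a branch.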
---
## 📌 Highlights
- Used `RANK() OVER (...)` and `GROUP BY` extensively for aggregations and rankings.
- Converted date strings with `TO_DATE` and extracted week days using `TO_CHAR`.
- Classified transactions by time of day using `EXTRACT(HOUR FROM time)`.
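The date and shift logic can be sketched as follows. The PostgreSQL query is shown as a string and is not executed here. The `'DD/MM/YY'` date format, the column names, the cast to `TIME` (needed only if the column is stored as text), and the shift boundaries are all assumptions. The two Python helpers mirror the same logic so that it can be checked without a database.

```python
from datetime import datetime

# PostgreSQL form (not executed here). 'FMDay' suppresses the blank padding
# that plain 'Day' adds. Column names and the date format are assumptions.
SHIFT_QUERY = """
SELECT branch,
       TO_CHAR(TO_DATE(date, 'DD/MM/YY'), 'FMDay') AS day_name,
       CASE
           WHEN EXTRACT(HOUR FROM CAST(time AS TIME)) < 12 THEN 'Morning'
           WHEN EXTRACT(HOUR FROM CAST(time AS TIME)) BETWEEN 12 AND 17 THEN 'Afternoon'
           ELSE 'Evening'
       END AS shift,
       COUNT(*) AS transactions
FROM walmart
GROUP BY 1, 2, 3
ORDER BY 1, 4 DESC;
"""


def day_name(date_text, fmt="%d/%m/%y"):
    """Python mirror of TO_CHAR(TO_DATE(...), 'FMDay')."""
    return datetime.strptime(date_text, fmt).strftime("%A")


def shift_of(time_text):
    """Python mirror of the CASE expression: classify an HH:MM:SS time."""
    hour = datetime.strptime(time_text, "%H:%M:%S").hour
    if hour < 12:
        return "Morning"
    if hour <= 17:
        return "Afternoon"
    return "Evening"
```

For example, `shift_of("13:08:00")` returns `"Afternoon"` under these assumed boundaries.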
---
## 📚 Dependencies
- Python 3.x
- pandas
- sqlalchemy
- psycopg2
- python-dotenv
- PostgreSQL 12+
Install with:
```bash
pip install pandas sqlalchemy psycopg2-binary python-dotenv
```
---
## 🧠 Learnings
- Handling PostgreSQL’s case sensitivity (e.g., using `"Branch"` vs `branch`)
- Optimizing SQL queries with window functions
- Using Python for efficient pre-processing before SQL analysis
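On the case-sensitivity point: PostgreSQL folds unquoted identifiers to lowercase, so a column created as `"Branch"` must be double-quoted in every query. One way to avoid this is to lowercase the column names before export. The sketch below shows both options; `normalize_columns` and `quote_ident` are hypothetical helpers, not part of `project.py`.

```python
def normalize_columns(columns):
    """Lowercase and snake_case column names so they need no quoting in PostgreSQL."""
    return [name.strip().lower().replace(" ", "_") for name in columns]


def quote_ident(name):
    """Double-quote an identifier, escaping embedded quotes, to preserve its case."""
    return '"' + name.replace('"', '""') + '"'


# With a pandas DataFrame this could be applied before exporting:
#   df.columns = normalize_columns(df.columns)
print(normalize_columns(["Branch", " City ", "unit_price"]))
print(f"SELECT {quote_ident('Branch')} FROM walmart")
```

Normalizing once in Python is usually less error-prone than remembering to quote mixed-case names in every SQL query.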
---
## 📄 License
This project is intended for educational and analytical purposes. Feel free to fork, reuse, and expand upon it.