https://github.com/shruti-h/sales_data_analysis
Sales Data Analysis | Pandas & Matplotlib
- Host: GitHub
- URL: https://github.com/shruti-h/sales_data_analysis
- Owner: Shruti-H
- Created: 2025-02-28T12:59:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-28T13:23:00.000Z (over 1 year ago)
- Last Synced: 2025-06-29T22:38:43.308Z (12 months ago)
- Topics: data-analysis, data-science, data-vi, matplotlib, pandas-library, python
- Language: Jupyter Notebook
- Homepage:
- Size: 377 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Sales Data Analysis - Mini Project
## 📌 Overview
This repository contains a **Sales Data Analysis** project that answers business questions using **Python, Pandas, and Matplotlib**. The dataset consists of **12 months of sales data** from an electronics store, with each record containing the order ID, product, quantity ordered, price, order date, and purchase address.
The project includes data cleaning, exploratory data analysis (EDA), and visualization of insights to make data-driven business decisions.
---
## 📊 Data Cleaning & Preparation
Before diving into analysis, data cleaning was performed to ensure accuracy and consistency. Tasks included:
✔ **Dropping NaN values** from the DataFrame.
✔ **Converting data types**.
✔ **Extracting useful columns** (e.g., hour from timestamp, city from address).
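
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Order ID`, `Quantity Ordered`, `Price Each`, `Order Date`, `Purchase Address`), the date format, the address layout, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; column names and formats are assumptions,
# not taken from the project's real CSV files.
raw = pd.DataFrame({
    "Order ID": ["1001", "1002", None, "1003"],
    "Product": ["USB-C Cable", "Monitor", None, "Headphones"],
    "Quantity Ordered": ["2", "1", None, "1"],
    "Price Each": ["11.95", "149.99", None, "99.99"],
    "Order Date": ["2019-04-19 08:46", "2019-04-07 22:30", None, "2019-12-12 14:38"],
    "Purchase Address": [
        "917 1st St, Dallas, TX 75001",
        "682 Chestnut St, Boston, MA 02215",
        None,
        "333 8th St, Los Angeles, CA 90001",
    ],
})

# 1. Drop rows that contain NaN values.
clean = raw.dropna().copy()

# 2. Convert text columns to numeric and datetime types.
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"])
clean["Price Each"] = pd.to_numeric(clean["Price Each"])
clean["Order Date"] = pd.to_datetime(clean["Order Date"], format="%Y-%m-%d %H:%M")

# 3. Extract helper columns: hour from the timestamp, city from the address
#    (assumes the address is laid out as "street, city, state zip").
clean["Hour"] = clean["Order Date"].dt.hour
clean["City"] = clean["Purchase Address"].str.split(",").str[1].str.strip()

print(clean[["Order ID", "Hour", "City"]])
```

If the real data contains malformed values, passing `errors="coerce"` to `pd.to_numeric` and `pd.to_datetime` before dropping NaN rows is one way to handle them.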
---
## 🔍 Business Questions Answered
Using **Pandas & Matplotlib**, the following **key business questions** were explored:
1️⃣ **What was the best month for sales?** 🏆 How much revenue was generated that month?
2️⃣ **Which city had the highest sales?** 📍 Understanding regional demand.
3️⃣ **What time should advertisements be displayed?** ⏰ Maximizing customer purchase likelihood.
4️⃣ **Which products are most often sold together?** 🔗 Product bundling insights.
5️⃣ **What product sold the most?** 📦 Why might it have been the top seller?
Each question was answered using **data aggregation, groupby operations, and visualizations**.
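
As a rough sketch of how questions like these can be answered with `groupby`, the example below runs all five aggregations on a tiny made-up dataset. The column names, the sample values, and the pair-counting approach (`itertools.combinations` with `collections.Counter`) are assumptions for illustration and may differ from what the notebook does.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical, already-cleaned order lines (one row per product in an order).
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4],
    "Product": ["Phone", "Charging Cable", "Monitor",
                "Phone", "Charging Cable", "AA Batteries"],
    "Quantity Ordered": [1, 2, 1, 1, 1, 4],
    "Price Each": [700.0, 15.0, 150.0, 700.0, 15.0, 4.0],
    "Order Date": pd.to_datetime([
        "2019-01-05 10:00", "2019-01-05 10:00", "2019-01-20 19:30",
        "2019-12-01 19:15", "2019-12-01 19:15", "2019-12-24 12:05",
    ]),
    "City": ["Boston", "Boston", "Dallas", "Boston", "Boston", "San Francisco"],
})
orders["Sales"] = orders["Quantity Ordered"] * orders["Price Each"]
orders["Month"] = orders["Order Date"].dt.month
orders["Hour"] = orders["Order Date"].dt.hour

# Q1: best month for sales and the revenue generated that month.
monthly_sales = orders.groupby("Month")["Sales"].sum()
best_month = monthly_sales.idxmax()

# Q2: city with the highest sales.
top_city = orders.groupby("City")["Sales"].sum().idxmax()

# Q3: hour with the most distinct orders (a candidate time for advertisements).
busiest_hour = orders.groupby("Hour")["Order ID"].nunique().idxmax()

# Q4: products most often sold together, counted as pairs within each order.
pair_counts = Counter()
for products in orders.groupby("Order ID")["Product"].apply(list):
    pair_counts.update(combinations(sorted(products), 2))
top_pair, top_pair_count = pair_counts.most_common(1)[0]

# Q5: product with the highest total quantity sold.
top_product = orders.groupby("Product")["Quantity Ordered"].sum().idxmax()

print(best_month, monthly_sales.loc[best_month], top_city, busiest_hour)
print(top_pair, top_pair_count, top_product)
```

Sorting each order's products before building pairs means that the same two products always produce the same key, regardless of the order in which they appear in the data.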
---
## 🔧 Methods & Techniques Used
Throughout this analysis, the following **Pandas & Matplotlib techniques** were leveraged:
✔ **Merging & Concatenating multiple CSV files** to create a unified dataset (`pd.concat`).
✔ **Adding new calculated columns** (e.g., sales, hour, city).
✔ **String parsing operations** (`.str.split()`, `.apply()` functions).
✔ **Using `groupby` for aggregate analysis**.
✔ **Visualizing insights** using vertical and horizontal bar charts and line graphs.
✔ **Labeling and formatting graphs** for better readability.
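
A few of the techniques above, combined in one small sketch: concatenating monthly files with `pd.concat`, adding a calculated column, aggregating with `groupby`, and drawing a labeled bar chart. The in-memory CSV strings stand in for the real monthly files, and the column names and values are assumptions made for the example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-ins for the monthly CSV files (hypothetical contents).
monthly_csvs = {
    "january": "Order ID,Product,Quantity Ordered,Price Each\n"
               "1,Monitor,1,150.0\n"
               "2,AA Batteries,4,4.0\n",
    "february": "Order ID,Product,Quantity Ordered,Price Each\n"
                "3,Monitor,2,150.0\n",
}

# Read each file into its own DataFrame, then stack them into one dataset.
frames = [pd.read_csv(io.StringIO(text)) for text in monthly_csvs.values()]
all_orders = pd.concat(frames, ignore_index=True)

# Calculated column: revenue per order line.
all_orders["Sales"] = all_orders["Quantity Ordered"] * all_orders["Price Each"]

# Aggregate sales per product, largest first.
sales_by_product = (
    all_orders.groupby("Product")["Sales"].sum().sort_values(ascending=False)
)

# Vertical bar chart with a title, axis labels, and rotated tick labels.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(sales_by_product.index, sales_by_product.values)
ax.set_title("Sales by product")
ax.set_xlabel("Product")
ax.set_ylabel("Sales")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

In a notebook, the `matplotlib.use("Agg")` line can be dropped and the figure displayed inline; with real files, the list of frames would typically be built by looping over the CSV paths instead of in-memory strings.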
---
## 🏆 Credits
This project was inspired by **real-world business problems** and implemented using Python's powerful data analysis tools.