https://github.com/billy-enrizky/sales-analysis

"Sales Data Analysis Project: Analyzing sales data, cleaning, and exploring insights. Python and Pandas used for data analysis."
dataanalysis exploratory-data-analysis jupyter-notebook pandas python

Host: GitHub
URL: https://github.com/billy-enrizky/sales-analysis
Owner: billy-enrizky
License: mit
Created: 2023-10-26T19:16:22.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-11-03T21:27:46.000Z (almost 2 years ago)
Last Synced: 2025-02-25T23:18:56.887Z (8 months ago)
Topics: dataanalysis, exploratory-data-analysis, jupyter-notebook, pandas, python
Language: Jupyter Notebook
Homepage:
Size: 5.52 MB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Sales Data Analysis Project

This project involves the analysis of sales data to gain insights into various aspects of the sales operation. The dataset used for this analysis includes information about sales orders, such as order details, products, quantities, prices, and more.

## Table of Contents

1. [Project Overview](#project-overview)

2. [Getting Started](#getting-started)

3. [Data Cleaning](#data-cleaning)

4. [Data Exploration](#data-exploration)

    - [Question 1: What was the best month for sales? How much was earned that month?](#question-1)

    - [Question 2: What city sold the most products?](#question-2)

    - [Question 3: What time should we display advertisements to maximize the likelihood of customers buying products?](#question-3)

    - [Question 4: What products are most often sold together?](#question-4)

    - [What product sold the most? Why do you think it sold the most?](#most-sold-product)



## Project Overview

The goal of this project is to turn raw monthly sales files into answers about sales performance. The work proceeds in three stages: merging and cleaning the data, adding derived columns such as month and city, and exploring the data to answer specific questions.



## Getting Started

### Import necessary libraries

```python
import os

import pandas as pd
```

### Merge data from each month into one CSV

```python
path = "./Sales_Data"
# Ignore hidden files such as .DS_Store
files = [file for file in os.listdir(path) if not file.startswith('.')]

# Read every monthly file, then concatenate once instead of growing a DataFrame in a loop
monthly_frames = [pd.read_csv(os.path.join(path, file)) for file in files]
all_months_data = pd.concat(monthly_frames, ignore_index=True)

all_months_data.to_csv("all_data_copy.csv", index=False)
```
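To try the merge step without the real `Sales_Data` folder, here is a minimal, self-contained sketch. The helper name `merge_csv_files` and the two toy files are hypothetical and exist only for illustration; they are not part of the project.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_csv_files(folder):
    """Read every non-hidden CSV in `folder` and stack them into one DataFrame."""
    paths = sorted(p for p in Path(folder).glob("*.csv") if not p.name.startswith("."))
    frames = [pd.read_csv(p) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Hypothetical toy files standing in for the monthly sales exports.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Sales_January.csv").write_text("Order ID,Product\n1,Cable\n2,Monitor\n")
    Path(tmp, "Sales_February.csv").write_text("Order ID,Product\n3,Headphones\n")
    merged = merge_csv_files(tmp)

print(merged.shape)  # (3, 2): three order rows, two columns
```

Passing `ignore_index=True` gives the merged frame a fresh 0..n-1 index rather than repeating each file's own row numbers.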

### Read in the updated dataframe

```python
# Load the merged file written in the previous step
all_data = pd.read_csv("all_data_copy.csv")
```



## Data Cleaning

### Drop rows of NaN

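```python
# Rows with at least one missing value, kept only for inspection
nan_df = all_data[all_data.isna().any(axis=1)]
# Drop rows in which every column is NaN
all_data = all_data.dropna(how='all')
```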

### Get rid of text in the 'Order Date' column

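```python
# Merging the monthly files leaves repeated header rows in the data;
# those rows contain the literal text 'Order Date' in this column, so drop them
all_data = all_data[all_data['Order Date'].str[0:2] != 'Or'].copy()
```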

### Make columns the correct type

```python
all_data['Quantity Ordered'] = pd.to_numeric(all_data['Quantity Ordered'])
all_data['Price Each'] = pd.to_numeric(all_data['Price Each'])
```
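The three cleaning steps above can be checked end to end on a small in-memory sample. This is only a sketch: the CSV text below is made-up toy data that imitates the two problems being cleaned (an empty row and a repeated header row), and the column names are taken from the code in this README.

```python
import io

import pandas as pd

# Toy CSV with one empty row and one repeated header row (hypothetical data).
raw = io.StringIO(
    "Order ID,Product,Quantity Ordered,Price Each,Order Date\n"
    "1,Cable,2,11.95,04/19/19 08:46\n"
    ",,,,\n"
    "Order ID,Product,Quantity Ordered,Price Each,Order Date\n"
    "2,Monitor,1,149.99,04/07/19 22:30\n"
)
toy = pd.read_csv(raw)

# 1. Drop rows where every column is missing.
toy = toy.dropna(how="all")
# 2. Drop repeated header rows, recognised by the text in 'Order Date'.
toy = toy[toy["Order Date"].str[0:2] != "Or"].copy()
# 3. Convert the numeric columns, which were read as text because of the header rows.
toy["Quantity Ordered"] = pd.to_numeric(toy["Quantity Ordered"])
toy["Price Each"] = pd.to_numeric(toy["Price Each"])

print(toy.dtypes)
```

The order matters: `pd.to_numeric` would raise on the repeated header text, so the header rows have to be removed first.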

### Augment data with additional columns

#### Add month column

```python
# The first two characters of 'Order Date' hold the month number
all_data['Month'] = all_data['Order Date'].str[0:2]
all_data['Month'] = all_data['Month'].astype('int32')
```

#### Add month column (alternative method)

```python
all_data['Month 2'] = pd.to_datetime(all_data['Order Date']).dt.month
```

#### Add city column

```python
# Assumes addresses of the form "<street>, <city>, <state> <zip>"
def get_city(address):
    return address.split(",")[1].strip(" ")

def get_state(address):
    return address.split(",")[2].split(" ")[1]

# Label each row as "City (ST)"
all_data['City'] = all_data['Purchase Address'].apply(lambda x: f"{get_city(x)} ({get_state(x)})")
```
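As an alternative to calling `apply` row by row, the same label can be built with vectorized string methods. The sketch below assumes every address follows the "street, city, ST zip" layout that the split-based helpers rely on; the two sample addresses are invented for illustration.

```python
import pandas as pd

# Hypothetical addresses in the assumed "street, city, ST zip" layout.
addresses = pd.Series([
    "917 1st St, Dallas, TX 75001",
    "682 Chestnut St, Boston, MA 02215",
])

# Capture the city (text between the two commas) and the two-letter state code.
parts = addresses.str.extract(r",\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s")
city_labels = parts["city"] + " (" + parts["state"] + ")"

print(city_labels.tolist())  # ['Dallas (TX)', 'Boston (MA)']
```

Addresses that do not match the pattern produce `NaN` instead of raising an `IndexError`, which makes malformed rows easier to find.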



## Data Exploration



### Question 1: What was the best month for sales? How much was earned that month?

```python
import matplotlib.pyplot as plt

all_data['Sales'] = all_data['Quantity Ordered'] * all_data['Price Each']

# Sum only the Sales column; summing every column also tries to add up text columns
sales_by_month = all_data.groupby('Month')['Sales'].sum()

# Use the months actually present in the data as x positions
plt.bar(sales_by_month.index, sales_by_month.values)
plt.xticks(sales_by_month.index)
plt.ylabel('Sales in USD ($)')
plt.xlabel('Month number')
plt.show()
```
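The bar chart shows the shape of the year, but the question also asks for a figure. A minimal sketch of reading the best month and its total directly from the grouped data, using invented toy numbers rather than the project's results:

```python
import pandas as pd

# Hypothetical toy orders; the real values come from the cleaned sales data.
toy = pd.DataFrame({
    "Month": [1, 1, 2, 12, 12],
    "Quantity Ordered": [1, 2, 1, 3, 1],
    "Price Each": [10.0, 5.0, 20.0, 4.0, 100.0],
})
toy["Sales"] = toy["Quantity Ordered"] * toy["Price Each"]

monthly_sales = toy.groupby("Month")["Sales"].sum()
best_month = monthly_sales.idxmax()  # month number with the highest total
best_total = monthly_sales.max()     # sales earned in that month

print(best_month, best_total)  # 12 112.0
```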



### Question 2: What city sold the most products?

```python
# Total sales per city; the index supplies the labels in the same order as the values
city_sales = all_data.groupby('City')['Sales'].sum()

plt.bar(city_sales.index, city_sales.values)
plt.ylabel('Sales in USD ($)')
plt.xlabel('City')
plt.xticks(city_sales.index, rotation='vertical', size=8)
plt.show()
```



### Question 3: What time should we display advertisements to maximize the likelihood of customers buying products?

```python
all_data['Hour'] = pd.to_datetime(all_data['Order Date']).dt.hour

# Number of order rows placed in each hour of the day
orders_per_hour = all_data.groupby('Hour').size()

plt.plot(orders_per_hour.index, orders_per_hour.values)
plt.xticks(orders_per_hour.index)
plt.xlabel('Hour of day')
plt.ylabel('Number of orders')
plt.grid()
plt.show()
```
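To name the busiest hour rather than read it off the line chart, the hourly counts can be ranked directly. The sketch below uses made-up timestamps and assumes the `MM/DD/YY HH:MM` layout implied by the month-slicing code earlier in this README.

```python
import pandas as pd

# Hypothetical order timestamps in the assumed "MM/DD/YY HH:MM" layout.
order_dates = pd.Series([
    "04/19/19 08:46",
    "04/07/19 22:30",
    "04/12/19 22:05",
    "04/30/19 09:27",
    "04/30/19 22:59",
])

# An explicit format avoids per-value format guessing and is faster on large columns.
hours = pd.to_datetime(order_dates, format="%m/%d/%y %H:%M").dt.hour

orders_per_hour = hours.value_counts().sort_index()
peak_hour = orders_per_hour.idxmax()

print(peak_hour, orders_per_hour.loc[peak_hour])  # 22 3
```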



### Question 4: What products are most often sold together?

```python
from collections import Counter
from itertools import combinations

# Keep only rows whose Order ID appears more than once (multi-item orders)
df = all_data[all_data['Order ID'].duplicated(keep=False)].copy()
# Join the products of each order into one comma-separated string
df['Grouped'] = df.groupby('Order ID')['Product'].transform(lambda x: ','.join(x))
df2 = df[['Order ID', 'Grouped']].drop_duplicates()

# Count every pair of products that appears in the same order
count = Counter()
for row in df2['Grouped']:
    row_list = row.split(',')
    count.update(Counter(combinations(row_list, 2)))

# Display the 10 most common product pairs
for key, value in count.most_common(10):
    print(key, value)
```
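One detail of pair counting is worth checking on a small case: `combinations` keeps the order in which products appear, so the same two products listed in a different row order would be counted as two different pairs. The sketch below is a variation, not the project's code; it sorts each order's products first, and its product names are placeholders.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical order lines; orders 1 and 3 hold the same pair in a different row order.
toy = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4, 4],
    "Product": ["Phone", "Charging Cable", "Monitor",
                "Charging Cable", "Phone", "Phone", "Headphones"],
})

pair_counts = Counter()
for _, products in toy.groupby("Order ID")["Product"]:
    # Sorting makes (A, B) and (B, A) count as the same pair.
    unique_sorted = sorted(set(products))
    pair_counts.update(combinations(unique_sorted, 2))

print(pair_counts.most_common(1))  # [(('Charging Cable', 'Phone'), 2)]
```

Grouping on the `Product` column directly also sidesteps the join-then-split round trip, which would break if a product name ever contained a comma.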



### What product sold the most? Why do you think it sold the most?

```python
product_group = all_data.groupby('Product')
# Select the column before aggregating so text columns are not summed or averaged
quantity_ordered = product_group['Quantity Ordered'].sum()
prices = product_group['Price Each'].mean()
products = quantity_ordered.index

# Quantity ordered per product
plt.bar(products, quantity_ordered)
plt.xticks(products, rotation='vertical', size=8)
plt.show()

# Overlay the mean price (right axis) on the quantity ordered (left axis)
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.bar(products, quantity_ordered, color='g')
ax2.plot(products, prices, color='b')
ax1.set_xlabel('Product Name')
ax1.set_ylabel('Quantity Ordered', color='g')
ax2.set_ylabel('Price ($)', color='b')
ax1.tick_params(axis='x', labelrotation=90, labelsize=8)
plt.show()
```
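The overlay plot invites a visual comparison of price and quantity. One way to put a number on that relationship is a correlation between the two per-product series. This is a rough sketch on invented toy data, and a correlation alone would not explain why a product sells.

```python
import pandas as pd

# Hypothetical order lines; product names and prices are placeholders.
toy = pd.DataFrame({
    "Product": ["Batteries", "Batteries", "Monitor", "Cable", "Cable"],
    "Quantity Ordered": [4, 3, 1, 2, 2],
    "Price Each": [3.0, 3.0, 400.0, 12.0, 12.0],
})

# One row per product: total quantity and mean unit price.
summary = toy.groupby("Product").agg(
    quantity=("Quantity Ordered", "sum"),
    mean_price=("Price Each", "mean"),
)

best_seller = summary["quantity"].idxmax()
# Pearson correlation; a negative value means cheaper products sold in larger quantities.
price_quantity_corr = summary["quantity"].corr(summary["mean_price"])

print(best_seller, round(price_quantity_corr, 2))
```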

This README covers the Sales Data Analysis project from start to finish: the code for merging and cleaning the data, and the exploration code and plots behind each question. Together they give insight into sales performance, best-selling products, and sales trends over time.
