https://github.com/cagandemirmr/flo_data_exploration

You can find data exploration of FLO

# FLO Customer Data Exploration

![image](https://github.com/user-attachments/assets/f1f05852-2803-4aa4-bc22-934a68e191ca)

This project involves the exploration and analysis of customer data from **FLO**, a leading producer and retailer in the Turkish shoe industry. The dataset contains information on customer purchasing behavior across various channels, including both online and offline transactions.

## Dataset Information

- **Total Rows**: 20,000
- **Total Columns**: 13

### Column Descriptions

1. **Master_ID**:
- A unique 36-character identifier composed of numbers and letters that anonymously represents each customer.

2. **Order_Channel**:
- The channel used by the customer when placing an order. It takes one of the following 4 values:
  - iOS App
  - Android App
  - Desktop
  - Mobile

3. **Last_order_channel**:
- The channel used by the customer for their most recent order. It takes one of the following 5 values:
  - iOS App
  - Android App
  - Desktop
  - Mobile
  - Offline (in-store purchases)

4. **First_order_date**:
- The date when the customer placed their first order. The minimum date is **2013-01-14** and the maximum date is **2021-05-27**.

5. **Last_order_date**:
- The date when the customer placed their most recent order. The minimum date is **2020-05-30** and the maximum date is **2021-05-30**.

6. **Last_order_date_online**:
- The date when the customer made their most recent online order.

7. **Last_order_date_offline**:
- The date when the customer made their most recent offline (in-store) order.

8. **Order_num_total_ever_offline**:
- The total number of orders placed by the customer through offline channels.

9. **Order_num_total_ever_online**:
- The total number of orders placed by the customer through online channels.

10. **Customer_value_total_ever_offline**:
- The total revenue generated by the customer from offline purchases.

11. **Customer_value_total_ever_online**:
- The total revenue generated by the customer from online purchases.

12. **Interested_in_categories_12**:
- This column contains 32 unique values representing categories assigned based on the customer's interests.

13. **Store_type**:
- Consists of 3 main store types, with 4 combinations based on customer purchasing habits.

## Steps

### Data Preparation

Before importing the data into SQL, I made several adjustments in Excel:
- Formatted date columns to ensure consistency.
- Removed unnecessary characters and cleaned up the data.

After these modifications, I copied the Excel tables into SQL. This approach is efficient for:
- Familiarizing myself with the data.
- Data normalization, which reduces the size of the data.

![image](https://github.com/user-attachments/assets/78473aa7-d728-414b-9fb6-5bf3ec334217)

This method is convenient for checking and exploring the data, but it does not scale well to large datasets. At 20,000 rows, this dataset is small enough for it to work.
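
As a rough illustration of the import target, the table could be declared along the following lines. The column names come from the descriptions above, but every data type and length is an assumption; the actual table definition is not shown in this README. The second statement is a simple sanity check on the cleaned date columns.

```sql
-- Hypothetical schema: column names follow the Column Descriptions section,
-- but all data types and lengths are assumptions, not the author's actual DDL.
CREATE TABLE CUSTOMERS (
    Master_ID                          CHAR(36)      NOT NULL,
    Order_channel                      VARCHAR(20)   NULL,
    Last_order_channel                 VARCHAR(20)   NULL,
    First_order_date                   DATE          NULL,
    Last_order_date                    DATE          NULL,
    Last_order_date_online             DATE          NULL,
    Last_order_date_offline            DATE          NULL,
    Order_num_total_ever_offline       INT           NULL,
    Order_num_total_ever_online        INT           NULL,
    Customer_value_total_ever_offline  DECIMAL(12,2) NULL,
    Customer_value_total_ever_online   DECIMAL(12,2) NULL,
    Interested_in_categories_12        VARCHAR(100)  NULL,
    Store_type                         VARCHAR(10)   NULL
);

-- Sanity check after import: a last order should never precede the first order.
SELECT COUNT(*) AS SUSPICIOUS_ROWS
FROM CUSTOMERS
WHERE Last_order_date < First_order_date;
```

Declaring the date columns as `DATE` rather than text means that `MIN`, `MAX`, `YEAR`, and `DATEDIFF` in the later queries operate on real dates instead of strings.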

### SQL Queries

To gain a basic understanding of the data, I wrote the following queries in SQL:

```sql
-- General overview of the dataset
SELECT * FROM CUSTOMERS;

-- To observe the minimum and maximum character lengths of Master_ID
-- (aggregating over the whole table, so no GROUP BY or DISTINCT is needed)
SELECT MIN(LEN(Master_ID)) AS MINIMUM_LENGTH,
MAX(LEN(Master_ID)) AS MAXIMUM_LENGTH
FROM CUSTOMERS;

-- Distinct values of the Order Channel
SELECT DISTINCT Order_channel FROM CUSTOMERS;

-- Distinct values of the Last Order Channel
SELECT DISTINCT Last_order_channel FROM CUSTOMERS;

-- Minimum and maximum Last Order Dates
SELECT MIN(Last_order_date) AS MIN_LAST_DATE,
MAX(Last_order_date) AS MAX_LAST_DATE
FROM CUSTOMERS;

-- Minimum and maximum First Order Dates
SELECT MIN(First_order_date) AS MIN_FIRST_DATE,
MAX(First_order_date) AS MAX_FIRST_DATE
FROM CUSTOMERS;

-- Minimum and maximum Last Online Order Dates
SELECT MIN(Last_order_date_online) AS MIN_LAST_OR_ONLINE_DATE,
MAX(Last_order_date_online) AS MAX_LAST_OR_ONLINE_DATE
FROM CUSTOMERS;

-- Minimum and maximum Last Offline Order Dates
SELECT MIN(Last_order_date_offline) AS MIN_LAST_OR_OFFLINE_DATE,
MAX(Last_order_date_offline) AS MAX_LAST_OR_OFFLINE_DATE
FROM CUSTOMERS;

-- Minimum and maximum number of orders per customer (offline)
SELECT MIN(Order_num_total_ever_offline) AS MIN_ORDERS_OFFLINE,
MAX(Order_num_total_ever_offline) AS MAX_ORDERS_OFFLINE
FROM CUSTOMERS;

-- Minimum and maximum number of orders per customer (online)
SELECT MIN(Order_num_total_ever_online) AS MIN_ORDERS_ONLINE,
MAX(Order_num_total_ever_online) AS MAX_ORDERS_ONLINE
FROM CUSTOMERS;

-- Minimum and maximum customer values (offline)
SELECT MIN(Customer_value_total_ever_offline) AS MIN_VALUE_OFFLINE,
MAX(Customer_value_total_ever_offline) AS MAX_VALUE_OFFLINE
FROM CUSTOMERS;

-- Minimum and maximum customer values (online)
SELECT MIN(Customer_value_total_ever_online) AS MIN_VALUE_ONLINE,
MAX(Customer_value_total_ever_online) AS MAX_VALUE_ONLINE
FROM CUSTOMERS;

-- Distinct values of Interested Categories
SELECT DISTINCT Interested_in_categories_12 FROM CUSTOMERS;

-- Distinct values of Store Type
SELECT DISTINCT Store_type FROM CUSTOMERS;

```
These queries provided a foundational understanding of the data and helped identify key trends and outliers in customer behavior, order channels, and purchasing patterns.
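
The value counts quoted in the Column Descriptions section can be checked in a single pass. The query below is a minimal sketch of that idea; the output aliases are illustrative names, not part of the original project.

```sql
-- One-row profile of the table: row count plus distinct-value counts
-- for the categorical columns. Note that COUNT(DISTINCT ...) ignores NULLs.
SELECT COUNT(*)                                    AS TOTAL_ROWS,
       COUNT(DISTINCT Master_ID)                   AS DISTINCT_CUSTOMERS,
       COUNT(DISTINCT Order_channel)               AS ORDER_CHANNEL_VALUES,
       COUNT(DISTINCT Last_order_channel)          AS LAST_ORDER_CHANNEL_VALUES,
       COUNT(DISTINCT Interested_in_categories_12) AS CATEGORY_VALUES,
       COUNT(DISTINCT Store_type)                  AS STORE_TYPE_VALUES
FROM CUSTOMERS;
```

If `TOTAL_ROWS` and `DISTINCT_CUSTOMERS` match, each customer appears on exactly one row, which is worth knowing before aggregating by `Master_ID`.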

### Analysis Questions

After this initial look at each column, I wrote SQL queries to answer the following questions.

1. **How many unique customers made a purchase?**

```sql
select count(distinct Master_ID) TOTALCUSTOMERS from CUSTOMERS

```

![image](https://github.com/user-attachments/assets/320ea439-169b-4982-a425-93cf7da85d95)

2. **What is the total number of purchases and the total revenue?**

```sql
select sum(Order_num_total_ever_offline + Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES,
       sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) REVENUE
from CUSTOMERS
```

![image](https://github.com/user-attachments/assets/220819c8-e03d-4bc2-b6f6-7cfa9e9ab1a1)

3. **What is the average revenue per purchase?**

```sql
select (sum(Customer_value_total_ever_offline+ Customer_value_total_ever_online)/ sum(Order_num_total_ever_offline+Order_num_total_ever_online) )AVG_REVENUE
from CUSTOMERS
```

![image](https://github.com/user-attachments/assets/c52d6bb1-06a3-43a2-8b72-024807e9569a)

5. **What is the total revenue and number of purchases made through the last order channel (last_order_channel)?**

```sql
-- Total revenue and total number of purchases for each last order channel
select Last_order_channel,
       sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) TOTAL_REVENUE,
       sum(Order_num_total_ever_offline + Order_num_total_ever_online) TOTAL_ORDERS
from CUSTOMERS
group by Last_order_channel
```

![image](https://github.com/user-attachments/assets/10d53509-e590-4d91-8aef-0db219e96ad9)

7. **What is the total revenue by store type?**

```sql
select Store_type,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_REVENUE
from CUSTOMERS group by Store_type
```

![image](https://github.com/user-attachments/assets/de61cff0-639d-4c99-b036-9dc6e70d295f)

9. **What is the number of purchases per year based on the customer's first order date (first_order_date)?**

```sql
select year(First_order_date) YEAR_,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_AMOUNT_OF_SALES
from CUSTOMERS group by year(First_order_date) order by 2 desc
```

![image](https://github.com/user-attachments/assets/7f60a01c-06f5-407c-ba27-d2baf4ccfef3)
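
The query above sums customer value, so it reports revenue per first-order year. If the number of purchases is what is wanted, a variant along these lines could be used instead. This is a sketch, and the aliases `CUSTOMERS_ACQUIRED` and `TOTAL_ORDERS` are illustrative.

```sql
-- Orders grouped by the year of each customer's first order.
-- These are lifetime orders of customers acquired in that year,
-- not orders placed during that year.
SELECT YEAR(First_order_date) AS YEAR_,
       COUNT(*) AS CUSTOMERS_ACQUIRED,
       SUM(Order_num_total_ever_offline + Order_num_total_ever_online) AS TOTAL_ORDERS
FROM CUSTOMERS
GROUP BY YEAR(First_order_date)
ORDER BY TOTAL_ORDERS DESC;
```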

11. **What is the average revenue per purchase, by the last order channel (last_order_channel)?**

```sql
select Last_order_channel, sum(Customer_value_total_ever_offline+Customer_value_total_ever_online)/sum(Order_num_total_ever_offline+Order_num_total_ever_online) PRODUCTIVITY
from CUSTOMERS
group by Last_order_channel
```

![image](https://github.com/user-attachments/assets/0e7b8b3f-f20c-4b3a-9360-e31bcf973b7e)

12. **What is the most popular category in the last 12 months?**

```sql
select Interested_in_categories_12,count(*) Frequency
from CUSTOMERS
group by Interested_in_categories_12
order by 2 desc
```

![image](https://github.com/user-attachments/assets/84103679-070a-442d-adda-9257b885f9d8)

14. **What is the most preferred store type?**

```sql
select top 1 Store_type,count(*) Store_count
from CUSTOMERS group by Store_type
order by 2 desc
```

![image](https://github.com/user-attachments/assets/05c803f1-3877-4f02-8460-5938f92d50bd)

16. **What is the most popular category and the total amount spent in that category, broken down by the last order channel (last_order_channel)?**

```sql
SELECT DISTINCT
C.Last_order_channel,
(SELECT TOP 1 Interested_in_categories_12
FROM CUSTOMERS
WHERE Last_order_channel = C.Last_order_channel
GROUP BY Interested_in_categories_12
ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Top_category,
(SELECT TOP 1 SUM(Order_num_total_ever_offline + Order_num_total_ever_online)
FROM CUSTOMERS
WHERE Last_order_channel = C.Last_order_channel
GROUP BY Interested_in_categories_12
ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Total_order
FROM CUSTOMERS C
```

![image](https://github.com/user-attachments/assets/769d8dc0-8592-4e69-838c-0256aec30613)
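
The same result can probably be obtained without repeating the correlated subquery, by aggregating once and ranking with a window function. The sketch below ranks categories by total order count, as the query above does; the CTE names `CATEGORY_TOTALS` and `RANKED` are illustrative.

```sql
-- Aggregate once per (channel, category), then keep the top category per channel.
WITH CATEGORY_TOTALS AS (
    SELECT Last_order_channel,
           Interested_in_categories_12,
           SUM(Order_num_total_ever_offline + Order_num_total_ever_online) AS TOTAL_ORDER
    FROM CUSTOMERS
    GROUP BY Last_order_channel, Interested_in_categories_12
),
RANKED AS (
    SELECT Last_order_channel,
           Interested_in_categories_12,
           TOTAL_ORDER,
           ROW_NUMBER() OVER (PARTITION BY Last_order_channel
                              ORDER BY TOTAL_ORDER DESC) AS RN
    FROM CATEGORY_TOTALS
)
SELECT Last_order_channel,
       Interested_in_categories_12 AS Top_category,
       TOTAL_ORDER AS Total_order
FROM RANKED
WHERE RN = 1;
```

Like `TOP 1`, `ROW_NUMBER` picks an arbitrary row when two categories tie; `RANK` would return all tied categories instead. Replacing the order-count sum with the customer value columns would rank by amount spent.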

18. **What is the ID of the customer who made the most purchases?**

```sql
select top 1 Master_ID,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_SALES
from CUSTOMERS group by Master_ID
order by sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) desc
```

![image](https://github.com/user-attachments/assets/40c6edf7-e679-4e89-8e75-9c3063b31afd)

20. **What is the average revenue per purchase and the average number of days between purchases (purchase frequency) for the customer who made the most purchases?**

```sql
-- Top customer by total revenue: average revenue per order and revenue per day.
-- nullif guards against division by zero when first and last order fall on the same day.
select D.Master_ID,
       D.TOTAL_REVENUE / D.FREQUENCY AVERAGE_REVENUE_BY_FREQ,
       D.TOTAL_REVENUE / nullif(datediff(day, D.First_order_date, D.Last_order_date), 0) REVENUE_BY_DAY
from (select top 1 Master_ID, First_order_date, Last_order_date,
             sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) TOTAL_REVENUE,
             sum(Order_num_total_ever_offline + Order_num_total_ever_online) FREQUENCY
      from CUSTOMERS
      group by Master_ID, First_order_date, Last_order_date
      order by TOTAL_REVENUE desc) D
```

![image](https://github.com/user-attachments/assets/103150e4-6681-4dd6-b39a-a274dde72530)

22. **What is the average number of days between purchases (purchase frequency) for the top 100 customers by total revenue?**

```sql
-- Top 100 customers by total revenue: average revenue per order, days active, revenue per day.
-- nullif guards against division by zero when first and last order fall on the same day.
select D.Master_ID, D.First_order_date, D.Last_order_date,
       D.REVENUE / D.FREQUENCY AVG_REVENUE,
       datediff(day, D.First_order_date, D.Last_order_date) DATE_DIF,
       round(D.REVENUE / nullif(datediff(day, D.First_order_date, D.Last_order_date), 0), 1) DATE_BY_REVENUE
from (select top 100 Master_ID, First_order_date, Last_order_date,
             sum(Order_num_total_ever_offline + Order_num_total_ever_online) FREQUENCY,
             sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) REVENUE
      from CUSTOMERS
      group by Master_ID, First_order_date, Last_order_date
      order by REVENUE desc) D
```

![image](https://github.com/user-attachments/assets/e23f31be-65d3-461c-86e2-514e0d300bcf)
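
The two queries above report revenue per day rather than the gap between purchases. One way to estimate the average number of days between purchases is to divide the span between first and last order by the number of gaps, which is the order count minus one. The sketch below assumes each `Master_ID` appears on a single row; the alias `AVG_DAYS_BETWEEN_ORDERS` is illustrative.

```sql
-- Average days between orders for the top 100 customers by total revenue.
-- Assumes one row per customer, so no GROUP BY is needed.
SELECT TOP 100
       Master_ID,
       Customer_value_total_ever_offline + Customer_value_total_ever_online AS REVENUE,
       Order_num_total_ever_offline + Order_num_total_ever_online           AS FREQUENCY,
       DATEDIFF(day, First_order_date, Last_order_date)                     AS DATE_DIF,
       -- n orders leave n - 1 gaps; NULLIF returns NULL for single-order customers
       -- instead of dividing by zero, and * 1.0 avoids integer division.
       DATEDIFF(day, First_order_date, Last_order_date) * 1.0
           / NULLIF(Order_num_total_ever_offline + Order_num_total_ever_online - 1, 0)
                                                                            AS AVG_DAYS_BETWEEN_ORDERS
FROM CUSTOMERS
ORDER BY REVENUE DESC;
```

Changing `TOP 100` to `TOP 1` gives the same figure for the single highest-revenue customer.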

24. **Who is the customer who made the most purchases, broken down by the last order channel (last_order_channel)?**

```sql
select distinct C.Last_order_channel,
       (select top 1 Master_ID
        from CUSTOMERS
        where Last_order_channel = C.Last_order_channel
        group by Master_ID
        order by sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) desc) CUSTOMER,
       -- Revenue sums offline + online, matching the ordering used to pick CUSTOMER
       (select top 1 sum(Customer_value_total_ever_offline + Customer_value_total_ever_online)
        from CUSTOMERS
        where Last_order_channel = C.Last_order_channel
        group by Master_ID
        order by sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) desc) REVENUE
from CUSTOMERS C
```

![image](https://github.com/user-attachments/assets/b63dd722-5baa-42f8-9457-5e45d1f82a6c)

25. **What is the ID of the last customer who made a purchase (including multiple IDs if there were multiple purchases on the latest date)?**

```sql
select distinct Master_ID, Last_order_date
from CUSTOMERS
where Last_order_date = (select max(Last_order_date) from CUSTOMERS)
```

![image](https://github.com/user-attachments/assets/296de4be-e635-406c-9372-592501dd3279)