https://github.com/cagandemirmr/flo_data_exploration
You can find data exploration of FLO
- Host: GitHub
- URL: https://github.com/cagandemirmr/flo_data_exploration
- Owner: cagandemirmr
- Created: 2024-10-15T18:08:52.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-10-15T19:32:41.000Z (2 months ago)
- Last Synced: 2024-10-17T07:32:59.449Z (2 months ago)
- Topics: dataanalysis, dataexploration, queries, sql, sql-server
- Homepage:
- Size: 25.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# FLO Customer Data Exploration
![image](https://github.com/user-attachments/assets/f1f05852-2803-4aa4-bc22-934a68e191ca)
This project involves the exploration and analysis of customer data from **FLO**, a leading producer and retailer in the Turkish shoe industry. The dataset contains information on customer purchasing behavior across various channels, including both online and offline transactions.
## Dataset Information
- **Total Rows**: 20,000
- **Total Columns**: 13

### Column Descriptions
1. **Master_ID**:
   - A unique 36-character identifier, composed of numbers and letters, that anonymously represents each customer.
2. **Order_Channel**:
   - The channel the customer used when placing an order. It takes one of 4 values:
     - iOS App
     - Android App
     - Desktop
     - Mobile
3. **Last_order_channel**:
   - The channel the customer used for their most recent order. It takes one of 5 values:
     - iOS App
     - Android App
     - Desktop
     - Mobile
     - Offline (in-store purchases)
4. **First_order_date**:
   - The date of the customer's first order. The minimum date is **2013-01-14** and the maximum date is **2021-05-27**.
5. **Last_order_date**:
   - The date of the customer's most recent order. The minimum date is **2020-05-30** and the maximum date is **2021-05-30**.
6. **Last_order_date_online**:
   - The date of the customer's most recent online order.
7. **Last_order_date_offline**:
   - The date of the customer's most recent offline (in-store) order.
8. **Order_num_total_ever_offline**:
   - The total number of orders the customer placed through offline channels.
9. **Order_num_total_ever_online**:
   - The total number of orders the customer placed through online channels.
10. **Customer_value_total_ever_offline**:
    - The total revenue generated by the customer's offline purchases.
11. **Customer_value_total_ever_online**:
    - The total revenue generated by the customer's online purchases.
12. **Interested_in_categories_12**:
    - The categories the customer showed interest in, assigned from their purchases; the column has 32 unique values.
13. **Store_type**:
    - There are 3 main store types, which appear in 4 combinations depending on the customer's purchasing habits.

## Steps
### Data Preparation
Before importing the data into SQL, I made several adjustments in Excel:
- Formatted date columns to ensure consistency.
- Removed unnecessary characters and cleaned up the data.

After these modifications, I copied the Excel tables into SQL. This approach is efficient for:
- Familiarizing myself with the data.
- Normalizing the data, which reduces its size.

![image](https://github.com/user-attachments/assets/78473aa7-d728-414b-9fb6-5bf3ec334217)

However, while copying tables by hand works well for checking and exploring the data, it does not scale to large datasets. Fortunately, at 20,000 rows this dataset is moderate in size.
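For anyone recreating the setup, here is a minimal sketch of a `CUSTOMERS` table the Excel data could be copied into. The column names are the ones used in the queries in this README; the data types, lengths, and the primary key are assumptions, since the actual schema is not stated. It assumes SQL Server, and the `IF OBJECT_ID(...) IS NULL` guard leaves an existing table untouched.

```sql
-- Sketch of the CUSTOMERS table. Column names follow this README;
-- every data type and length below is an ASSUMPTION, not the original import.
IF OBJECT_ID(N'dbo.CUSTOMERS', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.CUSTOMERS (
        Master_ID                         CHAR(36)       NOT NULL PRIMARY KEY, -- unique 36-character customer ID
        Order_channel                     NVARCHAR(20),                        -- iOS App, Android App, Desktop, Mobile
        Last_order_channel                NVARCHAR(20),                        -- the four channels above plus Offline
        First_order_date                  DATE,
        Last_order_date                   DATE,
        Last_order_date_online            DATE,
        Last_order_date_offline           DATE,
        Order_num_total_ever_offline      INT,
        Order_num_total_ever_online       INT,
        Customer_value_total_ever_offline DECIMAL(12, 2),
        Customer_value_total_ever_online  DECIMAL(12, 2),
        Interested_in_categories_12       NVARCHAR(100),
        Store_type                        NVARCHAR(20)
    );
END;
```

For larger files, SQL Server's `BULK INSERT` statement can load a CSV export directly into a table like this instead of copying from Excel.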
### SQL Queries
To gain a basic understanding of the data, I wrote the following queries in SQL:
```sql
-- General overview of the dataset
SELECT * FROM CUSTOMERS;

-- Minimum and maximum character lengths of Master_ID across all rows
-- (no GROUP BY: grouping by Master_ID would compute them per customer)
SELECT MIN(LEN(Master_ID)) AS MINIMUM_LENGTH,
       MAX(LEN(Master_ID)) AS MAXIMUM_LENGTH
FROM CUSTOMERS;

-- Distinct values of the Order Channel
SELECT DISTINCT Order_channel FROM CUSTOMERS;

-- Distinct values of the Last Order Channel
SELECT DISTINCT Last_order_channel FROM CUSTOMERS;

-- Minimum and maximum Last Order Dates
SELECT MIN(Last_order_date) AS MIN_LAST_DATE,
       MAX(Last_order_date) AS MAX_LAST_DATE
FROM CUSTOMERS;

-- Minimum and maximum First Order Dates
SELECT MIN(First_order_date) AS MIN_FIRST_DATE,
       MAX(First_order_date) AS MAX_FIRST_DATE
FROM CUSTOMERS;

-- Minimum and maximum Last Online Order Dates
SELECT MIN(Last_order_date_online) AS MIN_LAST_OR_ONLINE_DATE,
       MAX(Last_order_date_online) AS MAX_LAST_OR_ONLINE_DATE
FROM CUSTOMERS;

-- Minimum and maximum Last Offline Order Dates
SELECT MIN(Last_order_date_offline) AS MIN_LAST_OR_OFFLINE_DATE,
       MAX(Last_order_date_offline) AS MAX_LAST_OR_OFFLINE_DATE
FROM CUSTOMERS;

-- Minimum and maximum order counts (offline)
SELECT MIN(Order_num_total_ever_offline) AS MIN_ORDERS_OFFLINE,
       MAX(Order_num_total_ever_offline) AS MAX_ORDERS_OFFLINE
FROM CUSTOMERS;

-- Minimum and maximum order counts (online)
SELECT MIN(Order_num_total_ever_online) AS MIN_ORDERS_ONLINE,
       MAX(Order_num_total_ever_online) AS MAX_ORDERS_ONLINE
FROM CUSTOMERS;

-- Minimum and maximum customer values (offline)
SELECT MIN(Customer_value_total_ever_offline) AS MIN_VALUE_OFFLINE,
       MAX(Customer_value_total_ever_offline) AS MAX_VALUE_OFFLINE
FROM CUSTOMERS;

-- Minimum and maximum customer values (online)
SELECT MIN(Customer_value_total_ever_online) AS MIN_VALUE_ONLINE,
       MAX(Customer_value_total_ever_online) AS MAX_VALUE_ONLINE
FROM CUSTOMERS;

-- Distinct values of Interested Categories
SELECT DISTINCT Interested_in_categories_12 FROM CUSTOMERS;

-- Distinct values of Store Type
SELECT DISTINCT Store_type FROM CUSTOMERS;
```
These queries gave a foundational understanding of the data: the range of each date and numeric column, the distinct values of each categorical column, and possible outliers in order counts and customer value.

### Analysis Questions
After this exploratory analysis, several questions came to mind, and I wrote SQL queries to answer them.
1. **How many unique customers made a purchase?**
```sql
select count(distinct Master_ID) TOTALCUSTOMERS
from CUSTOMERS
```
![image](https://github.com/user-attachments/assets/320ea439-169b-4982-a425-93cf7da85d95)
2. **What is the total number of purchases and the total revenue?**
```sql
select sum(Order_num_total_ever_offline + Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES,
       sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) REVENUE
from CUSTOMERS
```
![image](https://github.com/user-attachments/assets/220819c8-e03d-4bc2-b6f6-7cfa9e9ab1a1)
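Because the dataset stores orders and revenue in separate offline and online columns, the same totals can also be split by channel. The sketch below unpivots each customer row with `CROSS APPLY (VALUES ...)` so one `GROUP BY` covers both channels. It assumes SQL Server 2016 or later (for `DROP TABLE IF EXISTS`); `#CUSTOMERS_DEMO` and its three rows are hypothetical sample data, not FLO data, and `#CHANNEL_SPLIT` is an illustrative name. Replace `#CUSTOMERS_DEMO` with `CUSTOMERS` to run it on the real table.

```sql
-- Hypothetical sample rows (NOT FLO data), reduced to the columns the query needs.
DROP TABLE IF EXISTS #CUSTOMERS_DEMO;
CREATE TABLE #CUSTOMERS_DEMO (
    Master_ID                         VARCHAR(36),
    Order_num_total_ever_offline      INT,
    Order_num_total_ever_online       INT,
    Customer_value_total_ever_offline DECIMAL(12, 2),
    Customer_value_total_ever_online  DECIMAL(12, 2)
);
INSERT INTO #CUSTOMERS_DEMO VALUES
    ('id-1', 1, 4, 150.00, 600.00),
    ('id-2', 2, 0, 300.00,   0.00),
    ('id-3', 0, 3,   0.00, 450.00);

DROP TABLE IF EXISTS #CHANNEL_SPLIT;

-- CROSS APPLY (VALUES ...) turns each customer row into one row per channel,
-- so a single GROUP BY totals orders and revenue for offline and online.
SELECT v.Channel,
       SUM(v.Orders)  AS TOTAL_ORDERS,
       SUM(v.Revenue) AS REVENUE
INTO #CHANNEL_SPLIT
FROM #CUSTOMERS_DEMO AS c
CROSS APPLY (VALUES
    ('Offline', c.Order_num_total_ever_offline, c.Customer_value_total_ever_offline),
    ('Online',  c.Order_num_total_ever_online,  c.Customer_value_total_ever_online)
) AS v (Channel, Orders, Revenue)
GROUP BY v.Channel;

SELECT * FROM #CHANNEL_SPLIT ORDER BY Channel;
```

The two channel rows add up to the overall totals from the query above, which makes this a quick consistency check as well.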
3. **What is the average revenue per purchase?**
```sql
select sum(Customer_value_total_ever_offline + Customer_value_total_ever_online)
       / sum(Order_num_total_ever_offline + Order_num_total_ever_online) AVG_REVENUE
from CUSTOMERS
```
![image](https://github.com/user-attachments/assets/c52d6bb1-06a3-43a2-8b72-024807e9569a)
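The average-revenue query computes a ratio of sums, not an average of per-customer ratios. Writing $v_i^{\text{off}}, v_i^{\text{on}}$ for customer $i$'s `Customer_value_total_ever_offline` and `Customer_value_total_ever_online`, and $n_i^{\text{off}}, n_i^{\text{on}}$ for the matching `Order_num_total_ever_*` counts:

$$
\text{AVG\_REVENUE} = \frac{\sum_{i} \left( v_i^{\text{off}} + v_i^{\text{on}} \right)}{\sum_{i} \left( n_i^{\text{off}} + n_i^{\text{on}} \right)}
$$

This weights each customer by their number of orders. For example, with two hypothetical customers, one spending 300 on 1 order and one spending 900 on 9 orders, the formula gives $(300 + 900)/(1 + 9) = 120$, whereas averaging the per-customer ratios would give $(300 + 100)/2 = 200$.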
4. **What are the total revenue and the number of purchases for each last order channel (last_order_channel)?**
```sql
select Last_order_channel,
       sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) TOTAL_REVENUE,
       sum(Order_num_total_ever_offline + Order_num_total_ever_online) TOTAL_PURCHASES
from CUSTOMERS
group by Last_order_channel
```
![image](https://github.com/user-attachments/assets/10d53509-e590-4d91-8aef-0db219e96ad9)
5. **What is the total revenue by store type?**
```sql
select Store_type,
       sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) TOTAL_REVENUE
from CUSTOMERS
group by Store_type
```
![image](https://github.com/user-attachments/assets/de61cff0-639d-4c99-b036-9dc6e70d295f)
6. **What is the number of purchases per year, based on the customer's first order date (first_order_date)?**
```sql
select year(First_order_date) YEAR_,
       sum(Order_num_total_ever_offline + Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES
from CUSTOMERS
group by year(First_order_date)
order by 2 desc
```
![image](https://github.com/user-attachments/assets/7f60a01c-06f5-407c-ba27-d2baf4ccfef3)
7. **What is the average revenue per purchase, by the last order channel (last_order_channel)?**
```sql
select Last_order_channel,
       sum(Customer_value_total_ever_offline + Customer_value_total_ever_online)
       / sum(Order_num_total_ever_offline + Order_num_total_ever_online) PRODUCTIVITY
from CUSTOMERS
group by Last_order_channel
```
![image](https://github.com/user-attachments/assets/0e7b8b3f-f20c-4b3a-9360-e31bcf973b7e)
8. **What is the most popular category in the last 12 months?**
```sql
select Interested_in_categories_12, count(*) Frequency
from CUSTOMERS
group by Interested_in_categories_12
order by 2 desc
```
![image](https://github.com/user-attachments/assets/84103679-070a-442d-adda-9257b885f9d8)
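To see how dominant the top category is, each frequency can also be expressed as a share of all customers with the window aggregate `SUM(COUNT(*)) OVER ()`. A minimal sketch, assuming SQL Server 2016 or later; `#CUSTOMERS_DEMO` holds hypothetical rows (not FLO data, with placeholder category names), and `#CATEGORY_SHARE` and `Share_pct` are illustrative names.

```sql
-- Hypothetical sample rows (NOT FLO data); only the column this query needs.
DROP TABLE IF EXISTS #CUSTOMERS_DEMO;
CREATE TABLE #CUSTOMERS_DEMO (
    Master_ID                   VARCHAR(36),
    Interested_in_categories_12 NVARCHAR(100)
);
INSERT INTO #CUSTOMERS_DEMO VALUES
    ('id-1', N'Category A'),
    ('id-2', N'Category A'),
    ('id-3', N'Category B'),
    ('id-4', N'Category C');

DROP TABLE IF EXISTS #CATEGORY_SHARE;

-- SUM(COUNT(*)) OVER () is evaluated after GROUP BY, so it equals the total row count;
-- dividing each group's count by it gives that category's percentage share.
SELECT Interested_in_categories_12,
       COUNT(*) AS Frequency,
       CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5, 2)) AS Share_pct
INTO #CATEGORY_SHARE
FROM #CUSTOMERS_DEMO
GROUP BY Interested_in_categories_12;

SELECT * FROM #CATEGORY_SHARE ORDER BY Frequency DESC;
```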
9. **What is the most preferred store type?**
```sql
select top 1 Store_type, count(*) Store_count
from CUSTOMERS
group by Store_type
order by 2 desc
```
![image](https://github.com/user-attachments/assets/05c803f1-3877-4f02-8460-5938f92d50bd)
10. **What is the most popular category and the total number of orders in that category, broken down by the last order channel (last_order_channel)?**
```sql
SELECT DISTINCT
    C.Last_order_channel,
    (SELECT TOP 1 Interested_in_categories_12
     FROM CUSTOMERS
     WHERE Last_order_channel = C.Last_order_channel
     GROUP BY Interested_in_categories_12
     ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Top_category,
    (SELECT TOP 1 SUM(Order_num_total_ever_offline + Order_num_total_ever_online)
     FROM CUSTOMERS
     WHERE Last_order_channel = C.Last_order_channel
     GROUP BY Interested_in_categories_12
     ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Total_order
FROM CUSTOMERS C
```
![image](https://github.com/user-attachments/assets/769d8dc0-8592-4e69-838c-0256aec30613)
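The correlated subqueries in the top-category query repeat the same grouping and sorting twice for every channel. An alternative is to aggregate once and pick the first row per channel with `ROW_NUMBER()`; the same pattern also fits the per-channel top-customer question later on. A minimal sketch, assuming SQL Server 2016 or later; `#CUSTOMERS_DEMO` holds hypothetical rows (not FLO data, with placeholder category names), and `#TOP_CATEGORY` is an illustrative name. Like `TOP 1`, `ROW_NUMBER()` breaks ties within a channel arbitrarily; `RANK()` would keep every tied category.

```sql
-- Hypothetical sample rows (NOT FLO data), reduced to the columns the query needs.
DROP TABLE IF EXISTS #CUSTOMERS_DEMO;
CREATE TABLE #CUSTOMERS_DEMO (
    Master_ID                    VARCHAR(36),
    Last_order_channel           NVARCHAR(20),
    Interested_in_categories_12  NVARCHAR(100),
    Order_num_total_ever_offline INT,
    Order_num_total_ever_online  INT
);
INSERT INTO #CUSTOMERS_DEMO VALUES
    ('id-1', N'Mobile',  N'Category A', 1, 4),
    ('id-2', N'Mobile',  N'Category B', 0, 2),
    ('id-3', N'Mobile',  N'Category A', 1, 1),
    ('id-4', N'Offline', N'Category B', 3, 0),
    ('id-5', N'Offline', N'Category A', 1, 0);

DROP TABLE IF EXISTS #TOP_CATEGORY;

-- Step 1: total orders per (channel, category), computed once.
WITH CategoryOrders AS (
    SELECT Last_order_channel,
           Interested_in_categories_12,
           SUM(Order_num_total_ever_offline + Order_num_total_ever_online) AS Total_order
    FROM #CUSTOMERS_DEMO
    GROUP BY Last_order_channel, Interested_in_categories_12
),
-- Step 2: number the categories within each channel, largest order total first.
Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Last_order_channel
                              ORDER BY Total_order DESC) AS rn
    FROM CategoryOrders
)
-- Step 3: keep only the top category of each channel.
SELECT Last_order_channel,
       Interested_in_categories_12 AS Top_category,
       Total_order
INTO #TOP_CATEGORY
FROM Ranked
WHERE rn = 1;

SELECT * FROM #TOP_CATEGORY ORDER BY Last_order_channel;
```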
11. **What is the ID of the customer who made the most purchases, measured by total spending?**
```sql
select top 1 Master_ID,
       sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) TOTAL_SALES
from CUSTOMERS
group by Master_ID
order by TOTAL_SALES desc
```
![image](https://github.com/user-attachments/assets/40c6edf7-e679-4e89-8e75-9c3063b31afd)
12. **What is the average revenue per purchase and the average number of days between purchases (purchase frequency) for the customer who made the most purchases (by total spending)?**
```sql
select D.Master_ID,
       D.TOTAL_REVENUE / D.FREQUENCY AVERAGE_REVENUE_BY_FREQ,
       -- N purchases span N - 1 gaps; NULLIF avoids dividing by zero for single-purchase customers
       round(DATEDIFF(day, D.First_order_date, D.Last_order_date) * 1.0
             / nullif(D.FREQUENCY - 1, 0), 1) AVG_DAYS_BETWEEN_PURCHASES
from (select top 1 Master_ID, First_order_date, Last_order_date,
             sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) TOTAL_REVENUE,
             sum(Order_num_total_ever_offline + Order_num_total_ever_online) FREQUENCY
      from CUSTOMERS
      group by Master_ID, First_order_date, Last_order_date
      order by TOTAL_REVENUE desc) D
```
![image](https://github.com/user-attachments/assets/103150e4-6681-4dd6-b39a-a274dde72530)
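Purchase frequency in days can be read as the average gap between consecutive purchases. If a customer placed $N$ orders between `First_order_date` and `Last_order_date`, those $N$ orders span $N - 1$ gaps, where $n^{\text{off}}$ and $n^{\text{on}}$ are the customer's `Order_num_total_ever_offline` and `Order_num_total_ever_online`:

$$
\bar{d} = \frac{\text{Last\_order\_date} - \text{First\_order\_date}}{N - 1}, \qquad N = n^{\text{off}} + n^{\text{on}} \ge 2
$$

For a hypothetical customer whose first order was on 2020-01-01 and last on 2021-01-01 (366 days, since 2020 is a leap year) with $N = 5$ orders, the average gap is $366 / 4 = 91.5$ days. Dividing by $N$ instead would give $73.2$ days and understate the gap, most noticeably for customers with few orders.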
13. **What is the average number of days between purchases (purchase frequency) for the top 100 customers by total revenue?**
```sql
select D.Master_ID, D.First_order_date, D.Last_order_date,
       D.REVENUE / D.FREQUENCY AVG_REVENUE,
       DATEDIFF(day, D.First_order_date, D.Last_order_date) DATE_DIF,
       -- N purchases span N - 1 gaps; NULLIF avoids dividing by zero for single-purchase customers
       round(DATEDIFF(day, D.First_order_date, D.Last_order_date) * 1.0
             / nullif(D.FREQUENCY - 1, 0), 1) AVG_DAYS_BETWEEN_PURCHASES
from (select top 100 Master_ID, First_order_date, Last_order_date,
             sum(Order_num_total_ever_offline + Order_num_total_ever_online) FREQUENCY,
             sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) REVENUE
      from CUSTOMERS
      group by Master_ID, First_order_date, Last_order_date
      order by REVENUE desc) D
```
![image](https://github.com/user-attachments/assets/e23f31be-65d3-461c-86e2-514e0d300bcf)
14. **Who is the customer who made the most purchases (by total spending), broken down by the last order channel (last_order_channel)?**
```sql
select distinct Last_order_channel,
       (select top 1 Master_ID
        from CUSTOMERS
        where Last_order_channel = C.Last_order_channel
        group by Master_ID
        order by sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) desc) CUSTOMER,
       (select top 1 sum(Customer_value_total_ever_offline + Customer_value_total_ever_online)
        from CUSTOMERS
        where Last_order_channel = C.Last_order_channel
        group by Master_ID
        order by sum(Customer_value_total_ever_offline + Customer_value_total_ever_online) desc) REVENUE
from CUSTOMERS C
```
![image](https://github.com/user-attachments/assets/b63dd722-5baa-42f8-9457-5e45d1f82a6c)
15. **What is the ID of the customer who made the most recent purchase (returning multiple IDs if several customers ordered on the latest date)?**
```sql
select Master_ID, Last_order_date
from CUSTOMERS
where Last_order_date = (select max(Last_order_date) from CUSTOMERS)
```
![image](https://github.com/user-attachments/assets/296de4be-e635-406c-9372-592501dd3279)
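SQL Server can also return every customer tied on the latest date without a subquery, using `TOP (1) WITH TIES`, which keeps all rows that tie with the first row on the `ORDER BY` key. A minimal sketch, assuming SQL Server 2016 or later; `#CUSTOMERS_DEMO` holds hypothetical rows (not FLO data), and `#LAST_CUSTOMERS` is an illustrative name.

```sql
-- Hypothetical sample rows (NOT FLO data): two customers share the latest date.
DROP TABLE IF EXISTS #CUSTOMERS_DEMO;
CREATE TABLE #CUSTOMERS_DEMO (
    Master_ID       VARCHAR(36),
    Last_order_date DATE
);
INSERT INTO #CUSTOMERS_DEMO VALUES
    ('id-1', '2021-05-28'),
    ('id-2', '2021-05-30'),
    ('id-3', '2021-05-30');

DROP TABLE IF EXISTS #LAST_CUSTOMERS;

-- TOP (1) WITH TIES returns the first row by Last_order_date DESC
-- plus every other row with the same Last_order_date.
SELECT TOP (1) WITH TIES Master_ID, Last_order_date
INTO #LAST_CUSTOMERS
FROM #CUSTOMERS_DEMO
ORDER BY Last_order_date DESC;

SELECT * FROM #LAST_CUSTOMERS;
```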