https://github.com/cagandemirmr/flo_data_exploration
You can find data exploration of FLO
dataanalysis dataexploration queries sql sql-server
- Host: GitHub
- URL: https://github.com/cagandemirmr/flo_data_exploration
- Owner: cagandemirmr
- Created: 2024-10-15T18:08:52.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-15T19:32:41.000Z (7 months ago)
- Last Synced: 2024-10-17T07:32:59.449Z (7 months ago)
- Topics: dataanalysis, dataexploration, queries, sql, sql-server
- Homepage:
- Size: 25.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# FLO Customer Data Exploration

This project involves the exploration and analysis of customer data from **FLO**, a leading producer and retailer in the Turkish shoe industry. The dataset contains information on customer purchasing behavior across various channels, including both online and offline transactions.
## Dataset Information
- **Total Rows**: 20,000
- **Total Columns**: 13
### Column Descriptions
1. **Master_ID**:
   - A unique 36-character identifier composed of numbers and letters that anonymously represents each customer.
2. **Order_Channel**:
   - The channel used by the customer when placing an order. It takes one of 4 values:
     - iOS App
     - Android App
     - Desktop
     - Mobile
3. **Last_order_channel**:
   - The channel used by the customer for their most recent order. It takes one of 5 values:
     - iOS App
     - Android App
     - Desktop
     - Mobile
     - Offline (in-store purchases)
4. **First_order_date**:
   - The date when the customer placed their first order. The minimum date is **2013-01-14** and the maximum date is **2021-05-27**.
5. **Last_order_date**:
   - The date when the customer placed their most recent order. The minimum date is **2020-05-30** and the maximum date is **2021-05-30**.
6. **Last_order_date_online**:
   - The date when the customer made their most recent online order.
7. **Last_order_date_offline**:
   - The date when the customer made their most recent offline (in-store) order.
8. **Order_num_total_ever_offline**:
   - The total number of orders placed by the customer through offline channels.
9. **Order_num_total_ever_online**:
   - The total number of orders placed by the customer through online channels.
10. **Customer_value_total_ever_offline**:
    - The total revenue generated by the customer from offline purchases.
11. **Customer_value_total_ever_online**:
    - The total revenue generated by the customer from online purchases.
12. **Interested_in_categories_12**:
    - This column contains 32 unique values representing categories assigned based on the customer's interests.
13. **Store_type**:
    - Consists of 3 main store types, with 4 combinations based on customer purchasing habits.
## Steps
### Data Preparation
Before importing the data into SQL, I made several adjustments in Excel:
- Formatted date columns to ensure consistency.
- Removed unnecessary characters and cleaned up the data.

After these modifications, I copied the Excel tables into SQL. This approach is efficient for:
- Familiarizing myself with the data.
- Data normalization, which reduces the size of the data.

However, while this method is convenient for checking and exploring data, it does not scale well to large datasets. Fortunately, this dataset is moderate in size.
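
As an illustration of the import target, a table definition for the 13 columns described above might look like the sketch below. The column types and lengths are assumptions chosen to fit the column descriptions; they are not the definitions used in the repository.

```sql
-- Hypothetical schema for the CUSTOMERS table.
-- All data types and lengths are assumptions, not taken from the repository.
CREATE TABLE CUSTOMERS (
    Master_ID                         CHAR(36) NOT NULL,  -- 36-character anonymous customer ID
    Order_channel                     VARCHAR(20),        -- iOS App, Android App, Desktop, Mobile
    Last_order_channel                VARCHAR(20),        -- same values plus Offline
    First_order_date                  DATE,
    Last_order_date                   DATE,
    Last_order_date_online            DATE,
    Last_order_date_offline           DATE,
    Order_num_total_ever_offline      INT,                -- count of offline orders
    Order_num_total_ever_online       INT,                -- count of online orders
    Customer_value_total_ever_offline DECIMAL(12, 2),     -- offline revenue
    Customer_value_total_ever_online  DECIMAL(12, 2),     -- online revenue
    Interested_in_categories_12       VARCHAR(100),
    Store_type                        VARCHAR(10)
);
```

Typing the date columns as `DATE` is what allows functions such as `YEAR` and `DATEDIFF` to work on them directly, which is one reason consistent date formatting before the import matters.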
### SQL Queries
To gain a basic understanding of the data, I wrote the following queries in SQL:
```sql
-- General overview of the dataset
SELECT * FROM CUSTOMERS;

-- Minimum and maximum character lengths of Master_ID across the whole table
SELECT MIN(LEN(Master_ID)) AS MINIMUM_LENGTH,
       MAX(LEN(Master_ID)) AS MAXIMUM_LENGTH
FROM CUSTOMERS;

-- Distinct values of the Order Channel
SELECT DISTINCT Order_channel FROM CUSTOMERS;

-- Distinct values of the Last Order Channel
SELECT DISTINCT Last_order_channel FROM CUSTOMERS;

-- Minimum and maximum Last Order Dates
SELECT MIN(Last_order_date) AS MIN_LAST_DATE,
       MAX(Last_order_date) AS MAX_LAST_DATE
FROM CUSTOMERS;

-- Minimum and maximum First Order Dates
SELECT MIN(First_order_date) AS MIN_FIRST_DATE,
       MAX(First_order_date) AS MAX_FIRST_DATE
FROM CUSTOMERS;

-- Minimum and maximum Last Online Order Dates
SELECT MIN(Last_order_date_online) AS MIN_LAST_OR_ONLINE_DATE,
       MAX(Last_order_date_online) AS MAX_LAST_OR_ONLINE_DATE
FROM CUSTOMERS;

-- Minimum and maximum Last Offline Order Dates
SELECT MIN(Last_order_date_offline) AS MIN_LAST_OR_OFFLINE_DATE,
       MAX(Last_order_date_offline) AS MAX_LAST_OR_OFFLINE_DATE
FROM CUSTOMERS;

-- Minimum and maximum number of orders (offline)
SELECT MIN(Order_num_total_ever_offline) AS MIN_OFFLINE_ORDERS,
       MAX(Order_num_total_ever_offline) AS MAX_OFFLINE_ORDERS
FROM CUSTOMERS;

-- Minimum and maximum number of orders (online)
SELECT MIN(Order_num_total_ever_online) AS MIN_ONLINE_ORDERS,
       MAX(Order_num_total_ever_online) AS MAX_ONLINE_ORDERS
FROM CUSTOMERS;

-- Minimum and maximum customer values (offline)
SELECT MIN(Customer_value_total_ever_offline) AS MIN_OFFLINE_VALUE,
       MAX(Customer_value_total_ever_offline) AS MAX_OFFLINE_VALUE
FROM CUSTOMERS;

-- Minimum and maximum customer values (online)
SELECT MIN(Customer_value_total_ever_online) AS MIN_ONLINE_VALUE,
       MAX(Customer_value_total_ever_online) AS MAX_ONLINE_VALUE
FROM CUSTOMERS;

-- Distinct values of Interested Categories
SELECT DISTINCT Interested_in_categories_12 FROM CUSTOMERS;

-- Distinct values of Store Type
SELECT DISTINCT Store_type FROM CUSTOMERS;
```
These queries provided a foundational understanding of the data and helped identify key trends and outliers in customer behavior, order channels, and purchasing patterns.
### Analysis Questions
After this column-by-column exploration, several questions came to mind, and I wrote SQL queries to answer them.
1. **How many unique customers made a purchase?**
```sql
select count(distinct Master_ID) TOTALCUSTOMERS from CUSTOMERS
```

2. **What is the total number of purchases and the total revenue?**
```sql
select sum(Order_num_total_ever_offline+Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) REVENUE
from CUSTOMERS
```
3. **What is the average revenue per purchase?**
```sql
select (sum(Customer_value_total_ever_offline+ Customer_value_total_ever_online)/ sum(Order_num_total_ever_offline+Order_num_total_ever_online) )AVG_REVENUE
from CUSTOMERS
```
5. **What is the total revenue and number of purchases made through the last order channel (last_order_channel)?**
```sql
select Last_order_channel,
sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_REVENUE,
sum(Order_num_total_ever_offline+Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES
from CUSTOMERS group by Last_order_channel
```
7. **What is the total revenue by store type?**
```sql
select Store_type,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_REVENUE
from CUSTOMERS group by Store_type
```
9. **What is the number of purchases per year based on the customer's first order date (first_order_date)?**
```sql
select year(First_order_date) YEAR_,sum(Order_num_total_ever_offline+Order_num_total_ever_online) TOTAL_AMOUNT_OF_SALES
from CUSTOMERS group by year(First_order_date) order by 2 desc
```
11. **What is the average revenue per purchase, by the last order channel (last_order_channel)?**
```sql
select Last_order_channel, sum(Customer_value_total_ever_offline+Customer_value_total_ever_online)/sum(Order_num_total_ever_offline+Order_num_total_ever_online) PRODUCTIVITY
from CUSTOMERS
group by Last_order_channel
```
12. **What is the most popular category in the last 12 months?**
```sql
select Interested_in_categories_12,count(*) Frequency
from CUSTOMERS
group by Interested_in_categories_12
order by 2 desc
```
14. **What is the most preferred store type?**
```sql
select top 1 Store_type,count(*) Store_count
from CUSTOMERS group by Store_type
order by 2 desc
```
16. **What is the most popular category and the total amount spent in that category, broken down by the last order channel (last_order_channel)?**
```sql
SELECT DISTINCT
C.Last_order_channel,
(SELECT TOP 1 Interested_in_categories_12
FROM CUSTOMERS
WHERE Last_order_channel = C.Last_order_channel
GROUP BY Interested_in_categories_12
ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Top_category,
(SELECT TOP 1 SUM(Order_num_total_ever_offline + Order_num_total_ever_online)
FROM CUSTOMERS
WHERE Last_order_channel = C.Last_order_channel
GROUP BY Interested_in_categories_12
ORDER BY SUM(Order_num_total_ever_offline + Order_num_total_ever_online) DESC) AS Total_order
FROM CUSTOMERS C
```
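
The same "top row per group" result can also be sketched with a window function, which aggregates the table once instead of running two correlated subqueries per row. This is an alternative formulation under the same assumptions as the query above (the `CUSTOMERS` table and its column names); the CTE names `CategoryTotals` and `Ranked` and the alias `rn` are introduced here for illustration only.

```sql
-- Alternative sketch: top category per last order channel using ROW_NUMBER().
WITH CategoryTotals AS (
    -- Total orders for every (channel, category) pair
    SELECT Last_order_channel,
           Interested_in_categories_12,
           SUM(Order_num_total_ever_offline + Order_num_total_ever_online) AS Total_order
    FROM CUSTOMERS
    GROUP BY Last_order_channel, Interested_in_categories_12
),
Ranked AS (
    -- Rank categories within each channel, highest order total first
    SELECT Last_order_channel,
           Interested_in_categories_12,
           Total_order,
           ROW_NUMBER() OVER (PARTITION BY Last_order_channel
                              ORDER BY Total_order DESC) AS rn
    FROM CategoryTotals
)
SELECT Last_order_channel,
       Interested_in_categories_12 AS Top_category,
       Total_order
FROM Ranked
WHERE rn = 1;
```

As with `TOP 1`, ties within a channel are broken arbitrarily; replacing `ROW_NUMBER()` with `RANK()` would return every tied category instead.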
18. **What is the ID of the customer who made the most purchases?**
```sql
-- "most purchases" is measured here by total revenue (offline + online), not by order count
select top 1 Master_ID,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_SALES
from CUSTOMERS group by Master_ID
order by sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) desc
```
20. **What is the average revenue per purchase and the average number of days between purchases (purchase frequency) for the customer who made the most purchases?**
```sql
select D.Master_ID,D.TOTAL_REVENUE/D.FREQUENCY AVERAGE_REVENUE_BY_FREQ,
-- average gap in days between consecutive orders: N orders leave N - 1 gaps; nullif avoids division by zero
1.0*DATEDIFF(day,D.First_order_date,D.Last_order_date)/nullif(D.FREQUENCY-1,0) AVG_DAYS_BETWEEN_PURCHASES
from (select top 1 Master_ID,First_order_date,Last_order_date,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) TOTAL_REVENUE,
sum(Order_num_total_ever_offline+Order_num_total_ever_online) FREQUENCY from CUSTOMERS group by Master_ID,First_order_date,Last_order_date
order by TOTAL_REVENUE desc) D
```
22. **What is the average number of days between purchases (purchase frequency) for the top 100 customers by total revenue?**
```sql
select D.Master_ID,D.First_order_date,D.Last_order_date,D.REVENUE/D.FREQUENCY AVG_REVENUE,DATEDIFF(day,D.First_order_date,D.Last_order_date) DATE_DIF,
-- average gap in days between consecutive orders; nullif avoids division by zero for single-order customers
round(1.0*DATEDIFF(day,D.First_order_date,D.Last_order_date)/nullif(D.FREQUENCY-1,0),1) AVG_DAYS_BETWEEN_PURCHASES
from (select top 100 Master_ID,First_order_date,Last_order_date,
sum(Order_num_total_ever_offline+Order_num_total_ever_online) FREQUENCY,sum(Customer_value_total_ever_offline+Customer_value_total_ever_online)
REVENUE from CUSTOMERS group by Master_ID,First_order_date,Last_order_date order by REVENUE desc) D
order by D.REVENUE desc
```
24. **Who is the customer who made the most purchases, broken down by the last order channel (last_order_channel)?**
```sql
select distinct Last_order_channel,
(select top 1 Master_ID from CUSTOMERS where Last_order_channel=C.Last_order_channel
group by Master_ID
order by sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) desc) CUSTOMER,
-- revenue of that customer: offline plus online value
(select top 1 sum(Customer_value_total_ever_offline+Customer_value_total_ever_online)
from CUSTOMERS where Last_order_channel=C.Last_order_channel group by Master_ID
order by sum(Customer_value_total_ever_offline+Customer_value_total_ever_online) desc) REVENUE
from CUSTOMERS C
```
25. **What is the ID of the last customer who made a purchase (including multiple IDs if there were multiple purchases on the latest date)?**
```sql
-- filter rows directly; no grouping is needed because no aggregate is selected
select Master_ID,Last_order_date from CUSTOMERS
where Last_order_date = (select max(Last_order_date) from CUSTOMERS)
```