https://github.com/edochiari/amazon_sales_analysis-project
Advanced SQL project analyzing over 20,000 sales records from an Amazon-like e-commerce platform. Focuses on customer behavior, product performance, and sales trends using PostgreSQL. Includes data cleaning, revenue analysis, inventory management, and an ERD diagram to visualize database schema.
https://github.com/edochiari/amazon_sales_analysis-project
complexjoins datacleaning datatransformation nestedqueries pgadmin postgresql subqueries windowfunctions
Last synced: 3 months ago
JSON representation
Advanced SQL project analyzing over 20,000 sales records from an Amazon-like e-commerce platform. Focuses on customer behavior, product performance, and sales trends using PostgreSQL. Includes data cleaning, revenue analysis, inventory management, and an ERD diagram to visualize database schema.
- Host: GitHub
- URL: https://github.com/edochiari/amazon_sales_analysis-project
- Owner: EdoChiari
- Created: 2025-02-25T09:31:10.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-25T11:24:54.000Z (3 months ago)
- Last Synced: 2025-02-25T12:22:49.708Z (3 months ago)
- Topics: complexjoins, datacleaning, datatransformation, nestedqueries, pgadmin, postgresql, subqueries, windowfunctions
- Homepage:
- Size: 1.09 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
---
# **Amazon USA Sales Analysis Project**
---
## **Project Overview**
I have worked on analyzing a dataset of over 20,000 sales records from an Amazon-like e-commerce platform. This project involves extensive querying of customer behavior, product performance, and sales trends using PostgreSQL. Through this project, I have tackled various SQL problems, including revenue analysis, customer segmentation, and inventory management.
The project also focuses on data cleaning, handling null values, and solving real-world business problems using structured queries.
An ERD diagram is included to visually represent the database schema and relationships between tables.
---

## **Database Setup & Design**
### **Schema Structure**
```sql
CREATE TABLE category
(
category_id INT PRIMARY KEY,
category_name VARCHAR(20)
);-- customers TABLE
CREATE TABLE customers
(
customer_id INT PRIMARY KEY,
first_name VARCHAR(20),
last_name VARCHAR(20),
state VARCHAR(20),
address VARCHAR(5) DEFAULT ('xxxx')
);-- sellers TABLE
CREATE TABLE sellers
(
seller_id INT PRIMARY KEY,
seller_name VARCHAR(25),
origin VARCHAR(15)
);-- products table
CREATE TABLE products
(
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
price FLOAT,
cogs FLOAT,
category_id INT, -- FK
CONSTRAINT product_fk_category FOREIGN KEY(category_id) REFERENCES category(category_id)
);-- orders
CREATE TABLE orders
(
order_id INT PRIMARY KEY,
order_date DATE,
customer_id INT, -- FK
seller_id INT, -- FK
order_status VARCHAR(15),
CONSTRAINT orders_fk_customers FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
CONSTRAINT orders_fk_sellers FOREIGN KEY (seller_id) REFERENCES sellers(seller_id)
);CREATE TABLE order_items
(
order_item_id INT PRIMARY KEY,
order_id INT, -- FK
product_id INT, -- FK
quantity INT,
price_per_unit FLOAT,
CONSTRAINT order_items_fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id),
CONSTRAINT order_items_fk_products FOREIGN KEY (product_id) REFERENCES products(product_id)
);-- payment TABLE
CREATE TABLE payments
(
payment_id
INT PRIMARY KEY,
order_id INT, -- FK
payment_date DATE,
payment_status VARCHAR(20),
CONSTRAINT payments_fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id)
);CREATE TABLE shippings
(
shipping_id INT PRIMARY KEY,
order_id INT, -- FK
shipping_date DATE,
return_date DATE,
shipping_providers VARCHAR(15),
delivery_status VARCHAR(15),
CONSTRAINT shippings_fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id)
);CREATE TABLE inventory
(
inventory_id INT PRIMARY KEY,
product_id INT, -- FK
stock INT,
warehouse_id INT,
last_stock_date DATE,
CONSTRAINT inventory_fk_products FOREIGN KEY (product_id) REFERENCES products(product_id)
);
```---
## **Task: Data Cleaning**
I cleaned the dataset by:
- **Removing duplicates**: Duplicates in the customer and order tables were identified and removed.
- **Handling missing values**: Null values in critical fields (e.g., customer address, payment status) were either filled with default values or handled using appropriate methods.---
## **Handling Null Values**
Null values were handled based on their context:
- **Customer addresses**: Missing addresses were assigned default placeholder values.
- **Payment statuses**: Orders with null payment statuses were categorized as “Pending.”
- **Shipping information**: Null return dates were left as is, as not all shipments are returned.---
## **Objective**
The primary objective of this project is to showcase SQL proficiency through complex queries that address real-world e-commerce business challenges. The analysis covers various aspects of e-commerce operations, including:
- Customer behavior
- Sales trends
- Inventory management
- Payment and shipping analysis
- Forecasting and product performance
## **Identifying Business Problems**
Key business problems identified:
1. Low product availability due to inconsistent restocking.
2. High return rates for specific product categories.
3. Significant delays in shipments and inconsistencies in delivery times.
4. High customer acquisition costs with a low customer retention rate.---
## **Solving Business Problems**
### Solutions Implemented:
1. Top Selling Products
Query the top 10 products by total sales value.
Challenge: Include product name, total quantity sold, and total sales value.```sql
SELECT
oi.product_id,
p.product_name,
ROUND(SUM(oi.total_sale)::numeric, 2) AS total_sale,
COUNT(o.order_id) AS total_orders
FROM orders AS o
JOIN order_items AS oi ON oi.order_id = o.order_id
JOIN products AS p ON p.product_id = oi.product_id
GROUP BY oi.product_id, p.product_name
ORDER BY total_sale DESC
LIMIT 10;
```2. Revenue by Category
Calculate total revenue generated by each product category.
Challenge: Include the percentage contribution of each category to total revenue.```sql
SELECT
c.category_id,
c.category_name,
ROUND(SUM(oi.total_sale)::numeric, 2) AS total_sale,
ROUND(
(SUM(oi.total_sale)::numeric /
(SELECT SUM(total_sale)::numeric FROM order_items)
) * 100,
2
) AS contribution
FROM order_items AS oi
JOIN products AS p ON p.product_id = oi.product_id
LEFT JOIN category AS c ON c.category_id = p.category_id
GROUP BY c.category_id, c.category_name
ORDER BY total_sale DESC;
```3. Average Order Value (AOV)
Compute the average order value for each customer.
Challenge: Include only customers with more than 5 orders.```sql
SELECT
c.customer_id,
CONCAT(c.first_name, ' ', c.last_name) AS full_name,
ROUND(SUM(oi.total_sale)::numeric / COUNT(o.order_id),2) AS AOV
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
JOIN order_items AS oi ON oi.order_id = o.order_id
GROUP BY c.customer_id, full_name
HAVING COUNT(o.order_id) > 5;```
4. Monthly Sales Trend
Query monthly total sales over the past year.
Challenge: Display the sales trend, grouping by month, return current_month sale, last month sale!```sql
SELECT
year,
month,
total_sale as current_month_sale,
LAG(total_sale, 1) OVER(ORDER BY year, month) as last_month_sale
FROM ---
(
SELECT
EXTRACT(MONTH FROM o.order_date) as month,
EXTRACT(YEAR FROM o.order_date) as year,
ROUND(
SUM(oi.total_sale::numeric)
,2) as total_sale
FROM orders as o
JOIN
order_items as oi
ON oi.order_id = o.order_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY 1, 2
ORDER BY year, month
) as t1
```5. Customers with No Purchases
Find customers who have registered but never placed an order.
Challenge: List customer details and the time since their registration.```sql
SELECT
c.customer_id,
CONCAT(c.first_name, ' ', c.last_name) AS full_name,
c.state
FROM customers AS c
LEFT JOIN
orders AS o
ON o.customer_id = c.customer_id
WHERE o.customer_id IS NULL;
```6. Least-Selling Categories by State
Identify the least-selling product category for each state.
Challenge: Include the total sales for that category within each state.```sql
WITH ranking_table AS (
SELECT
c.state,
cat.category_name,
ROUND(SUM(oi.total_sale)::numeric, 2) AS total_sales,
RANK() OVER (PARTITION BY c.state ORDER BY SUM(oi.total_sale) ASC) AS rank
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
JOIN order_items AS oi ON o.order_id = oi.order_id
JOIN products AS p ON oi.product_id = p.product_id
JOIN category AS cat ON cat.category_id = p.category_id
GROUP BY c.state, cat.category_name
)SELECT
state,
category_name,
total_sales
FROM ranking_table
WHERE rank = 1;
```7. Customer Lifetime Value (CLTV)
Calculate the total value of orders placed by each customer over their lifetime.
Challenge: Rank customers based on their CLTV.```sql
SELECT
c.customer_id,
CONCAT(c.first_name, ' ', c.last_name) AS full_name,
ROUND(SUM(oi.total_sale)::numeric, 2) AS CLTV,
DENSE_RANK() OVER (ORDER BY SUM(oi.total_sale) DESC) AS cust_ranking
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
JOIN order_items AS oi ON oi.order_id = o.order_id
GROUP BY c.customer_id, full_name
ORDER BY cust_ranking;
```8. Inventory Stock Alerts
Query products with stock levels below a certain threshold (e.g., less than 10 units).
Challenge: Include last restock date and warehouse information.```sql
SELECT
i.inventory_id,
p.product_name,
i.stock AS current_stock_left,
i.last_stock_date,
i.warehouse_id
FROM inventory AS i
JOIN products AS p ON p.product_id = i.product_id
WHERE stock < 10
```9. Shipping Delays
Identify orders where the shipping date is later than 3 days after the order date.
Challenge: Include customer, order details, and delivery provider.```sql
SELECT
c.*,
o.*,
s.shipping_providers
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
JOIN shippings AS s ON o.order_id = s.order_id
WHERE s.shipping_date > o.order_date + INTERVAL '3 day';
```10. Payment Success Rate
Calculate the percentage of successful payments across all orders.
Challenge: Include breakdowns by payment status (e.g., failed, pending).```sql
SELECT
p.payment_status,
COUNT(*) AS total_cnt,
ROUND(COUNT(*)::numeric / (SELECT COUNT(*) FROM payments)::numeric * 100, 2) AS percentage
FROM orders AS o
JOIN payments AS p ON o.order_id = p.order_id
GROUP BY p.payment_status;
```11. Top Performing Sellers
Find the top 5 sellers based on total sales value.
Challenge: Include both successful and failed orders, and display their percentage of successful orders.```sql
WITH top_sellers AS (
SELECT
s.seller_id,
s.seller_name,
ROUND(SUM(oi.total_sale)::numeric, 2) AS total_sale
FROM orders AS o
JOIN sellers AS s ON o.seller_id = s.seller_id
JOIN order_items AS oi ON o.order_id = oi.order_id
GROUP BY s.seller_id, s.seller_name
ORDER BY total_sale DESC
LIMIT 5
),sellers_reports AS (
SELECT
o.seller_id,
tp.seller_name,
o.order_status,
COUNT(*) AS total_orders
FROM orders AS o
JOIN top_sellers AS tp ON o.seller_id = tp.seller_id
WHERE o.order_status NOT IN ('Inprogress', 'Returned')
GROUP BY o.seller_id, tp.seller_name, o.order_status
)SELECT
seller_id,
seller_name,
SUM(CASE WHEN order_status = 'Completed' THEN total_orders ELSE 0 END) AS completed_orders,
SUM(CASE WHEN order_status = 'Cancelled' THEN total_orders ELSE 0 END) AS cancelled_orders,
SUM(total_orders) AS total_orders,
ROUND(
COALESCE(SUM(CASE WHEN order_status = 'Completed' THEN total_orders ELSE 0 END)::numeric
/ NULLIF(SUM(total_orders), 0)::numeric * 100, 0),
2
) AS perc_successful_orders
FROM sellers_reports
GROUP BY seller_id, seller_name
ORDER BY perc_successful_orders DESC;
```12. Product Profit Margin
Calculate the profit margin for each product (difference between price and cost of goods sold).
Challenge: Rank products by their profit margin, showing highest to lowest.
*/```sql
SELECT
p.product_id,
p.product_name,
ROUND(
(
SUM(oi.total_sale - (p.cogs * oi.quantity))::numeric
/ NULLIF(SUM(oi.total_sale), 0)::numeric
) * 100,
2
) AS profit_margin
FROM order_items AS oi
JOIN products AS p ON oi.product_id = p.product_id
GROUP BY p.product_id, p.product_name
ORDER BY profit_margin DESC;```
13. Most Returned Products
Query the top 10 products by the number of returns.
Challenge: Display the return rate as a percentage of total units sold for each product.```sql
SELECT
p.product_id,
p.product_name,
COUNT(*) AS total_unit_sold,
SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END) AS total_returned,
ROUND(
SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END)::numeric
/ NULLIF(COUNT(*), 0)::numeric,
2
) * 100 AS perc_tot_ret
FROM order_items AS oi
JOIN products AS p ON oi.product_id = p.product_id
JOIN orders AS o ON oi.order_id = o.order_id
GROUP BY p.product_id, p.product_name
ORDER BY perc_tot_ret DESC
LIMIT 10;
```14. Inactive Sellers
Identify sellers who haven’t made any sales in the last 6 months.
Challenge: Show the last sale date and total sales from those sellers.```sql
WITH cte1 AS (
SELECT
s.seller_id,
s.seller_name
FROM sellers AS s
WHERE NOT EXISTS (
SELECT 1
FROM orders AS o
WHERE o.seller_id = s.seller_id
AND o.order_date >= CURRENT_DATE - INTERVAL '6 month'
)
)SELECT
cte1.seller_id,
cte1.seller_name,
MAX(o.order_date) AS last_sale_date,
ROUND(SUM(oi.total_sale)::numeric, 2) AS total_sales
FROM cte1
JOIN orders AS o
ON cte1.seller_id = o.seller_id
AND o.order_date < CURRENT_DATE - INTERVAL '6 month'
JOIN order_items AS oi
ON o.order_id = oi.order_id
GROUP BY cte1.seller_id, cte1.seller_name
HAVING SUM(oi.total_sale) > 0
ORDER BY last_sale_date DESC;```
15. IDENTITY customers into returning or new
if the customer has done more than 5 return categorize them as returning otherwise new
Challenge: List customers id, name, total orders, total returns```sql
SELECT
cte.customer_id,
cte.full_name AS customers,
cte.total_orders,
cte.total_return,
CASE WHEN cte.total_return > 5 THEN 'Returning customer' ELSE 'NEW' END AS customer_category
FROM (
SELECT
c.customer_id,
CONCAT(c.first_name, ' ', c.last_name) AS full_name,
COUNT(o.order_id) AS total_orders,
SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END) AS total_return
FROM customers AS c
JOIN orders AS o ON c.customer_id = o.customer_id
JOIN shippings AS s ON o.order_id = s.order_id -- Fixed JOIN condition
GROUP BY c.customer_id, full_name
) AS cte;
```16. Top 5 Customers by Orders in Each State
Identify the top 5 customers with the highest number of orders for each state.
Challenge: Include the number of orders and total sales for each customer.
```sql
SELECT *
FROM (
SELECT
c.state,
CONCAT(c.first_name, ' ', c.last_name) AS full_name,
COUNT(o.order_id) AS total_orders,
ROUND(SUM(oi.total_sale)::numeric, 2) AS total_sale,
DENSE_RANK() OVER(PARTITION BY c.state ORDER BY COUNT(o.order_id) DESC) AS rank
FROM orders AS o
JOIN order_items AS oi ON o.order_id = oi.order_id
JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.state, full_name
) AS ranked_customers
WHERE rank <= 5;
```17. Revenue by Shipping Provider
Calculate the total revenue handled by each shipping provider.
Challenge: Include the total number of orders handled and the average delivery time for each provider.```sql
SELECT
s.shipping_providers,
COUNT(o.order_id) AS order_handled,
ROUND(SUM(oi.total_sale)::numeric, 2) AS total_sale,
COALESCE(ROUND(AVG(s.return_date - s.shipping_date)::numeric, 2), 0) AS average_days
FROM orders AS o
JOIN order_items AS oi ON oi.order_id = o.order_id
JOIN shippings AS s ON s.order_id = o.order_id
GROUP BY s.shipping_providers;
```18. Top 10 product with highest decreasing revenue ratio compare to year 2022 and year 2023
Challenge: Return product_id, product_name, category_name, 2022 revenue and 2023 revenue decrease ratio at end Round the result
Note: Decrease ratio = y23-y22/y22* 100```sql
WITH year_2022 AS (
SELECT
p.product_id,
p.product_name,
c.category_name,
ROUND(SUM(oi.total_sale)::numeric, 2) AS revenue
FROM orders AS o
JOIN order_items AS oi ON oi.order_id = o.order_id
JOIN products AS p ON p.product_id = oi.product_id
JOIN category AS c ON c.category_id = p.category_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2022
GROUP BY p.product_id, p.product_name, c.category_name
),year_2023 AS (
SELECT
p.product_id,
p.product_name,
c.category_name,
ROUND(SUM(oi.total_sale)::numeric, 2) AS revenue
FROM orders AS o
JOIN order_items AS oi ON oi.order_id = o.order_id
JOIN products AS p ON p.product_id = oi.product_id
JOIN category AS c ON c.category_id = p.category_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2023
GROUP BY p.product_id, p.product_name, c.category_name
)SELECT
y22.product_id,
y22.product_name,
y22.category_name,
y22.revenue AS revenues_22,
y23.revenue AS revenues_23,
ROUND(((y23.revenue - y22.revenue) / y22.revenue) * 100, 2) AS revenue_dec_ratio
FROM year_2022 AS y22
JOIN year_2023 AS y23 ON y22.product_id = y23.product_id
WHERE y22.revenue > y23.revenue
ORDER BY revenue_dec_ratio ASC
LIMIT 10;
```## **Learning Outcomes**
This project enabled me to:
- Design and implement a normalized database schema.
- Clean and preprocess real-world datasets for analysis.
- Use advanced SQL techniques, including window functions, subqueries, and joins.
- Conduct in-depth business analysis using SQL.
- Optimize query performance and handle large datasets efficiently.---
## **Conclusion**
This advanced SQL project successfully demonstrates my ability to solve real-world e-commerce problems using structured queries. From improving customer retention to optimizing inventory and logistics, the project provides valuable insights into operational challenges and solutions.
By completing this project, I have gained a deeper understanding of how SQL can be used to tackle complex data problems and drive business decision-making.
---
### **Entity Relationship Diagram (ERD)**
---