{"id":20527083,"url":"https://github.com/sushant-suresh/amazon_e-commerce_data_analysis_sql_project","last_synced_at":"2026-04-20T13:35:30.501Z","repository":{"id":256072631,"uuid":"851786611","full_name":"Sushant-Suresh/Amazon_E-commerce_Data_Analysis_SQL_Project","owner":"Sushant-Suresh","description":"SQL project analyzing e-commerce data of Amazon to address key business questions and uncover insights.","archived":false,"fork":false,"pushed_at":"2024-10-12T18:20:16.000Z","size":338,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-06T01:48:46.966Z","etag":null,"topics":["database","postgresql","sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sushant-Suresh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-03T17:45:05.000Z","updated_at":"2024-10-12T18:20:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"2ee20fbe-c98a-4ae9-926c-e2cc5ca8e34a","html_url":"https://github.com/Sushant-Suresh/Amazon_E-commerce_Data_Analysis_SQL_Project","commit_stats":null,"previous_names":["sushant-suresh/e-commerce_data-analysis_sql_project","sushant-suresh/amazon_e-commerce_data_analysis_sql_project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Sushant-Suresh/Amazon_E-commerce_Data_Analysis_SQL_Project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sushant-Suresh%2FAmazon_E-commerce_Data_Analysis_SQL_Project","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/Sushant-Suresh%2FAmazon_E-commerce_Data_Analysis_SQL_Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sushant-Suresh%2FAmazon_E-commerce_Data_Analysis_SQL_Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sushant-Suresh%2FAmazon_E-commerce_Data_Analysis_SQL_Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sushant-Suresh","download_url":"https://codeload.github.com/Sushant-Suresh/Amazon_E-commerce_Data_Analysis_SQL_Project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sushant-Suresh%2FAmazon_E-commerce_Data_Analysis_SQL_Project/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268231815,"owners_count":24217084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","postgresql","sql"],"created_at":"2024-11-15T23:17:13.878Z","updated_at":"2026-04-20T13:35:30.432Z","avatar_url":"https://github.com/Sushant-Suresh.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Amazon E-commerce Data Analysis SQL Project\n\n\u003cp align=\"center\"\u003e\n  \u003cimg 
src=\"https://github.com/user-attachments/assets/ea55bfbd-027e-45e7-9cc3-0a3da7099a1a\" alt=\"amazon-banner_123\"\u003e\n\u003c/p\u003e\n\n## Project Overview\n\n**Project Title**: Amazon E-commerce Data Analysis\n\n**Database**: `amazon_project`\n\nThis project is designed to demonstrate my SQL skills and techniques which I used  to explore and analyze the e-commerce data for Amazon. The project involves setting up a database and answering specific business questions through SQL queries.\n\n## Objectives\n\n1. **Set up an e-commerce sales database**: Create and populate an e-commerce database with the provided .csv files.\n2. **Business Analysis**: Use SQL to answer specific business questions and derive insights from the sales data.\n\n## Project Structure\n\n### 1. Database and Schema Setup\n\n- **Database Creation**: The project starts by creating a database named `amazon_project`.\n- **Table Creation**: The following tables are created - `Customers`, `Sellers`, `Products`, `Orders`, `Returns`.\n```sql\n-- Creating Database\nCREATE DATABASE amazon_project;\n\n-- Creating `Customers` Table\nDROP TABLE IF EXISTS customers;\nCREATE TABLE customers (customer_id VARCHAR(25) PRIMARY KEY, customer_name VARCHAR(25), state VARCHAR(25));\n\n-- Creating `Sellers` Table\nDROP TABLE IF EXISTS sellers;\nCREATE TABLE sellers (seller_id VARCHAR(25) PRIMARY KEY, seller_name VARCHAR(25));\n\n-- Creating `Products` Table\nDROP TABLE IF EXISTS products;\nCREATE TABLE products (product_id VARCHAR(25) PRIMARY KEY, product_name VARCHAR(255), price FLOAT, cogs FLOAT);\n\n-- Creating `Orders` Table\nDROP TABLE IF EXISTS orders;\nCREATE TABLE orders (order_id VARCHAR(25) PRIMARY KEY, order_date DATE, customer_id VARCHAR(25),\n                     state VARCHAR(25), category VARCHAR(25), sub_category VARCHAR(25), product_id VARCHAR(25),\n                     price_per_unit FLOAT, quantity INT, sale FLOAT, seller_id VARCHAR(25),\n\n                     CONSTRAINT fk_customers FOREIGN KEY 
(customer_id) REFERENCES customers(customer_id),\n                     CONSTRAINT fk_products  FOREIGN KEY (product_id)  REFERENCES products(product_id),    \n                     CONSTRAINT fk_sellers   FOREIGN KEY (seller_id)   REFERENCES sellers(seller_id));\n\n-- Creating `Returns` Table\nDROP TABLE IF EXISTS returns;\nCREATE TABLE returns (return_id VARCHAR(25) PRIMARY KEY, order_id VARCHAR(25),\n\nCONSTRAINT fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id));\n```\n**ERD For Database:**\n\n![Amazon_ERD](https://github.com/user-attachments/assets/1556c995-1fdc-4510-bd94-0fc3ea1654c4)\n### 2. Data Imported Into Tables From the .csv Files\n```sql\n-- `orders` Table Structure \u0026 Data\nSELECT * FROM orders;\n```\n**Output:**\n\n![orders](https://github.com/user-attachments/assets/a123c407-c364-4f16-90b9-dde0057456ce)\n```sql\n-- `returns` Table Structure \u0026 Data\nSELECT * FROM returns;\n```\n**Output:**\n\n![returns](https://github.com/user-attachments/assets/b36c118b-d28e-4a48-9dec-905abb3dd803)\n```sql\n-- `customers` Table Structure \u0026 Data\nSELECT * FROM customers;\n```\n**Output:**\n\n![customers](https://github.com/user-attachments/assets/0800f67b-e9d1-4041-8475-f947c364f501)\n```sql\n-- `sellers` Table Structure \u0026 Data\nSELECT * FROM sellers;\n```\n**Output:**\n\n![sellers](https://github.com/user-attachments/assets/881cc4c1-dde2-4e34-bbf7-07f68672df64)\n```sql\n-- `products` Table Structure \u0026 Data\nSELECT * FROM products;\n```\n**Output:**\n\n![products](https://github.com/user-attachments/assets/5e80645f-3033-4b51-a97a-c3bd3ce86811)\n### 3. Data Analysis \u0026 Findings\n\nThe following SQL queries were used to answer specific business questions:\n\n1. 
**What Is the Total Sale and Average Sale for Goa, Uttarakhand \u0026 Bihar (Round-Off to 2 Decimal Places)?**\n```sql\nSELECT state,\n       ROUND(SUM(sale)::numeric, 2) AS total_sale,\n       ROUND(AVG(sale)::numeric, 2) AS avg_sale\nFROM orders\nWHERE state IN ('Goa', 'Uttarakhand', 'Bihar')\nGROUP BY state;\n```\n**Output:**\n\n![Q1](https://github.com/user-attachments/assets/dba86fb5-3741-4c1c-921c-be8a85102c45)\n\n2. **What Is the Total Revenue Generated by Each State?**\n```sql\nSELECT state,\n       ROUND(SUM(sale)::numeric, 1) AS revenue\nFROM orders\nWHERE state IS NOT NULL\nGROUP BY state;\n```\n**Output:**\n\n![Q2](https://github.com/user-attachments/assets/b9f2036b-0abc-46f6-bb79-5e936696e93b)\n\n3. **How Many Orders Were Placed by Each Customer, and What Is Their Average Order Quantity?**\n```sql\nSELECT customer_id,\n       COUNT(order_id) AS orders_placed,\n       ROUND(AVG(quantity)::numeric, 1) AS avg_quantity_ordered\nFROM orders\nGROUP BY customer_id;\n```\n**Output:**\n\n![Q3](https://github.com/user-attachments/assets/01d82228-01e8-4ff7-91b1-46103005ce2a)\n\n4. **Which Category Has the Second Highest Average Sale Amount Per Order?**\n```sql\nSELECT category,\n       ROUND(AVG(sale)::numeric, 2) AS avg_sale\nFROM orders\nWHERE category IS NOT NULL\nGROUP BY category\nORDER BY avg_sale DESC\nOFFSET 1\nLIMIT 1;\n```\n**Output:**\n\n![Q4](https://github.com/user-attachments/assets/0193368e-0a39-4d33-820c-75da91a15ffc)\n\n5. **Identify the Top 3 Best-Selling Products (Sub-Categories) in Terms of Total Quantity Sold.**\n```sql\nSELECT sub_category,\n       SUM(quantity) AS total_quantity\nFROM orders\nGROUP BY sub_category\nORDER BY total_quantity DESC\nLIMIT 3;\n```\n**Output:**\n\n![Q5](https://github.com/user-attachments/assets/9d811fb4-59fd-4cee-98f4-7996c93028de)\n\n6. 
**Find Top 3 Products Which Generated Revenue \u003e 10000.**\n```sql\nSELECT product_id,\n       SUM(sale) AS revenue\nFROM orders\nGROUP BY product_id\nHAVING SUM(sale) \u003e 10000\nORDER BY revenue DESC\nLIMIT 3;\n```\n**Output:**\n\n![Q6](https://github.com/user-attachments/assets/b2566042-18d8-4120-a49a-77ceb100706e)\n\n7. **Find the Best Selling Month in 2022 Based on Revenue.**\n```sql\nSELECT TO_CHAR(order_date, 'Month') AS month,\n       SUM(sale) AS revenue\nFROM orders\nWHERE EXTRACT(YEAR FROM order_date) = 2022\nGROUP BY TO_CHAR(order_date, 'Month')\nORDER BY revenue DESC\nLIMIT 1;\n```\n**Output:**\n\n![Q7](https://github.com/user-attachments/assets/f4cda261-c4bc-475e-bc17-f04ecaebab1c)\n\n8. **Find Customer Names Having Total Orders \u003e 100 and Sort Them by Their Total Revenue in Descending Order.**\n```sql\nSELECT o.customer_id,\n       c.customer_name,\n       COUNT(o.order_id) AS total_orders,\n\t   ROUND(SUM(o.sale)::numeric, 2) AS revenue\nFROM orders AS o\nINNER JOIN customers AS c\nON o.customer_id = c.customer_id\nGROUP BY 1,2\nHAVING COUNT(o.order_id) \u003e 100\nORDER BY revenue DESC;\n```\n**Output:**\n\n![Q8](https://github.com/user-attachments/assets/2b34d901-34d5-40ad-8527-2f03ef9b46ee)\n\n9. **Identify All Orders That Have Been Returned, Along With the Details of the Returns (If Available).**\n```sql\nSELECT *\nFROM orders AS o\nRIGHT JOIN returns AS r\nON o.order_id = r.order_id;\n```\n**Output:**\n\n![Q9](https://github.com/user-attachments/assets/2c3a7ea1-7866-4625-ba7e-920ee2d989a0)\n\n10. 
**Find All the Instances Where Products Have Been Sold, Returned, or Both, Along With the Associated Details.**\n```sql\nSELECT p.product_id,\n       p.product_name,\n       COUNT(DISTINCT o.order_id) AS total_orders,\n       COUNT(DISTINCT r.return_id) AS total_returns\nFROM products AS p\nLEFT JOIN orders AS o \nON p.product_id = o.product_id\nLEFT JOIN returns AS r \nON o.order_id = r.order_id\nGROUP BY p.product_id, p.product_name;\n```\n**Output:**\n\n![Q10](https://github.com/user-attachments/assets/819b3b4f-3d95-4a4c-8c3e-c8590652dbc0)\n\n11. **Find Each Customer’s Latest and Second Latest Order Amount.**\n```sql\nWITH RankedOrders AS (SELECT o.customer_id,\n                             o.order_id,\n                             ROUND(SUM(o.quantity * o.price_per_unit)::numeric, 2) AS order_amount,\n                             ROW_NUMBER() OVER (PARTITION BY o.customer_id ORDER BY o.order_date DESC) AS order_rank\n                      FROM orders AS o\n                      GROUP BY o.customer_id, o.order_id, o.order_date)\nSELECT customer_id,\n       MAX(CASE WHEN order_rank = 1 THEN order_amount END) AS latest_order_amount,\n       MAX(CASE WHEN order_rank = 2 THEN order_amount END) AS second_latest_order_amount\nFROM RankedOrders\nGROUP BY customer_id;\n```\n**Output:**\n\n![Q11](https://github.com/user-attachments/assets/3d7b5b51-9f1b-4ca9-8082-9f71c1141a3d)\n\n12. 
**Identify Top-Selling Products by Revenue for Each Category.**\n```sql\nSELECT category, product_name, total_sales\nFROM (SELECT o.category,\n             p.product_name,\n             ROUND(SUM(o.sale)::numeric, 2) AS total_sales,\n             ROW_NUMBER() OVER (PARTITION BY o.category ORDER BY SUM(o.sale) DESC) AS rn\n      FROM orders AS o\n      JOIN products AS p ON o.product_id = p.product_id\n      GROUP BY o.category, p.product_name) AS ranked_sales\nWHERE rn = 1 AND category IS NOT NULL;\n```\n**Output:**\n\n![Q12](https://github.com/user-attachments/assets/cc462ad2-5031-49ce-88d3-b420e24fdf5d)\n\n13. **Identify Customers With Orders Exceeding the Average Order Value.**\n```sql\nSELECT DISTINCT o.customer_id, c.customer_name\nFROM orders AS o \nJOIN customers AS c ON o.customer_id = c.customer_id\nWHERE o.sale \u003e (SELECT AVG(sale) FROM orders);\n```\n**Output:**\n\n![Q13](https://github.com/user-attachments/assets/c99df34c-1483-49e1-90c8-ba089792f505)\n\n14. **Calculate the Return Rate Per Seller.**\n```sql\n-- NULLIF guards against division by zero for sellers with no orders\nSELECT s.seller_id,\n       s.seller_name,\n       COALESCE(ROUND(COUNT(r.return_id) * 100.0 / NULLIF(COUNT(o.order_id), 0), 2), 0) AS return_rate\nFROM sellers AS s\nLEFT JOIN orders AS o ON s.seller_id = o.seller_id\nLEFT JOIN returns AS r ON o.order_id = r.order_id\nGROUP BY s.seller_id, s.seller_name;\n```\n**Output:**\n\n![Q14](https://github.com/user-attachments/assets/bdcd9241-af6b-49ea-8dde-c21a4616a691)\n\n15. 
**Identify States With Highest Sales in Each Product Category.**\n```sql\nSELECT category, \n       state, \n       total_sales\nFROM (SELECT o.category,\n             o.state,\n             ROUND(SUM(o.sale)::numeric, 2) AS total_sales,\n             ROW_NUMBER() OVER (PARTITION BY o.category ORDER BY SUM(o.sale) DESC) AS rn\n      FROM orders AS o\n      GROUP BY o.category, o.state) AS ranked_sales\nWHERE rn = 1 AND category IS NOT NULL;\n```\n**Output:**\n\n![Q15](https://github.com/user-attachments/assets/bf48235c-73ca-4874-9e20-a81651c3eda2)\n\n16. **Identify Products With Profit Margin Below Average.**\n```sql\nSELECT product_id, \n       product_name, \n\t   price, \n\t   cogs, \n\t   ROUND((price - cogs)::numeric, 2) AS profit_margin\nFROM products\nWHERE (price - cogs) \u003c (SELECT AVG(price - cogs) FROM products);\n```\n**Output:**\n\n![Q16](https://github.com/user-attachments/assets/bb91ab57-e0fe-41d4-92f7-4665f6418284)\n\n17. **Identify Top 3 Products Frequently Returned Across All Categories. If Multiple Products Have the Same Number of Returns, They Should Be Ranked by the Total Sale Value.**\n```sql\nSELECT product_name, \n       category, \n\t   return_count, \n\t   total_sales\nFROM (SELECT p.product_name,\n             o.category,\n             COUNT(r.return_id) AS return_count,\n             ROUND(SUM(o.sale)::numeric, 2) AS total_sales,\n             RANK() OVER (ORDER BY COUNT(r.return_id) DESC, SUM(o.sale) DESC) AS rn\n      FROM orders AS o\n      JOIN products AS p ON o.product_id = p.product_id\n      LEFT JOIN returns AS r ON o.order_id = r.order_id\n      GROUP BY p.product_name, o.category) AS ranked_returns\nWHERE rn \u003c= 3;\n```\n**Output:**\n\n![Q17](https://github.com/user-attachments/assets/f293f2d8-4036-4e3b-97ff-bc8515225da7)\n\n## Conclusion\n\nThis project covers the following tasks: database setup, data importing and analysis using business-driven SQL queries. 
The findings can inform business decisions by revealing sales patterns, customer behavior, and product demand.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsushant-suresh%2Famazon_e-commerce_data_analysis_sql_project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsushant-suresh%2Famazon_e-commerce_data_analysis_sql_project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsushant-suresh%2Famazon_e-commerce_data_analysis_sql_project/lists"}