https://github.com/meet-afk/mysql-layoffs-data-cleaning-exploratory-analysis
- Host: GitHub
- URL: https://github.com/meet-afk/mysql-layoffs-data-cleaning-exploratory-analysis
- Owner: meet-afk
- License: mit
- Created: 2025-09-06T07:06:58.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-09-06T07:40:08.000Z (about 1 month ago)
- Last Synced: 2025-09-06T09:22:21.753Z (about 1 month ago)
- Topics: eda, exploratory-data-analysis, mysql, mysql-database, patterns, trends
- Homepage:
- Size: 149 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# MySQL Layoffs Data Cleaning & Exploratory Analysis
This is an end-to-end data project performed entirely in **MySQL**. It covers the full workflow from cleaning a raw, messy layoffs dataset to performing a comprehensive exploratory data analysis (EDA) to uncover significant trends and insights.
## Table of Contents
* [About The Project](#about-the-project)
* [Tech Stack](#tech-stack)
* [Project Workflow](#project-workflow)
* [Part 1: Data Cleaning](#part-1-data-cleaning)
* [Part 2: Exploratory Data Analysis (EDA)](#part-2-exploratory-data-analysis-eda)
* [Key Findings & Insights](#key-findings--insights)
* [How to Reproduce](#how-to-reproduce)
* [Conclusion & Learnings](#conclusion--learnings)
* [Connect with Me](#connect-with-me)

---
## About The Project
This project handles a real-world layoffs dataset (`layoffs.csv`) and takes it through two critical phases using SQL:
1. **Data Cleaning:** The initial raw data is meticulously cleaned and preprocessed to create a reliable and accurate dataset (`layoffs_cleaned.csv`).
2. **Exploratory Data Analysis:** The cleaned dataset is then analyzed to identify patterns, trends, and key insights related to layoffs across various dimensions such as industry, geography, and time.

---

## Tech Stack
- **MySQL Workbench:** The integrated development environment used for executing SQL queries.
- **SQL:** The core language used for all data manipulation, cleaning, and analysis tasks.

---

## Project Workflow
The project was executed in two distinct phases:
### Phase 1: Data Cleaning
- **Objective:** To transform the raw, inconsistent `layoffs.csv` file into a structured, clean dataset ready for analysis.
- **Process:** A series of SQL queries were run to handle duplicates, nulls, and inconsistencies.
- **Outcome:** A new, clean table/CSV named `layoffs_cleaned`.

### Phase 2: Exploratory Data Analysis (EDA)
- **Objective:** To query the `layoffs_cleaned` data to uncover trends and answer key business questions.
- **Process:** SQL queries were used to aggregate, group, and analyze the data across different dimensions.
- **Outcome:** A set of actionable insights about the global layoff landscape between 2020 and 2023.

---

## Part 1: Data Cleaning
### Steps Performed:
- **Remove Duplicates:** Identified and deleted duplicate rows based on key columns to ensure data integrity.
- **Standardize Data:** Trimmed whitespace and standardized categorical values (e.g., industry names) for consistency.
- **Handle Null/Blank Values:** Addressed `NULL` or empty values in critical columns, populating them where possible or flagging them for analysis.
- **Remove Unnecessary Columns/Rows:** Dropped columns and rows that were not relevant to the analysis to streamline the dataset.

---
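The cleaning steps above might be sketched in MySQL roughly as follows. Table and column names (`layoffs_staging`, `company`, `industry`, etc.) are assumptions based on a typical layoffs dataset; the repository's actual queries are in the linked PDFs.

```sql
-- Remove duplicates: MySQL cannot DELETE from a CTE directly,
-- so a common pattern is to stage row numbers in a new table.
CREATE TABLE layoffs_staging2 AS
SELECT *,
       ROW_NUMBER() OVER (
         PARTITION BY company, location, industry, total_laid_off,
                      percentage_laid_off, `date`, stage, country
       ) AS row_num
FROM layoffs_staging;

DELETE FROM layoffs_staging2 WHERE row_num > 1;

-- Standardize data: trim whitespace, unify category labels
UPDATE layoffs_staging2 SET company = TRIM(company);
UPDATE layoffs_staging2
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Handle nulls: backfill industry from other rows of the same company
UPDATE layoffs_staging2 t1
JOIN layoffs_staging2 t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Remove unusable rows and drop the helper column
DELETE FROM layoffs_staging2
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

ALTER TABLE layoffs_staging2 DROP COLUMN row_num;
```

Staging the data in a copy of the raw table before mutating it is a common safeguard, so the original import can always be re-derived.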
## Part 2: Exploratory Data Analysis (EDA)
### Key Questions Addressed:
- Which industries and companies were most affected by layoffs?
- What were the time-based patterns (yearly, monthly) of layoffs?
- What was the geographical impact of these layoffs?
- How did the funding stage of a company correlate with the number of layoffs?

---
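Questions like these typically reduce to `GROUP BY` aggregations over the cleaned table. A sketch, assuming a `layoffs_cleaned` table with `industry`, `stage`, `date`, and `total_laid_off` columns:

```sql
-- Industries most affected by layoffs
SELECT industry, SUM(total_laid_off) AS total_layoffs
FROM layoffs_cleaned
GROUP BY industry
ORDER BY total_layoffs DESC;

-- Rolling monthly total, to surface time-based patterns
SELECT month,
       SUM(monthly_total) OVER (ORDER BY month) AS rolling_total
FROM (
  SELECT DATE_FORMAT(`date`, '%Y-%m') AS month,
         SUM(total_laid_off) AS monthly_total
  FROM layoffs_cleaned
  GROUP BY month
) AS monthly;

-- Layoffs by funding stage
SELECT stage, SUM(total_laid_off) AS total_layoffs
FROM layoffs_cleaned
GROUP BY stage
ORDER BY total_layoffs DESC;
```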
## Key Findings & Insights
- **Time Frame:** The analysis covers layoffs from **March 2020 to March 2023**.
- **Industry Impact:** **Consumer** and **Retail** industries were hit the hardest, with over **44,782** and **43,613** layoffs respectively.
- **Geographical Hotspots:** The **United States** saw the highest number of layoffs (**256,559**), followed by India and the Netherlands.
- **Yearly Trends:** **2022** was the year with the most recorded layoffs (**160,661**), though early 2023 showed an alarming spike.
- **Monthly Patterns:** A massive surge in layoffs occurred in early 2023, with **January 2023** alone accounting for over **84,000** job cuts.
- **Funding Stage Insights:** **Post-IPO** companies had the highest number of layoffs (**204,132**), indicating major workforce reductions after going public.
- **Top Companies:** Tech giants **Google (12,000)**, **Meta**, and **Amazon** led the charts for the largest single layoff events.

---

## How to Reproduce
The SQL scripts for both phases of the project are available. To reproduce the results:
1. Load the raw `layoffs.csv` into a MySQL database.
2. Run the queries from the **Data Cleaning** script to create the clean dataset.
3. Run the queries from the **Data Analysis** script on the cleaned dataset to derive the insights.

- **[SQL Queries for Data Cleaning](https://github.com/meet-afk/MySQL-Layoffs-Data-Cleaning-Exploratory-Analysis/blob/main/Data_cleaning-Queries.pdf)**
- **[SQL Queries for Data Analysis](https://github.com/meet-afk/MySQL-Layoffs-Data-Cleaning-Exploratory-Analysis/blob/main/EDA-Queries.pdf)**

---
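Step 1 (loading the raw CSV) could look like the sketch below. The database name, column list, and file path are assumptions; adjust them to match the actual `layoffs.csv` schema, and note that `LOAD DATA LOCAL INFILE` requires `local_infile` to be enabled on both the server and the client.

```sql
CREATE DATABASE IF NOT EXISTS world_layoffs;
USE world_layoffs;

-- Import everything as text first; types can be tightened during cleaning
CREATE TABLE layoffs (
  company TEXT,
  location TEXT,
  industry TEXT,
  total_laid_off INT,
  percentage_laid_off TEXT,
  `date` TEXT,
  stage TEXT,
  country TEXT,
  funds_raised_millions INT
);

LOAD DATA LOCAL INFILE 'layoffs.csv'
INTO TABLE layoffs
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;  -- skip the header row
```

Alternatively, MySQL Workbench's Table Data Import Wizard achieves the same result without SQL.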
## Conclusion & Learnings
This project was an excellent practical exercise in using **SQL for real-world data analysis**. By systematically cleaning and then exploring the dataset, I was able to extract meaningful insights about layoff trends. The process reinforced my SQL skills and highlighted the importance of a clean data foundation for any analysis. Future work could involve visualizing these SQL-driven insights in a BI tool to create a more compelling narrative.
---
## Connect with Me
Have feedback or suggestions? I'd love to hear from you!
- **My Portfolio:** [My Portfolio](https://meet-afk.github.io/)

Check out my other projects! → **[My GitHub Profile](https://github.com/meet-afk)**