https://github.com/susshiii/sql-layoffs-data-cleaning-and-eda
Full SQL project using MySQL to clean and analyze a real-world tech layoff dataset from 2020โ2023.
https://github.com/susshiii/sql-layoffs-data-cleaning-and-eda
data-analysis data-analytics-project data-cleaning eda layoffs mysql sql
Last synced: 10 months ago
JSON representation
Full SQL project using MySQL to clean and analyze a real-world tech layoff dataset from 2020โ2023.
- Host: GitHub
- URL: https://github.com/susshiii/sql-layoffs-data-cleaning-and-eda
- Owner: Susshiii
- Created: 2025-07-01T17:12:29.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-07-01T17:24:33.000Z (10 months ago)
- Last Synced: 2025-07-01T18:26:03.121Z (10 months ago)
- Topics: data-analysis, data-analytics-project, data-cleaning, eda, layoffs, mysql, sql
- Homepage:
- Size: 62.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ๐ Layoffs Data Cleaning & Exploratory Data Analysis using SQL
This project demonstrates a full SQL-based data analytics workflow: starting from **cleaning raw layoff data** and ending with **exploratory data analysis (EDA)** to extract meaningful business insights. The dataset was taken from the [Alex The Analyst Data Analyst Bootcamp](https://www.youtube.com/@AlexTheAnalyst).
---
## ๐ Dataset Information
- **Name:** layoffs.csv
- **Source:** GitHub (via Alex The Analyst Bootcamp)
- **Content:** Layoffs from global tech companies during 2020โ2023
- **Columns include:**
- Company, Location, Industry
- Total Laid Off, % Laid Off
- Date of Layoff
- Company Stage (e.g., Series A, Series C)
- Country, Funding Raised
[Layoffs Dataset on GitHub](https://github.com/AlexTheAnalyst/MySQL-YouTube-Series/blob/main/layoffs.csv)
---
## ๐ Tools & Skills Used
| Tool | Purpose |
|------|---------|
| **MySQL** | SQL scripting, transformations, and analysis |
| **SQL Techniques** | `CTEs`, `ROW_NUMBER`, `GROUP BY`, `JOINS`, `CASE`, `TRIM`, `REPLACE`, `DATE FORMATTING`, `WINDOW FUNCTIONS`, `DENSE_RANK` |
---
## ๐ง Phase 1: Data Cleaning (`DATA_CLEANING_PROJECT.sql`)
### โ
Cleaning Objectives:
1. **Remove duplicates** using `ROW_NUMBER()` in a CTE
2. **Standardize inconsistent entries** like:
- Company names (trim extra spaces)
- Industry names (e.g., 'Crypto/Blockchain' โ 'Crypto')
- Country names (e.g., remove trailing '.' in 'United States.')
3. **Fix date formatting** using `STR_TO_DATE()`
4. **Handle missing values** by:
- Replacing empty strings with `NULL`
- Updating NULLs using inferred data from other rows
5. **Delete irrelevant records**
- Rows with both `total_laid_off` and `percentage_laid_off` as NULL
6. **Drop helper columns** like `row_num` after cleaning
### ๐งน Key Queries Used:
```sql
-- Assign row numbers to detect duplicates
ROW_NUMBER() OVER (
PARTITION BY company, location, industry, total_laid_off, percentage_laid_off, date, stage, country, funds_raised_millions
)
-- Trim company names
UPDATE layoffs_staging2
SET company = TRIM(company);
-- Format date column
UPDATE layoffs_staging2
SET date = STR_TO_DATE(date, '%m/%d/%Y');
-- Drop extra column
ALTER TABLE layoffs_staging2
DROP COLUMN row_num;