# 📊 SQL Layoffs Data Analysis

![MySQL](https://img.shields.io/badge/MySQL-4479A1?style=for-the-badge&logo=mysql&logoColor=white)
![SQL](https://img.shields.io/badge/SQL-336791?style=for-the-badge&logo=postgresql&logoColor=white)
![CSV](https://img.shields.io/badge/CSV-FFDD00?style=for-the-badge&logo=files&logoColor=black)
![DataCleaning](https://img.shields.io/badge/Data--Cleaning-4CAF50?style=for-the-badge&logo=simpleanalytics&logoColor=white)
![Workbench](https://img.shields.io/badge/MySQL%20Workbench-00758F?style=for-the-badge&logo=mysql&logoColor=white)
![Kaggle](https://img.shields.io/badge/Kaggle-20BEFF?style=for-the-badge&logo=kaggle&logoColor=white)

A complete SQL data cleaning and analysis project using MySQL to analyze global company layoffs from 2020-2023.

## 🎯 What This Project Does

This project cleans a messy, real-world layoffs dataset with SQL, then explores the cleaned data to find which companies, industries, and locations were most affected by layoffs.

## 📋 About the Data

**Source**: [Layoffs Dataset from Kaggle](https://www.kaggle.com/datasets/swaptr/layoffs-2022)

- **Size**: 2,300+ layoff records
- **Time**: 2020-2023
- **Coverage**: Companies worldwide
- **Industries**: Tech, Finance, Retail, Healthcare, and more

## 🧹 What I Did - Data Cleaning

### Step 1: Remove Duplicates

- Found and removed duplicate records
- Used SQL window functions to identify copies
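
A minimal sketch of how this step could look in MySQL 8, not necessarily the exact query in `data_cleaning.sql`. The staging table name `layoffs_staging` is hypothetical, and the column names are assumed from the dataset's usual CSV header rather than confirmed by this README.

```sql
-- Assumed column names (not confirmed by this README): company, location,
-- industry, total_laid_off, percentage_laid_off, `date`, stage, country,
-- funds_raised_millions.

-- Work on a copy so the raw import stays untouched (hypothetical table name).
-- Rows that match on every column share a partition and get numbered 1, 2, 3...
CREATE TABLE layoffs_staging AS
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry, total_laid_off,
                        percentage_laid_off, `date`, stage, country,
                        funds_raised_millions
       ) AS row_num
FROM layoffs;

-- Anything numbered 2 or higher is a copy of an earlier row.
DELETE FROM layoffs_staging
WHERE row_num > 1;

-- The helper column is no longer needed.
ALTER TABLE layoffs_staging DROP COLUMN row_num;
```

MySQL does not allow deleting directly from a CTE, which is why the sketch materializes the row numbers in a table first. If MySQL Workbench rejects the `DELETE` because safe-update mode is on, running `SET SQL_SAFE_UPDATES = 0;` for the session is one way around it.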

### Step 2: Fix Data Problems

- Cleaned company names (removed extra spaces)
- Fixed industry names (made "Crypto" consistent)
- Fixed country names (removed the trailing period from "United States.")
- Converted the date column from text to a proper `DATE` type
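
The fixes above could be sketched roughly as follows. The column names (`company`, `industry`, `country`, `date`) and the `'%m/%d/%Y'` input format are assumptions, and the statements run against `layoffs` only for brevity; point them at a working copy if you keep one.

```sql
-- Strip leading and trailing spaces from company names.
UPDATE layoffs
SET company = TRIM(company);

-- Collapse every label that starts with 'Crypto' into a single value.
UPDATE layoffs
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Remove a trailing period, e.g. 'United States.' becomes 'United States'.
UPDATE layoffs
SET country = TRIM(TRAILING '.' FROM country);

-- Parse the text dates, then change the column type.
-- '%m/%d/%Y' is an assumed input format; adjust it to match the CSV.
UPDATE layoffs
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs
MODIFY COLUMN `date` DATE;
```

Values that do not match the format string may raise an error in strict SQL mode, so it can help to inspect the distinct `date` values before running the conversion.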

### Step 3: Handle Missing Data

- Filled in missing industry info when possible
- Removed records that had no useful layoff numbers
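
One way to express this step, assuming the columns `company`, `industry`, `total_laid_off`, and `percentage_laid_off`. Reading "no useful layoff numbers" as "both numeric columns are missing" is an interpretation, not something this README states.

```sql
-- Treat empty strings as missing so one NULL check covers both cases.
UPDATE layoffs
SET industry = NULL
WHERE industry = '';

-- Copy the industry from another row of the same company, if one has it.
UPDATE layoffs AS t1
JOIN layoffs AS t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Drop rows that carry neither a headcount nor a percentage.
DELETE FROM layoffs
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;
```

Companies that never appear with an industry keep a `NULL` value, since there is nothing to copy from.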

## 📊 Key Findings

### 🏢 Top 5 Companies with Most Layoffs
![Top 5 companies by total layoffs](images/Top_5_companies_by_total_layoffs.jpg)

### 🌍 Top 5 Locations with Most Layoffs
![Top 5 locations by total layoffs](images/Top_5_locations_by_total_layoffs.jpg)

### 🏭 Top 5 Industries with Most Layoffs
![Top 5 industries by total layoffs](images/Top_5_industries_by_total_layoffs.jpg)
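
The three top-5 rankings above can each come from the same grouped aggregate. The query below is a sketch for companies, assuming columns named `company` and `total_laid_off`; it may differ from what `eda.sql` actually contains.

```sql
-- Total layoffs per company, largest first, keeping the top five.
SELECT company,
       SUM(total_laid_off) AS total_layoffs
FROM layoffs
GROUP BY company
ORDER BY total_layoffs DESC
LIMIT 5;
```

Swapping `company` for `location` or `industry` (also assumed column names) in both the `SELECT` and `GROUP BY` clauses gives the other two rankings.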

### 📈 Biggest Single Layoff Event
![Largest single layoff event](images/Max_single_layoff.jpg)
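
A possible query for this finding, assuming the columns `company`, `date`, and `total_laid_off`:

```sql
-- Find the row (or rows, on a tie) with the largest single layoff count.
SELECT company,
       `date`,
       total_laid_off
FROM layoffs
WHERE total_laid_off = (SELECT MAX(total_laid_off) FROM layoffs);
```

Using a subquery instead of `ORDER BY ... LIMIT 1` returns every event that ties for the maximum.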

### 💔 Companies That Shut Down Completely (100% Layoffs)
![Companies with 100 percent layoffs](images/Companies_with_100_percentage_layoffs.jpg)
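
A sketch of how these companies could be listed. It assumes `percentage_laid_off` stores a fraction, so `1` means 100%; if the column holds values such as `100` instead, the filter needs to change accordingly.

```sql
-- Companies that laid off their entire workforce, largest headcount first.
SELECT company,
       location,
       `date`,
       total_laid_off
FROM layoffs
WHERE percentage_laid_off = 1
ORDER BY total_laid_off DESC;
```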

## 🛠️ SQL Skills Used

- **Data Cleaning**: Removing duplicates, fixing messy data
- **Window Functions**: ROW_NUMBER(), RANK(), SUM() OVER()
- **Joins**: Connecting tables to fill missing data
- **Date Functions**: Converting text to dates
- **Aggregation**: GROUP BY, SUM(), COUNT(), MAX()
- **CTEs**: Common Table Expressions for complex queries
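
To illustrate how CTEs and window functions combine, here are two sketches. Both assume the columns `company`, `date`, and `total_laid_off`, and that `date` has already been converted to a `DATE` type. The CTE names and aliases are made up for the example.

```sql
-- Running total of layoffs by month: aggregate in a CTE, then accumulate
-- with SUM() OVER.
WITH monthly AS (
    SELECT DATE_FORMAT(`date`, '%Y-%m') AS ym,
           SUM(total_laid_off)          AS laid_off
    FROM layoffs
    WHERE `date` IS NOT NULL
    GROUP BY ym
)
SELECT ym,
       laid_off,
       SUM(laid_off) OVER (ORDER BY ym) AS running_total
FROM monthly
ORDER BY ym;

-- Top 3 companies per year: aggregate, rank within each year, then filter.
WITH company_year AS (
    SELECT company,
           YEAR(`date`)        AS yr,
           SUM(total_laid_off) AS laid_off
    FROM layoffs
    WHERE `date` IS NOT NULL
    GROUP BY company, YEAR(`date`)
),
ranked AS (
    SELECT company,
           yr,
           laid_off,
           RANK() OVER (PARTITION BY yr ORDER BY laid_off DESC) AS yr_rank
    FROM company_year
)
SELECT company, yr, laid_off, yr_rank
FROM ranked
WHERE yr_rank <= 3
ORDER BY yr, yr_rank;
```

The second query needs two CTEs because a window function result cannot be filtered in the `WHERE` clause of the same `SELECT` that computes it.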

## 📁 Project Files

```
📦 layoffs-sql-analysis
├── 📄 README.md (This file)
├── 📂 data/
│   └── 📄 layoffs.csv (Original dataset)
├── 📂 sql/
│   ├── 📄 data_cleaning.sql (Cleaning queries)
│   └── 📄 eda.sql (Analysis queries)
└── 📂 images/
    ├── 📷 top_companies.png (Results screenshots)
    ├── 📷 top_locations.png
    ├── 📷 top_industries.png
    ├── 📷 max_layoffs.png
    └── 📷 complete_shutdowns.png
```

## 🚀 How to Run This Project

### What You Need

- MySQL installed on your computer
- MySQL Workbench (optional, but it makes importing the CSV and running the scripts easier)

### Steps

1. **Download** the files from this repository
2. **Open** MySQL Workbench
3. **Create** a new database called `world_layoff`
4. **Import** the `layoffs.csv` file as a table called `layoffs`
5. **Run** the `data_cleaning.sql` file first
6. **Run** the `eda.sql` file second to see the analysis
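
Steps 3 and 4 can be checked from a query tab with a few statements. This is a minimal sketch that assumes the CSV was imported as the table `layoffs`, for example through Workbench's import wizard.

```sql
-- Step 3: create and select the database.
CREATE DATABASE IF NOT EXISTS world_layoff;
USE world_layoff;

-- After step 4: confirm the import worked before running the scripts.
SELECT COUNT(*) AS row_count
FROM layoffs;

-- Peek at a few rows to check that columns were mapped correctly.
SELECT *
FROM layoffs
LIMIT 5;
```

The row count should be in line with the 2,300+ records mentioned in the data section. A much smaller number usually means the import skipped rows.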

## 💡 What I Learned

- How to clean messy real-world data
- Advanced SQL techniques for data analysis
- Finding business insights from raw data
- Documenting and presenting data projects

## 🎓 Why This Project Matters

This project shows I can:

- ✅ Take messy data and make it clean and usable
- ✅ Write complex SQL queries to find insights
- ✅ Present findings in a clear, understandable way
- ✅ Work with real business data to solve problems

## ✉️ Contact

**Sankaran S**
[![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/sankaran-s2001) [![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/sankaran-s21/) [![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:sankaran121101@gmail.com)

***

*This project is part of my data science portfolio, showing my SQL skills and ability to work with real-world data.*