Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vishnu-vamshii/layoffs-data-analysis-in-sql
This project focuses on the cleaning and exploratory analysis of a dataset containing layoff information. It includes data deduplication, standardization of columns, handling null and blank values, and analyzing layoffs by company, industry, country, and date. Various SQL queries are used to explore trends and patterns in layoffs over time.
data-analysis eda mysql
- Host: GitHub
- URL: https://github.com/vishnu-vamshii/layoffs-data-analysis-in-sql
- Owner: vishnu-vamshii
- Created: 2024-09-09T02:02:23.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-21T02:10:42.000Z (5 months ago)
- Last Synced: 2024-11-30T22:20:02.529Z (2 months ago)
- Topics: data-analysis, eda, mysql
- Homepage:
- Size: 55.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Layoffs Data Cleaning and Exploratory Analysis
## Overview
This project involves cleaning and exploring a dataset of company layoffs. The data cleaning steps include removing duplicates, standardizing columns, handling blank or null values, and removing irrelevant columns. The analysis explores key trends such as layoffs by industry, country, and date. SQL is used to carry out these transformations and exploratory data analysis (EDA).

## Table of Contents
- [Data Cleaning](#data-cleaning)
  - [Removing Duplicates](#removing-duplicates)
  - [Standardizing the Data](#standardizing-the-data)
  - [Handling Null or Blank Values](#handling-null-or-blank-values)
  - [Removing Irrelevant Columns](#removing-irrelevant-columns)
- [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)
  - [Layoffs by Company](#layoffs-by-company)
  - [Layoffs by Industry](#layoffs-by-industry)
  - [Layoffs by Country](#layoffs-by-country)
  - [Layoffs by Date](#layoffs-by-date)
- [Conclusion](#conclusion)

## Data Cleaning
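The cleaning steps in this section can be sketched end to end as follows. This is a minimal illustration, not the repository's actual script: it runs the SQL through Python's built-in `sqlite3` module (SQLite 3.25 or newer is needed for window functions) so that it is self-contained. The table names `layoffs` and `layoffs_staging`, the sample rows, and the `MM/DD/YYYY` input date format are all assumptions made for the example. In MySQL the date conversion would typically use `STR_TO_DATE` and the fill step an `UPDATE ... JOIN`; here a `substr` expression and a correlated subquery stand in for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw table with made-up sample rows.
CREATE TABLE layoffs (
    company TEXT, location TEXT, industry TEXT,
    total_laid_off INTEGER, date TEXT, country TEXT
);
INSERT INTO layoffs VALUES
    ('  Acme ', 'NYC',    'Crypto', 100,  '03/01/2023', 'United States'),
    ('  Acme ', 'NYC',    'Crypto', 100,  '03/01/2023', 'United States'),
    ('Acme',    'NYC',    '',       50,   '12/15/2022', 'United States'),
    ('Globex',  'Berlin', 'Retail', NULL, '01/20/2023', 'Germany');

-- 1. Remove duplicates: number the rows inside each group of identical
--    values, then delete every row after the first in its group.
CREATE TABLE layoffs_staging AS
SELECT *, ROW_NUMBER() OVER (
           PARTITION BY company, location, industry,
                        total_laid_off, date, country
       ) AS row_num
FROM layoffs;
DELETE FROM layoffs_staging WHERE row_num > 1;

-- 2. Standardize: trim whitespace and rewrite MM/DD/YYYY as YYYY-MM-DD.
UPDATE layoffs_staging
SET company  = TRIM(company),
    industry = TRIM(industry),
    date     = substr(date, 7, 4) || '-' || substr(date, 1, 2)
               || '-' || substr(date, 4, 2);

-- 3. Handle blanks: treat '' as NULL, then fill a missing industry from
--    another row of the same company when one exists.
UPDATE layoffs_staging SET industry = NULL WHERE industry = '';
UPDATE layoffs_staging
SET industry = (
    SELECT MAX(s.industry)
    FROM layoffs_staging AS s
    WHERE s.company = layoffs_staging.company
      AND s.industry IS NOT NULL
)
WHERE industry IS NULL;
""")

# 4. The helper column row_num is no longer needed. This sketch simply
#    leaves it out of the final SELECT; in MySQL it could be removed with
#    ALTER TABLE layoffs_staging DROP COLUMN row_num.
cleaned = conn.execute(
    "SELECT company, industry, total_laid_off, date "
    "FROM layoffs_staging ORDER BY date"
).fetchall()

for row in cleaned:
    print(row)
```

With the sample rows above, the exact duplicate is dropped, the padded company name is trimmed, and the blank industry is filled from the other row for the same company. Rows with a missing `total_laid_off` are kept as they are in this sketch.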
### Removing Duplicates
Duplicates were identified and removed based on key fields like company, location, industry, and total layoffs using SQL `ROW_NUMBER()`.

### Standardizing the Data
Company names and industry values were standardized by trimming whitespace and fixing formatting inconsistencies. The date column was converted into a standard format for further analysis.

### Handling Null or Blank Values
Null and blank values were addressed for key columns such as `total_laid_off`, `percentage_laid_off`, and `industry`. Where possible, missing values were filled in from related rows using a join.

### Removing Irrelevant Columns
Non-essential columns, such as the row number, were dropped to streamline the dataset for analysis.

## Exploratory Data Analysis (EDA)
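The analyses in this section all follow the same pattern: sum `total_laid_off` over one dimension and sort the result. The sketch below illustrates that pattern under stated assumptions and is not the repository's own query set. It uses Python's built-in `sqlite3` module so it can run on its own, and the table name `layoffs`, the helper `totals_by`, and the sample rows are hypothetical. Dates are assumed to be already cleaned into `YYYY-MM-DD` text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical cleaned table with made-up sample rows.
CREATE TABLE layoffs (
    company TEXT, industry TEXT, country TEXT,
    total_laid_off INTEGER, date TEXT
);
INSERT INTO layoffs VALUES
    ('Acme',    'Crypto', 'United States', 100, '2023-03-01'),
    ('Acme',    'Crypto', 'United States', 50,  '2022-12-15'),
    ('Globex',  'Retail', 'Germany',       80,  '2023-03-20'),
    ('Initech', 'Retail', 'United States', 30,  '2023-01-05');
""")


def totals_by(column):
    """Sum total_laid_off per value of `column`, largest total first."""
    # Column names cannot be bound as SQL parameters, so only a fixed
    # set of known names is allowed into the query text.
    if column not in {"company", "industry", "country"}:
        raise ValueError(f"unsupported column: {column}")
    query = (
        f"SELECT {column}, SUM(total_laid_off) AS laid_off "
        f"FROM layoffs GROUP BY {column} ORDER BY laid_off DESC"
    )
    return conn.execute(query).fetchall()


by_company = totals_by("company")
by_industry = totals_by("industry")
by_country = totals_by("country")

# Group by year and month: the first seven characters of an ISO date
# ('YYYY-MM') identify the month.
by_month = conn.execute("""
    SELECT substr(date, 1, 7) AS month, SUM(total_laid_off) AS laid_off
    FROM layoffs
    GROUP BY month
    ORDER BY month
""").fetchall()

print(by_company)
print(by_month)
```

Sorting the per-dimension totals in descending order puts the most affected company, industry, or country first, while the monthly totals are kept in chronological order so that peaks over time are easy to spot.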
### Layoffs by Company
The analysis identifies the companies with the highest total layoffs.

### Layoffs by Industry
Layoffs were totaled per industry to find which industries were most affected.

### Layoffs by Country
The number of layoffs per country was aggregated to understand the geographical distribution.

### Layoffs by Date
Layoffs over time were explored, with the data grouped by year and month to identify trends.

## Conclusion
This project cleans and explores a dataset of layoffs. The SQL queries show which companies, industries, and countries are most affected by layoffs, as well as the timeframes in which layoffs peak.

## Usage
To run the SQL commands, a compatible database such as MySQL or PostgreSQL is required.

## License
This project is licensed under the MIT License.