{"id":15066713,"url":"https://github.com/raad07/sql_project-world_layoffs_dataset","last_synced_at":"2026-01-27T08:33:09.980Z","repository":{"id":257738148,"uuid":"855349556","full_name":"RAAD07/SQL_Project-World_Layoffs_Dataset","owner":"RAAD07","description":"This is a SQL project which comprises the Data Cleaning in the first part and Exploratory Data Analysis (EDA) in the second part.","archived":false,"fork":false,"pushed_at":"2024-09-12T12:18:21.000Z","size":83,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-22T16:12:56.647Z","etag":null,"topics":["data-analysis","database","mysql","sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RAAD07.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-10T18:12:26.000Z","updated_at":"2024-09-18T08:45:05.000Z","dependencies_parsed_at":"2024-09-18T11:20:19.255Z","dependency_job_id":null,"html_url":"https://github.com/RAAD07/SQL_Project-World_Layoffs_Dataset","commit_stats":null,"previous_names":["raad07/sql_project-world_layoffs_dataset"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RAAD07%2FSQL_Project-World_Layoffs_Dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RAAD07%2FSQL_Project-World_Layoffs_Dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RAAD07%2FSQL_Project-World_Layoffs_Dataset/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RAAD07%2FSQL_Project-World_Layoffs_Dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RAAD07","download_url":"https://codeload.github.com/RAAD07/SQL_Project-World_Layoffs_Dataset/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243822321,"owners_count":20353496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","database","mysql","sql"],"created_at":"2024-09-25T01:11:04.981Z","updated_at":"2026-01-27T08:33:09.914Z","avatar_url":"https://github.com/RAAD07.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# SQL_Project_World-Layoffs-Dataset\nThis SQL project comprises Data Cleaning in the first part and Exploratory Data Analysis (EDA) in the second part.\n\n## Part 1: Data Cleaning\nIn the Data Cleaning part, we will be focussing on 4 things.\n1. Remove Duplicates\n2. Standardize the data\n3. Handle Null values or blank values\n4. 
Remove Rows and Columns if necessary or if they are of no use\n\n### Checking the raw data\n``` sql\nSELECT * FROM layoffs; # layoffs holds all the raw data\n```\n### Creating a copy of the table to work on, so that the raw data stays protected\n``` sql\nCREATE TABLE layoffs_worksheet\nLIKE layoffs;\n\nSELECT * FROM layoffs_worksheet;\n```\n\n### Inserting all the rows of the layoffs table into the new layoffs_worksheet table\n``` sql\nINSERT INTO layoffs_worksheet\nSELECT * FROM layoffs;\n```\n\n### Creating another table to manipulate the data and add extra columns\n\n/* creating a new table named layoffs_worksheet2 from the copied CREATE statement (Copy to Clipboard \u003e Create Statement)\nand adding one extra column named row_num */\n\n``` sql\nCREATE TABLE `layoffs_worksheet2` (\n  `company` text,\n  `location` text,\n  `industry` text,\n  `total_laid_off` int DEFAULT NULL,\n  `percentage_laid_off` text,\n  `date` text,\n  `stage` text,\n  `country` text,\n  `funds_raised_millions` int DEFAULT NULL,\n  `row_num` int\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;\n\nSELECT * FROM layoffs_worksheet2;\n```\n### Inserting values into this new table and adding an extra column\n\n/* inserting values into this new table and assigning row_num with ROW_NUMBER() partitioned by every column, so that\neach unique row gets row_num = 1 and every duplicate of it gets row_num = 2 or higher */\n\n``` sql\nINSERT INTO layoffs_worksheet2\nSELECT *,\nROW_NUMBER() OVER(PARTITION BY company, location, industry, total_laid_off, percentage_laid_off, `date`,\nstage, country, funds_raised_millions) AS row_num\nFROM layoffs_worksheet;\n\nSELECT * FROM layoffs_worksheet2;\n```\n\n## 1. 
Removing the duplicates\n\n### Checking the duplicates\n``` sql\nSELECT * FROM layoffs_worksheet2\nWHERE row_num \u003e 1;\n```\n\n### Deleting the duplicates\n``` sql\nDELETE\nFROM layoffs_worksheet2\nWHERE row_num \u003e 1;\n```\n\n### Rechecking whether any duplicates remain\n``` sql\nSELECT * FROM layoffs_worksheet2\nWHERE row_num \u003e 1;\n```\n\n## 2. Standardizing Columns\n\n### Checking and trimming extra spaces from company names\n``` sql\nSELECT company, TRIM(company) AS company_name\nFROM layoffs_worksheet2;\n```\n\n### Updating the company names with the trimmed values\n``` sql\nUPDATE layoffs_worksheet2\nSET company = TRIM(company);\n```\n\n### Trimming the location values\n``` sql\nUPDATE layoffs_worksheet2\nSET location = TRIM(location);\n```\n\n### Trimming the country names\n``` sql\nUPDATE layoffs_worksheet2\nSET country = TRIM(country);\n```\n\n### Trimming the industry names\n``` sql\nUPDATE layoffs_worksheet2\nSET industry = TRIM(industry);\n```\n### Checking the company names after the update\n``` sql\nSELECT DISTINCT (company)\nFROM layoffs_worksheet2;\n```\n### Checking the distinct industries\n``` sql\nSELECT DISTINCT (industry)\nFROM layoffs_worksheet2\nORDER BY industry;\n```\n### Checking the crypto-related rows, as three different industry labels refer to crypto\n``` sql\nSELECT * FROM layoffs_worksheet2\nWHERE industry LIKE '%crypto%';\n```\n### Renaming every crypto-related industry to Crypto\n``` sql\nUPDATE layoffs_worksheet2\nSET industry = 'Crypto'\nWHERE industry LIKE '%crypto%';\n```\n\n### Checking the industries after the update\n``` sql\nSELECT DISTINCT (industry)\nFROM layoffs_worksheet2\nORDER BY industry;\n```\n\n### Checking whether the same location was entered in different forms by mistake\n``` sql\nSELECT DISTINCT (location)\nFROM layoffs_worksheet2\nORDER BY location;\n```\n\n### Checking whether the same country was entered in different forms by mistake\n``` 
sql\nSELECT DISTINCT (country)\nFROM layoffs_worksheet2\nORDER BY country;\n```\n### Checking all the country values that contain United States\n``` sql\nSELECT DISTINCT (country) FROM layoffs_worksheet2\nWHERE country LIKE '%United States%';\n```\n\n### Standardizing every variant to United States\n``` sql\nUPDATE layoffs_worksheet2\nSET country = 'United States'\nWHERE country LIKE '%United States%';\n```\n\n### Rechecking the countries after the update\n``` sql\nSELECT DISTINCT (country)\nFROM layoffs_worksheet2\nORDER BY country;\n```\n\n### Previewing the conversion of the text date column with STR_TO_DATE\n``` sql\nSELECT `date`,\nSTR_TO_DATE(`date`, '%m/%d/%Y')\nFROM layoffs_worksheet2;\n```\n\n### Updating the date column\n``` sql\nUPDATE layoffs_worksheet2\nSET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');\n\nSELECT *\nFROM layoffs_worksheet2\nLIMIT 100;\n```\n\n### Changing the data type of the date column from text to DATE\n``` sql\nALTER TABLE layoffs_worksheet2\nMODIFY `date` DATE;\n```\n\n## 3. Handling null and missing values\n\n### Checking the null or missing values in some columns\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE industry IS NULL OR industry =''\nOR\ntotal_laid_off IS NULL OR total_laid_off =''\nOR\npercentage_laid_off IS NULL OR percentage_laid_off =''\nOR\nfunds_raised_millions IS NULL OR funds_raised_millions ='';\n```\n\n### Checking the rows with a missing industry\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE industry IS NULL OR industry ='';\n```\n\n### Checking whether these companies have other rows from which we can pull the missing values\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE company IN\n(\nSELECT company\nFROM layoffs_worksheet2\nWHERE industry IS NULL OR industry =''\n);\n```\n\n### Checking the missing values of the industry column\n\n/* checking whether the companies with a missing industry have an industry value in some other row,\nso that we can populate the missing 
values with that industry type */\n``` sql\nSELECT * FROM layoffs_worksheet2 t1\nJOIN layoffs_worksheet2 t2\nON t1.company=t2.company\nWHERE (t1.industry IS NULL OR t1.industry='') AND (t2.industry IS NOT NULL AND t2.industry!='');\n```\n\n### Populating the missing values of the industry column\n\n/* populating the missing values of the industry column with the industry values\npulled from other rows of the same company */\n``` sql\nUPDATE layoffs_worksheet2 t1\nJOIN layoffs_worksheet2 t2\nON t1.company=t2.company\nSET t1.industry=t2.industry\nWHERE (t1.industry IS NULL OR t1.industry='') AND (t2.industry IS NOT NULL AND t2.industry!='');\n```\n### Checking the rows that previously had missing values\n\n/* checking whether the missing values were updated by the previous query;\nAirbnb and Carvana were among the companies with a missing industry */\n``` sql\nSELECT * FROM layoffs_worksheet2\nWHERE company IN ('Airbnb','Carvana');\n```\n\n### Rechecking whether any missing values remain in the industry column\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE company IN\n(\nSELECT company\nFROM layoffs_worksheet2\nWHERE industry IS NULL OR industry =''\n);\n```\n/* All the rows were updated except one, which has no other row to pull the industry type from.\nWe leave it as it is, because putting in wrong information is worse than having none. */\n\n## 4. 
Remove Rows and Columns if necessary or if they are of no use\n\n/* As I will work with the total_laid_off, percentage_laid_off and funds_raised_millions data in Part 2\nof this project, any row where all three of these columns are missing is of no use to us\n*/\n\n### Checking the rows where all of those columns are missing\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE (total_laid_off IS NULL OR total_laid_off='')\nAND (percentage_laid_off IS NULL OR percentage_laid_off='')\nAND (funds_raised_millions IS NULL OR funds_raised_millions='');\n```\n\n### Deleting all the rows that have null or missing values in those 3 columns\n``` sql\nDELETE\nFROM layoffs_worksheet2\nWHERE (total_laid_off IS NULL OR total_laid_off='')\nAND (percentage_laid_off IS NULL OR percentage_laid_off='')\nAND (funds_raised_millions IS NULL OR funds_raised_millions='');\n```\n/* I wanted to delete the rows where both total_laid_off and percentage_laid_off are missing,\nbut 316 rows have missing values in those 2 columns, and I am not sure whether\ndeleting so many rows is a good choice, so I am keeping them in the dataset for now.\nI will delete them later on if needed. 
\n*/\n\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE (total_laid_off IS NULL OR total_laid_off='')\nAND (percentage_laid_off IS NULL OR percentage_laid_off='');\n```\n\n### Checking the final dataset for any other rows or unnecessary columns to delete\n``` sql\nSELECT * FROM layoffs_worksheet2;\n```\n\n### Dropping the row_num column, as it has served its purpose\n``` sql\nALTER TABLE layoffs_worksheet2\nDROP COLUMN row_num;\n```\n\n### Counting the rows of the final dataset and comparing them with the raw dataset\n``` sql\nSELECT * FROM layoffs_worksheet2;\nSELECT COUNT(*) FROM layoffs_worksheet2;\nSELECT * FROM layoffs;\nSELECT COUNT(*) FROM layoffs;\n```\n\n## Part 2: Exploratory Data Analysis (EDA)\n\n### Checking the clean dataset\n``` sql\nSELECT * FROM layoffs_worksheet2;\n```\n\n### Checking the maximum total_laid_off and the maximum percentage_laid_off\n``` sql\n# percentage_laid_off is stored as text, so MAX compares its values as strings\nSELECT MAX(total_laid_off) AS max_total_laid_off, MAX(percentage_laid_off) AS max_percentage_laid_off\nFROM layoffs_worksheet2;\n```\n\n### Checking which companies laid off all their employees, and counting them\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE percentage_laid_off=1\nORDER BY total_laid_off DESC;\n```\n``` sql\nSELECT COUNT(*) AS number_of_companies_with_full_layoff\nFROM layoffs_worksheet2\nWHERE percentage_laid_off=1;\n```\n\n### Funds raised by the companies that laid off all their employees\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE percentage_laid_off=1\nORDER BY funds_raised_millions DESC;\n```\n\n### Companies that laid off the most employees\n``` sql\nSELECT company, SUM(total_laid_off) AS total_number_of_laid_off_employees\nFROM layoffs_worksheet2\nGROUP BY company\nORDER BY total_number_of_laid_off_employees DESC;\n```\n\n### Industries that laid off the most employees\n``` sql\nSELECT industry, SUM(total_laid_off) AS total_number_of_laid_off_employees\nFROM layoffs_worksheet2\nGROUP BY 
industry\nORDER BY total_number_of_laid_off_employees DESC;\n```\n\n### Checking the time span covered by the layoff data\n``` sql\nSELECT MIN(`date`) AS the_starting_date,\nMAX(`date`) AS the_end_date,\nDATEDIFF(MAX(`date`), MIN(`date`)) AS time_period_in_days\nFROM layoffs_worksheet2;\n```\n\n### Countries that laid off the most employees\n``` sql\nSELECT country, SUM(total_laid_off) AS total_number_of_laid_off_employees\nFROM layoffs_worksheet2\nGROUP BY country\nORDER BY total_number_of_laid_off_employees DESC;\n```\n\n### Checking whether Bangladesh is on the list\n``` sql\nSELECT *\nFROM layoffs_worksheet2\nWHERE country LIKE '%Bangladesh%';\n```\n### Checking at which company stage most of the layoffs happened\n/* Series A is an earlier funding stage than Series B, and Post-IPO means after the initial public offering, which includes companies such as Amazon and Google */\n``` sql\nSELECT stage, SUM(total_laid_off) AS total_laid_off_by_stage\nFROM layoffs_worksheet2\nGROUP BY stage\nORDER BY total_laid_off_by_stage DESC;\n```\n/* It seems that Post-IPO companies, mid-stage companies (Series C, D, E) and acquired companies\nhad the most layoffs */\n\n### Checking in which year most of the layoffs happened\n``` sql\nSELECT YEAR(`date`) AS layoff_year, SUM(total_laid_off) AS total_laid_off_in_a_year\nFROM layoffs_worksheet2\nGROUP BY layoff_year\nORDER BY total_laid_off_in_a_year DESC;\n```\n\n### Checking in which month most of the layoffs happened\n``` sql\nSELECT MONTH(`date`) AS layoff_month, SUM(total_laid_off) AS total_laid_off_in_a_month\nFROM layoffs_worksheet2\nGROUP BY layoff_month\nORDER BY total_laid_off_in_a_month DESC;\n# January seems to have had the most layoffs, but this query combines the same month across all years\n```\n\n### Total layoffs in each month of each year\n``` sql\nSELECT SUBSTR(`date`, 1, 7) AS layoff_year_and_month, SUM(total_laid_off) AS total_laid_off_in_the_month\nFROM layoffs_worksheet2\nWHERE SUBSTR(`date`, 1, 7) 
IS NOT NULL\nGROUP BY layoff_year_and_month\nORDER BY layoff_year_and_month;\n```\n\n### Rolling total of layoffs up to each month\n``` sql\nWITH rolling_total AS\n(\nSELECT SUBSTR(`date`, 1, 7) AS layoff_year_and_month, SUM(total_laid_off) AS total_laid_off_in_the_month\nFROM layoffs_worksheet2\nWHERE SUBSTR(`date`, 1, 7) IS NOT NULL\nGROUP BY layoff_year_and_month\n)\nSELECT *,\nSUM(total_laid_off_in_the_month) OVER(ORDER BY layoff_year_and_month) AS total_layoff_till_this_month\nFROM rolling_total\nORDER BY layoff_year_and_month;\n```\n\n### Checking which companies laid off the most employees in which years\n``` sql\nSELECT company, YEAR(`date`) AS layoff_year, SUM(total_laid_off) AS total_laid_off_in_a_year\nFROM layoffs_worksheet2\nGROUP BY company, layoff_year\nORDER BY total_laid_off_in_a_year DESC, company, layoff_year DESC;\n```\n\n### Top 5 companies by layoffs in each year\n``` sql\nWITH top_companies AS\n(\nSELECT company, YEAR(`date`) AS year_of_layoff, SUM(total_laid_off) AS total_laid_off_in_a_year\nFROM layoffs_worksheet2\nGROUP BY company, year_of_layoff\n),\ncompany_ranking AS\n(\nSELECT *,\nDENSE_RANK() OVER(PARTITION BY year_of_layoff ORDER BY total_laid_off_in_a_year DESC) AS ranking\nFROM top_companies\nWHERE year_of_layoff IS NOT NULL\n)\nSELECT * FROM company_ranking\nWHERE ranking \u003c= 5;\n```\n\n### Top 5 industries by layoffs in each year\n``` sql\nWITH top_industries AS\n(\nSELECT industry, YEAR(`date`) AS year_of_layoff, SUM(total_laid_off) AS total_laid_off_in_a_year\nFROM layoffs_worksheet2\nGROUP BY industry, year_of_layoff\n),\nindustry_ranking AS\n(\nSELECT *,\nDENSE_RANK() OVER(PARTITION BY year_of_layoff ORDER BY total_laid_off_in_a_year DESC) AS ranking\nFROM top_industries\nWHERE year_of_layoff IS NOT NULL\n)\nSELECT * FROM 
industry_ranking\nWHERE ranking \u003c= 5;\n```\n\n### Top 5 countries by layoffs in each year\n``` sql\nWITH top_countries AS\n(\nSELECT country, YEAR(`date`) AS year_of_layoff, SUM(total_laid_off) AS total_laid_off_in_a_year\nFROM layoffs_worksheet2\nGROUP BY country, year_of_layoff\n),\ncountry_ranking AS\n(\nSELECT *,\nDENSE_RANK() OVER(PARTITION BY year_of_layoff ORDER BY total_laid_off_in_a_year DESC) AS ranking\nFROM top_countries\nWHERE year_of_layoff IS NOT NULL\n)\nSELECT * FROM country_ranking\nWHERE ranking \u003c= 5;\n```\n\n### Companies that raised the most funds\n``` sql\nSELECT company, SUM(funds_raised_millions) AS total_funds_in_millions\nFROM layoffs_worksheet2\nGROUP BY company\nORDER BY total_funds_in_millions DESC;\n```\n\n### Top 5 companies in each year with the most funds\n``` sql\nWITH top_companies AS\n(\nSELECT company, YEAR(`date`) AS year_of_funding, SUM(funds_raised_millions) AS total_funds_collected_in_a_year_in_millions\nFROM layoffs_worksheet2\nGROUP BY company, year_of_funding\n),\ncompany_ranking AS\n(\nSELECT *,\nDENSE_RANK() OVER(PARTITION BY year_of_funding ORDER BY total_funds_collected_in_a_year_in_millions DESC) AS ranking\nFROM top_companies\nWHERE year_of_funding IS NOT NULL\n)\nSELECT * FROM company_ranking\nWHERE ranking \u003c= 5;\n```\n\n### Top 5 industries in each year with the most funds\n``` sql\nWITH top_industries AS\n(\nSELECT industry, YEAR(`date`) AS year_of_funding, SUM(funds_raised_millions) AS total_funds_collected_in_a_year_in_millions\nFROM layoffs_worksheet2\nGROUP BY industry, year_of_funding\n),\nindustry_ranking AS\n(\nSELECT *,\nDENSE_RANK() OVER(PARTITION BY year_of_funding ORDER BY total_funds_collected_in_a_year_in_millions DESC) AS ranking\nFROM top_industries\nWHERE year_of_funding IS NOT NULL\n)\nSELECT * FROM industry_ranking\nWHERE ranking \u003c= 5;\n```\n\n### Top 5 countries (company 
belongs to that country) in each year with the most funds\n``` sql\nWITH top_countries AS\n(\nSELECT country, YEAR(`date`) AS year_of_funding, SUM(funds_raised_millions) AS total_funds_collected_in_a_year_in_millions\nFROM layoffs_worksheet2\nGROUP BY country, year_of_funding\n),\ncountry_ranking AS\n(\nSELECT *,\nDENSE_RANK() OVER(PARTITION BY year_of_funding ORDER BY total_funds_collected_in_a_year_in_millions DESC) AS ranking\nFROM top_countries\nWHERE year_of_funding IS NOT NULL\n)\nSELECT * FROM country_ranking\nWHERE ranking \u003c= 5;\n```\nSurprisingly, Lithuania ranked 4th by funds in 2022, so let's check the companies from Lithuania:\n``` sql\nSELECT * FROM layoffs_worksheet2\nWHERE country LIKE '%Lithuania%';\n\n# The dataset lists an Uber layoff under Lithuania, which explains the high funding figure.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraad07%2Fsql_project-world_layoffs_dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraad07%2Fsql_project-world_layoffs_dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraad07%2Fsql_project-world_layoffs_dataset/lists"}