https://github.com/apostolis-bloutsos-data/employee-data-eda

Mini EDA project on synthetic employee records using Python, pandas, and matplotlib

data-analysis eda jupyter-notebook matplotlib pandas python seaborn

# Employee Data EDA

This repository contains a **mini exploratory data analysis (EDA)** project on a small synthetic dataset of employee records.
The goal is to demonstrate **data cleaning, grouping, summarization, and visualization** using Python’s **pandas** and **matplotlib** libraries.

---

## Project Overview
The dataset includes:
- Employee ID and Name
- Department
- Age
- Salary
- Start Date
- Gender

The analysis covers:
1. Inspecting and understanding the dataset
2. Handling missing values (comparison of dropping vs imputation)
3. Grouping and aggregating data to extract insights
4. Creating simple visualizations for clarity
5. Summarizing findings in business-friendly terms
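
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the rows, the snake_case column names, and the choice of median imputation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; this is not the repository's dataset.
df = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5, 6],
        "department": ["IT", "IT", "HR", "Finance", "Finance", "IT"],
        "age": [29, np.nan, 41, 52, 47, 33],
        "salary": [48000, 51000, np.nan, 72000, 69000, 45000],
    }
)

# Step 1: inspect the structure and count missing values per column.
df.info()
missing_per_column = df.isna().sum()

# Step 2, option A: drop every row that has at least one missing value.
dropped = df.dropna()

# Step 2, option B: fill numeric gaps with the column median (an assumed
# strategy; the notebook may impute differently), keeping every row.
imputed = df.copy()
for col in ["age", "salary"]:
    imputed[col] = imputed[col].fillna(imputed[col].median())

print(missing_per_column)
print(f"rows after dropping: {len(dropped)}, after imputing: {len(imputed)}")
```

Comparing the two row counts shows the trade-off: dropping shrinks an already small dataset, while imputing keeps every record at the cost of introducing estimated values.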

---

## Key Insights
- Finance employees have the highest average salary and age.
- IT is the largest department but has the lowest average salary.
- HR and IT departments are currently single-gender; Finance is gender-balanced.
- Missing values were **imputed** instead of dropped to preserve all records.
- IT salaries have the widest range; HR salaries have the narrowest.
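
Insights like these typically come from per-department aggregations. The sketch below shows one way to compute them with pandas; the rows and column names are hypothetical, so its output does not reproduce the project's actual figures.

```python
import pandas as pd

# Hypothetical rows for illustration only; values are not the project's data.
df = pd.DataFrame(
    {
        "department": ["IT", "IT", "IT", "HR", "HR", "Finance", "Finance"],
        "gender": ["M", "M", "M", "F", "F", "F", "M"],
        "age": [29, 35, 31, 40, 38, 50, 46],
        "salary": [40000, 60000, 44000, 50000, 52000, 70000, 74000],
    }
)

# Headcount, mean salary, mean age, and salary extremes per department.
summary = df.groupby("department").agg(
    headcount=("salary", "count"),
    avg_salary=("salary", "mean"),
    avg_age=("age", "mean"),
    salary_min=("salary", "min"),
    salary_max=("salary", "max"),
)
summary["salary_range"] = summary["salary_max"] - summary["salary_min"]

# Gender counts per department; a zero marks a single-gender department.
gender_counts = pd.crosstab(df["department"], df["gender"])

print(summary.sort_values("avg_salary", ascending=False))
print(gender_counts)
```

With matplotlib installed, a column of the summary can be charted directly, for example `summary["avg_salary"].plot(kind="bar")`.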

---

## Repository Structure
`employee-data-eda/` contains:
- `employee_data_insights_eda.ipynb`: Jupyter notebook with the full analysis
- `README.md`: Project description and findings

---

## View the Notebook
You can view the full analysis here:
[Employee Data Insights EDA Notebook](https://github.com/apostolis-bloutsos-data/employee-data-eda/blob/main/employee_data_insights_eda.ipynb)