{"id":25812504,"url":"https://github.com/athul64/exploratory-data-analysis","last_synced_at":"2026-02-25T22:04:35.877Z","repository":{"id":270361270,"uuid":"910119536","full_name":"Athul64/Exploratory-Data-Analysis","owner":"Athul64","description":"To preprocess and analyze the given employee dataset, present the findings graphically, and derive meaningful insights to help better understand the company’s workforce.","archived":false,"fork":false,"pushed_at":"2024-12-31T10:20:17.000Z","size":385,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-08T14:03:36.281Z","etag":null,"topics":["colab-notebook","data-analysis","data-visualization","matplotlib","numpy","pandas","python","seaborn","statistical-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Athul64.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-30T14:41:23.000Z","updated_at":"2024-12-31T10:22:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"db569e82-8bb6-4df7-8de0-3028a0402cca","html_url":"https://github.com/Athul64/Exploratory-Data-Analysis","commit_stats":null,"previous_names":["athul64/exploratory-data-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Athul64/Exploratory-Data-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Athul64
%2FExploratory-Data-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Athul64%2FExploratory-Data-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Athul64%2FExploratory-Data-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Athul64%2FExploratory-Data-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Athul64","download_url":"https://codeload.github.com/Athul64/Exploratory-Data-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Athul64%2FExploratory-Data-Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29842895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T21:18:31.832Z","status":"ssl_error","status_checked_at":"2026-02-25T21:18:29.265Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colab-notebook","data-analysis","data-visualization","matplotlib","numpy","pandas","python","seaborn","statistical-analysis"],"created_at":"2025-02-28T01:54:22.194Z","updated_at":"2026-02-25T22:04:35.873Z","avatar_url":"https://github.com/Athul64.png","language":"Jupyter Notebook","readme":"# Employee Data Analysis Project\n\nWelcome to the Employee Data Analysis 
Project! This project is a comprehensive exploration of a dataset from ABC Company, aimed at deriving valuable insights into the employee data through preprocessing, analysis, and visualization. Below, you'll find an overview of the project components, methodologies, and findings.\n\n## Project Objective\nTo preprocess and analyze the given employee dataset, present the findings graphically, and derive meaningful insights to help better understand the company’s workforce.\n\n## Dataset\nThe dataset contains **458 rows** and **9 columns** and includes information about employees across various teams and positions. The columns include:\n- `Team`\n- `Position`\n- `Age`\n- `Salary`\n- `Height`\n- `Name`\n- `Number`\n- `Weight`\n- `College`\n\n## Preprocessing Steps\n1. **Handling Missing Data**:\n   - Missing values in the `Salary` column were replaced with the **median salary**.\n   - Missing values in the `College` column were replaced with the **most frequent value (mode)**.\n2. **Data Correction**:\n   - Randomly replaced inconsistent values in the `Height` column with values between **150 cm and 180 cm**, using `np.random.seed(42)` for reproducibility.\n3. **Data Cleaning**:\n   - Verified the dataset for duplicates and null values after preprocessing.\n4. **Export**:\n   - Saved the cleaned dataset as `cleaned_data.csv` for further analysis.\n\n## Analysis Tasks\n1. **Distribution of Employees Across Teams**:\n   - Calculated the percentage split of employees across teams.\n   - Visualized the distribution using a **pie chart**.\n\n2. **Employee Segregation by Position**:\n   - Grouped employees based on their positions.\n   - Visualized the counts using a **horizontal bar chart**.\n\n3. **Predominant Age Group**:\n   - Identified the most frequent age group among employees.\n   - Presented the data using a **histogram**.\n\n4. 
**Salary Expenditure Analysis**:\n   - Determined which team and position had the highest total salary expenditure.\n   - Visualized the data using a **stacked bar chart**.\n\n5. **Correlation Between Age and Salary**:\n   - Computed the correlation coefficient between age and salary.\n   - Represented the data using a **scatter plot**.\n\n## Visualizations\nThe project includes the following visualizations:\n1. **Pie Chart**: Employee distribution across teams.\n2. **Bar Chart**: Number of employees in each position.\n3. **Histogram**: Predominant age group.\n4. **Stacked Bar Chart**: Salary expenditure by team and position.\n5. **Scatter Plot**: Age vs. Salary correlation.\n\n## Key Findings\n- The team with the highest salary expenditure is **[Team Name]**, and the position contributing most to this expenditure is **[Position Name]**.\n- The predominant age group among employees is **[Age Group]**.\n- There is a **[weak/moderate/strong] correlation** between age and salary, indicating **[specific insight, e.g., older employees tend to earn more/less].**\n\n## How to Run the Project\n1. Clone this repository:\n   ```bash\n   git clone https://github.com/Athul64/Exploratory-Data-Analysis.git\n   ```\n2. Install the required Python libraries:\n   ```bash\n   pip install numpy pandas matplotlib seaborn\n   ```\n
3. Run the Jupyter Notebook:\n   ```bash\n   jupyter notebook \"Exploratory Data Analysis.ipynb\"\n   ```\n\n## Files in the Repository\n- `Exploratory Data Analysis.ipynb`: The main Jupyter Notebook containing code and analysis.\n- `data.csv`: The original dataset.\n- `cleaned_data.csv`: The preprocessed dataset.\n- `README.md`: Project overview and instructions.\n\n## Tools Used\n- **Python Libraries**:\n  - `numpy` for numerical operations and seeded random number generation\n  - `pandas` for data manipulation and analysis\n  - `matplotlib` and `seaborn` for visualizations\n\n## Future Improvements\n- Enhance visualizations by adding interactive plots using `plotly` or `dash`.\n- Perform advanced statistical analysis to uncover deeper insights.\n- Automate the preprocessing and analysis steps for scalability.\n\n## License\nThis project is licensed under the [MIT License](LICENSE).\n\n---\n\nIf you have any questions or feedback, feel free to raise an issue or reach out. Thank you for exploring this project!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fathul64%2Fexploratory-data-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fathul64%2Fexploratory-data-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fathul64%2Fexploratory-data-analysis/lists"}