https://github.com/abhinavsharma07/coder_roots
Coder_Roots Assignment
- Host: GitHub
- URL: https://github.com/abhinavsharma07/coder_roots
- Owner: AbhinavSharma07
- Created: 2024-12-06T11:51:39.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-06T12:13:54.000Z (10 months ago)
- Last Synced: 2025-04-05T00:28:19.698Z (6 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 177 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Coder_Roots Assignment
This repository contains solutions to three tasks involving data manipulation, visualization, and predictive modeling. Each task demonstrates specific data science techniques using Python.
---
## Task 1: Data Manipulation and Cleaning
### Objective:
Clean and analyze the `employee_data.csv` dataset.

### Dataset:
[Employee_Data](https://github.com/gurmindero7/test_datasets/blob/main/employee_data.csv)

### Steps:
1. Remove duplicate entries.
2. Handle missing values (fill them with default values or drop the rows).
3. Convert the `JoiningDate` column to a proper datetime format.
4. Filter out employees where the `Status` is "Resigned".
5. Analyze the data:
   - Find the average salary by department.
   - List employees who joined after 2020.

### Code:
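The steps above can be sketched in pandas roughly as follows. This is not the code from the Task 1 notebook: it runs on a few made-up rows, and the `Name`, `Department` and `Salary` column names are assumptions (only `JoiningDate` and `Status` are named in this README). Pointing it at the real `employee_data.csv` would likely need adjustments to match that file's schema.

```python
import pandas as pd

# Hypothetical rows standing in for employee_data.csv. Only JoiningDate and
# Status are named in the README; Name, Department and Salary are assumed.
raw = pd.DataFrame({
    "Name": ["Asha", "Ben", "Ben", "Chen", "Dev", "Eli"],
    "Department": ["HR", "IT", "IT", "IT", None, "HR"],
    "Salary": [50000, 70000, 70000, None, 65000, 52000],
    "JoiningDate": ["2019-03-01", "2021-07-15", "2021-07-15",
                    "2022-01-10", "2018-11-30", "not a date"],
    "Status": ["Active", "Active", "Active", "Active", "Resigned", "Active"],
})


def clean_employees(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates().copy()
    # 2. Fill missing values with defaults: median salary, "Unknown" department.
    out["Salary"] = out["Salary"].fillna(out["Salary"].median())
    out["Department"] = out["Department"].fillna("Unknown")
    # 3. Parse dates; unparseable strings become NaT and those rows are dropped.
    out["JoiningDate"] = pd.to_datetime(out["JoiningDate"], errors="coerce")
    out = out.dropna(subset=["JoiningDate"])
    # 4. Filter out employees whose Status is "Resigned".
    out = out[out["Status"] != "Resigned"]
    return out.reset_index(drop=True)


clean = clean_employees(raw)

# 5a. Average salary per department.
avg_salary = clean.groupby("Department")["Salary"].mean()
# 5b. Employees whose joining year is later than 2020.
joined_after_2020 = clean[clean["JoiningDate"].dt.year > 2020]

print(avg_salary)
print(joined_after_2020[["Name", "JoiningDate"]])
```

Filling versus dropping is a per-column judgment call: the sketch uses the median for salary because it is less sensitive to outliers than the mean, and drops rows with unparseable dates because no sensible default date exists.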
Refer to the notebook `Task1_Data Manipulation and Cleaning.ipynb`.

### Outputs:
- Cleaned DataFrame.
- Average salary per department.
- List of employees who joined after 2020.

---
## Task 2: Data Visualization
### Objective:
Explore a public dataset through visualizations.

### Dataset:
Any public dataset can be used; this repository uses the Titanic dataset.
[Titanic dataset](https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv)

### Steps:
1. Load the dataset into a Pandas DataFrame.
2. Create four meaningful visualizations using Matplotlib or Seaborn.
   - Examples: bar plots, histograms, box plots, etc.
3. Generate a correlation heatmap for numerical variables.

### Code:
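A minimal Matplotlib sketch of the four plots listed under Outputs, assuming the standard Titanic column names `Survived`, `Pclass`, `Age` and `Fare`. It uses a handful of illustrative rows (not the real dataset) so it runs offline; with network access, `pd.read_csv` on the Titanic CSV URL above would load the full data. The heatmap is drawn with `imshow` to avoid an extra dependency; `seaborn.heatmap(corr, annot=True)` is a common alternative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative rows using standard Titanic column names (not real data).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 2, 2, 3],
    "Age":      [22, 38, 26, 35, 35, None, 14, 2],
    "Fare":     [7.25, 71.28, 7.92, 53.10, 8.05, 13.00, 30.07, 21.08],
})

survival_by_class = df.groupby("Pclass")["Survived"].mean()
corr = df.select_dtypes(include="number").corr()  # numeric columns only

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# 1. Survival rate by passenger class (bar plot).
survival_by_class.plot.bar(ax=axes[0, 0], title="Survival rate by class")

# 2. Histogram of passenger ages; missing ages are dropped first.
axes[0, 1].hist(df["Age"].dropna(), bins=5)
axes[0, 1].set_title("Passenger ages")

# 3. Box plot of the fare distribution.
axes[1, 0].boxplot(df["Fare"].to_numpy())
axes[1, 0].set_title("Fare distribution")

# 4. Correlation heatmap of the numeric columns, scaled to [-1, 1].
im = axes[1, 1].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[1, 1].set_xticks(range(len(corr.columns)))
axes[1, 1].set_xticklabels(corr.columns, rotation=45)
axes[1, 1].set_yticks(range(len(corr.columns)))
axes[1, 1].set_yticklabels(corr.columns)
axes[1, 1].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[1, 1])

fig.tight_layout()
fig.savefig("titanic_overview.png")
plt.close(fig)
```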
Refer to the notebook `Task2_Data Visualization.ipynb`.

### Outputs:
- Visualizations:
  1. Survival rate by passenger class.
  2. Histogram of passenger ages.
  3. Box plot of fare distribution.
  4. Correlation heatmap.

---
## Task 3: Predictive Modeling (Classification)
### Objective:
Build a classification model to predict the presence of diabetes based on health metrics.

### Dataset:
[Diabetes Prediction Dataset](https://github.com/gurmindero7/test_datasets/blob/main/diabetes_prediction_dataset.csv)

### Steps:
1. **Data Preprocessing**:
   - Replace missing or undefined values.
   - Convert categorical variables to numerical using encoding techniques.
   - Normalize or scale features if necessary.
2. **Model Building**:
   - Split the dataset into training and testing sets.
   - Train two classification models:
     1. Logistic Regression
     2. Decision Tree
3. **Model Evaluation**:
   - Evaluate models using accuracy, precision, recall, and F1 score.

### Code:
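A scikit-learn sketch of this workflow on synthetic data. The column names (`gender`, `age`, `bmi`, `HbA1c_level`, `diabetes`) are assumptions about the real CSV, and the labels come from a made-up rule, so the printed scores say nothing about performance on the actual dataset. Median imputation, one-hot encoding and standard scaling are one reasonable choice for each preprocessing step, not necessarily what the Task 3 notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; column names are assumptions about the real CSV.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], n),
    "age": rng.uniform(20, 80, n),
    "bmi": rng.normal(27, 5, n),
    "HbA1c_level": rng.normal(5.7, 0.9, n),
})
df.loc[rng.choice(n, 20, replace=False), "bmi"] = np.nan  # simulate missing values
# Made-up label rule: higher HbA1c and age mean diabetes.
df["diabetes"] = ((df["HbA1c_level"] + 0.02 * df["age"]) > 7.0).astype(int)

X, y = df.drop(columns="diabetes"), df["diabetes"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Impute + scale numeric columns; one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]),
     ["age", "bmi", "HbA1c_level"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
])

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
}

results = {}
for name, model in models.items():
    # clone() gives each pipeline its own unfitted copy of the preprocessor.
    pipe = Pipeline([("prep", clone(preprocess)), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Wrapping preprocessing and model in one `Pipeline` means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics into training. Scaling matters for logistic regression but not for the decision tree, whose splits are unaffected by monotonic rescaling of a feature.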
Refer to the notebook `Task3_Predictive Modeling.ipynb`.

### Outputs:
- Preprocessed dataset.
- Performance metrics for the Logistic Regression and Decision Tree models:
  - Accuracy, precision, recall, and F1 score.

---
## Installation and Usage
1. Clone the repository:
```bash
git clone https://github.com/AbhinavSharma07/Coder_Roots.git
cd Coder_Roots