https://github.com/sridharyadav07/code_alpha-_task-4

This project demonstrates the typical steps in a machine learning pipeline, from data preprocessing and cleaning to training a model and evaluating its performance. The use of Random Forest is appropriate here given the complexity of the dataset, and with further tuning and improvements, this model could be used to make accurate predictions on heart

confusion-matrix jypyternotebook labelencoder matplotlib numpy pandas python random-forest-classifier scikit-learn

Host: GitHub
URL: https://github.com/sridharyadav07/code_alpha-_task-4
Owner: SridharYadav07
Created: 2025-02-24T07:27:46.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-02-24T08:10:59.000Z (8 months ago)
Last Synced: 2025-02-24T08:35:02.587Z (8 months ago)
Topics: confusion-matrix, jypyternotebook, labelencoder, matplotlib, numpy, pandas, python, random-forest-classifier, scikit-learn
Language: Jupyter Notebook
Homepage:
Size: 36.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Code_Alpha-_Task-4

Data Import and Exploration:
The necessary libraries for data manipulation, machine learning, and visualization are imported.
The dataset is loaded, and its first 5 rows, columns, data types, and missing values are inspected.
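
The exploration step might look roughly like the sketch below. The README does not give the data file name, so a tiny inline CSV stands in for it; the rows and the `age` and `chol` columns are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file (its name is not given in the README).
csv_text = """id,age,sex,chol,fbs,num
1,63,Male,233,True,0
2,67,Male,,False,2
3,41,Female,204,,0
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())            # first 5 rows (only 3 exist in this toy sample)
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column

# Count of missing values per column
missing = df.isnull().sum()
print(missing)
```

With a real file, `pd.read_csv` would take the file path instead of the `io.StringIO` wrapper.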

Data Preprocessing:
Handling Missing Values: Numerical columns are imputed using the mean of the respective column.
Categorical columns are imputed using the most frequent value of the respective column.
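
A minimal sketch of this imputation, using plain pandas on a made-up frame. The column names `chol` and `thal` are assumptions, and the notebook may use a different mechanism (for example scikit-learn's `SimpleImputer`) to get the same result.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value per column (hypothetical columns and values).
df = pd.DataFrame({
    "chol": [200.0, np.nan, 240.0],
    "thal": ["normal", None, "normal"],
})

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

# Numerical columns: fill gaps with the column mean.
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Categorical columns: fill gaps with the most frequent value (the mode).
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```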

Label Encoding:
Binary categorical columns such as sex, fbs, and exang are label-encoded as 0 and 1.
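
As an illustration, label encoding with scikit-learn's `LabelEncoder` could be sketched as follows. The sample values (`Male`/`Female`, `True`/`False`) are assumptions about how the raw columns look; `LabelEncoder` assigns integers in sorted order of the distinct values.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data; the raw values shown here are assumed, not taken from the README.
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male"],
    "fbs": [True, False, False],
    "exang": [False, True, False],
})

# Encode each binary column separately so each gets its own 0/1 mapping.
for col in ["sex", "fbs", "exang"]:
    df[col] = LabelEncoder().fit_transform(df[col])

print(df)
```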

One-Hot Encoding:
Columns with more than two categories are one-hot encoded to create binary columns for each category.
The id, num, and dataset columns are dropped, and the target variable is separated from the features.
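
One possible way to do this with `pd.get_dummies` is sketched below. The `cp` and `age` columns and all values are made up, and treating `num` as the target is an assumption inferred from the text above, which does not name the target explicitly.

```python
import pandas as pd

# Toy data; `cp` stands in for any column with more than two categories.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "dataset": ["Cleveland", "Hungary", "Cleveland"],
    "age": [63, 67, 41],
    "cp": ["typical angina", "asymptomatic", "non-anginal"],
    "num": [0, 2, 0],
})

# One binary column per category of `cp`.
encoded = pd.get_dummies(df, columns=["cp"])

# Separate the target (assumed to be `num`) and drop the non-feature columns.
y = encoded["num"]
X = encoded.drop(columns=["id", "num", "dataset"])

print(X.columns.tolist())
```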

Data Splitting and Scaling:
The data is split into training and testing sets (80% training, 20% testing).
Feature scaling is applied using StandardScaler to standardize the feature values so that they have a mean of 0 and a standard deviation of 1.
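
The split and scaling can be sketched as follows on synthetic data. The `random_state` value is an arbitrary choice, and fitting the scaler on the training set only (then reusing it on the test set) is a common practice assumed here rather than something the README states.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and binary labels standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

# 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn mean and standard deviation from the training data, then apply to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After this step the training features have mean 0 and standard deviation 1 per column; the test features are only approximately standardized because they reuse the training statistics.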

Model Training and Evaluation:
A Random Forest classifier is trained on the training set.
Predictions are made on the test set, and the accuracy of the model is calculated.
The classification report is printed, which includes metrics like precision, recall, f1-score and support.
A confusion matrix is generated to show the true vs predicted values. The confusion matrix is visualized using a heatmap for easier interpretation.
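
A rough sketch of the training and evaluation steps, run on synthetic data from `make_classification` rather than the project's dataset. The hyperparameters are scikit-learn defaults plus an arbitrary `random_state`, and `plot_confusion_matrix` is a hypothetical helper that draws the heatmap with plain matplotlib; the notebook may use a different plotting call.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))  # precision, recall, f1-score, support

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)


def plot_confusion_matrix(cm):
    """Hypothetical helper: draw the confusion matrix as an annotated heatmap."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(cm, cmap="Blues")
    fig.colorbar(image, ax=ax)
    for (row, col), count in np.ndenumerate(cm):
        ax.text(col, row, str(count), ha="center", va="center")
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    return fig
```

Calling `plot_confusion_matrix(cm)` followed by `matplotlib.pyplot.show()` would display the heatmap in a notebook or desktop session.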
