https://github.com/abshar-shihab/diabetes_prediction_using_machine_learning
The Diabetes Prediction Project uses a machine learning approach to predict whether a person is diabetic or not based on key health-related metrics. The project explores data preprocessing, feature scaling, and classification techniques using Python libraries.
machine-learning ml prediction predictive-modeling svm-classifier univariate-bayesian
- Host: GitHub
- URL: https://github.com/abshar-shihab/diabetes_prediction_using_machine_learning
- Owner: Abshar-Shihab
- Created: 2024-11-21T20:01:50.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-11-21T20:14:43.000Z (8 months ago)
- Last Synced: 2025-01-24T15:27:59.732Z (6 months ago)
- Topics: machine-learning, ml, prediction, predictive-modeling, svm-classifier, univariate-bayesian
- Language: Jupyter Notebook
- Homepage:
- Size: 70.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Support: Support Vector Machine (SVM) Model/Diabetes_prediction_using_SVM_classifier.ipynb
README
# Diabetes_prediction_using_Machine_Learning
The Diabetes Prediction Project uses a machine learning approach to predict whether a person is diabetic or not based on key health-related metrics. The project explores data preprocessing, feature scaling, and classification techniques using Python libraries.

# Diabetes Prediction Project
This project uses machine learning techniques to predict whether a person is diabetic based on health-related parameters. The workflow includes data preprocessing, feature scaling, training, and evaluation of a classification model.
## Overview
Diabetes is a chronic illness that requires early detection for effective management. This project employs machine learning to analyze and classify individuals as diabetic or non-diabetic using a structured dataset. The pipeline is designed to process input data, train a classification model, and evaluate its performance.
## Dataset
The project uses the **PIMA Indian Diabetes Dataset**, which contains medical data for patients. The features include:
- `Pregnancies`: Number of times pregnant
- `Glucose`: Plasma glucose concentration
- `BloodPressure`: Diastolic blood pressure (mm Hg)
- `SkinThickness`: Triceps skinfold thickness (mm)
- `Insulin`: 2-hour serum insulin (mu U/ml)
- `BMI`: Body Mass Index (weight in kg/(height in m)^2)
- `DiabetesPedigreeFunction`: Diabetes likelihood based on family history
- `Age`: Patient's age
- `Outcome`: Target variable (0 = Non-diabetic, 1 = Diabetic)

The dataset is small and relatively clean, making it suitable for beginners in machine learning.
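
A minimal sketch of loading a file with this column layout and separating the features from the target. The helper name `load_diabetes_frame` and the inline sample rows are assumptions made for illustration; in the project the source would be `diabetes.csv`. The final check counts zeros in measurement columns, since a zero there may be a placeholder for a missing value rather than a real reading.

```python
import io

import pandas as pd

# Illustrative rows in the dataset's column layout (a stand-in for diabetes.csv).
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
"""


def load_diabetes_frame(source):
    """Read the CSV and split it into the full frame, features, and target."""
    df = pd.read_csv(source)
    features = df.drop(columns="Outcome")
    target = df["Outcome"]
    return df, features, target


# Pass "diabetes.csv" instead of the StringIO object to read the real file.
df, X, y = load_diabetes_frame(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(y.value_counts())

# Count zeros in columns where a zero is not a plausible measurement.
measurement_cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
zero_counts = (X[measurement_cols] == 0).sum()
print(zero_counts)
```

If the zero counts are high on the real file, treating those entries as missing before scaling may be worth considering.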
## Prerequisites
The following Python libraries are required:
- `numpy`
- `pandas`
- `scikit-learn`

Install them using:
```bash
pip install numpy pandas scikit-learn
```

## Workflow
1. **Load Dataset**:
   - The dataset is loaded using `pandas`.
2. **Explore Dataset**:
   - Examine structure, summary statistics, and value distributions.
3. **Preprocessing**:
   - Standardize features for uniform scale using `StandardScaler`.
4. **Split Data**:
   - Partition the data into training and testing sets using `train_test_split`.
5. **Model Training**:
   - Train a machine learning classifier on the training data.
6. **Model Evaluation**:
   - Evaluate accuracy on both training and test sets.
7. **Make Predictions**:
   - Use the trained model to predict the diabetes status for new inputs.

## Results
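
The workflow steps, and the way the accuracy figures in this section could be obtained, can be sketched roughly as follows. This is not the notebook's actual code: the linear-kernel `SVC` is an assumption based on the repository's `svm-classifier` topic, the split ratio and random seeds are arbitrary, and the data is a synthetic stand-in so the block runs without `diabetes.csv`. The sketch also splits before scaling, so that the scaler is fitted on training data only; this differs from the order listed in the workflow.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 8 feature columns and the Outcome column.
# With the real data, X and y would come from diabetes.csv instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 1] + 0.5 * X[:, 5] > 0).astype(int)

# Hold out 20% for testing, stratified to keep the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Linear-kernel SVM (an assumed choice of classifier).
clf = SVC(kernel="linear")
clf.fit(X_train_std, y_train)

# Accuracy on the training and testing sets.
train_acc = accuracy_score(y_train, clf.predict(X_train_std))
test_acc = accuracy_score(y_test, clf.predict(X_test_std))
print(f"Training accuracy: {train_acc:.3f}")
print(f"Testing accuracy:  {test_acc:.3f}")

# Predict for one new input: it must be scaled with the same fitted scaler.
new_input = rng.normal(size=(1, 8))
prediction = clf.predict(scaler.transform(new_input))[0]
print("Diabetic" if prediction == 1 else "Non-diabetic")
```

A large gap between the two accuracy values would suggest overfitting; similar values suggest the model generalizes about as well as it fits.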
- **Training Accuracy**: ~X%
- **Testing Accuracy**: ~Y%

## How to Run the Project
1. Clone or download the project files.
2. Place the `diabetes.csv` dataset in the same directory as the script.
3. Run the Python script using:
```bash
python diabetes_prediction.py
```
4. Test the model by providing custom input data.

## Future Enhancements
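
As an illustration of the hyperparameter tuning idea in this section, a cross-validated grid search could look like the sketch below. The parameter grid, the 5-fold setting, and the synthetic stand-in data are all assumptions; the project itself does not specify them.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 8 features and the Outcome column.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 1] + 0.5 * X[:, 5] > 0).astype(int)

# Putting the scaler inside the pipeline refits it on each training fold.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical grid; make_pipeline names the SVC step "svc".
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

The same pattern extends to other classifiers by swapping the final pipeline step and its grid.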
- Use additional classifiers like Random Forest, Gradient Boosting, or Neural Networks.
- Perform hyperparameter tuning for improved accuracy.
- Add data visualization to enhance exploratory analysis.

## Acknowledgments
- Dataset from the UCI Machine Learning Repository.
- Scikit-learn for providing machine learning utilities.