https://github.com/vickshan001/diabetes-prediction-using-machine-learning-ci512-project-
Machine learning project (2021) predicting diabetes using the Pima Indians dataset. Compared KNN, Decision Tree, MLP, and more for accuracy.
- Host: GitHub
- URL: https://github.com/vickshan001/diabetes-prediction-using-machine-learning-ci512-project-
- Owner: vickshan001
- Created: 2025-03-29T18:50:38.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-03-29T18:51:50.000Z (6 months ago)
- Last Synced: 2025-03-29T19:32:11.956Z (6 months ago)
- Topics: classification, diabetes-prediction, machine-learning, mlp, pima-indians-dataset, python, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Diabetes Prediction Using Machine Learning (CI512 Project)
A machine learning project using the **Pima Indians Diabetes Dataset** to predict whether a patient has diabetes. Built in **2021** for the CI512 (Intelligent Systems) module, this project applies and compares multiple classification algorithms to identify the most accurate model.
---
## Dataset
- **Source**: National Institute of Diabetes and Digestive and Kidney Diseases
- **Population**: Female patients over 21 years old of Pima Indian heritage
- **Features**:
  - Pregnancies
  - Plasma glucose concentration
  - Diastolic blood pressure
  - Triceps skin fold thickness
  - Insulin
  - BMI
  - Diabetes Pedigree Function
  - Age
  - Outcome (0 or 1)

---
## Algorithms Used
1. **K-Nearest Neighbors (KNN)**
   - Predicts based on the distance to the nearest data points.
   - Chosen for its high accuracy on classification tasks.
2. **Decision Tree Classifier**
   - Easy-to-interpret model using rule-based branching.
   - Good for explaining predictions.
3. **Random Forest Classifier**
   - Ensemble method combining multiple decision trees.
   - Initially included, but later excluded due to inconsistent results.
4. **Multilayer Perceptron (MLP)**
   - Neural network model with hidden layers.
   - Provided the best prediction accuracy on this dataset.
5. **Stacking Classifier**
   - Combines multiple models to improve prediction using meta-learning.

---
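As a rough illustration of how the retained classifiers could be set up with scikit-learn, the sketch below builds KNN, Decision Tree, MLP, and a stacking ensemble over them. The hyperparameters, the `StandardScaler` step, the logistic-regression meta-learner, the helper name `build_models`, and the synthetic data are all assumptions made for this example; the notebook's actual settings are not stated in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Return a dict of untrained classifiers (hypothetical helper, illustrative settings)."""
    # Scaling is an assumption here: KNN and MLP are sensitive to feature ranges.
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    tree = DecisionTreeClassifier(max_depth=4, random_state=random_state)
    mlp = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=random_state),
    )
    # The stacking ensemble feeds base-model predictions to a meta-learner.
    stack = StackingClassifier(
        estimators=[("knn", knn), ("tree", tree), ("mlp", mlp)],
        final_estimator=LogisticRegression(),
    )
    return {"KNN": knn, "Decision Tree": tree, "MLP": mlp, "Stacking": stack}


# Synthetic stand-in data with 8 features, matching the dataset's feature count.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
models = build_models()
for name, model in models.items():
    model.fit(X, y)
    print(name, "training accuracy:", round(model.score(X, y), 3))
```

Random Forest is left out of the sketch because the project excluded it. Training accuracy is printed only to show that each model fits; it is not a substitute for the held-out evaluation described in the next section.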
## Evaluation Method
- **Data Cleaning**: Replaced zero values, which act as placeholders for missing measurements, with the feature-wise mean.
- **Validation**: 10-fold cross-validation
- **Split**: 70% training, 30% testing
- **Visualization**: Line graphs for model accuracy comparison

---
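The cleaning, split, and validation steps listed above can be sketched as follows. This is one plausible arrangement, not necessarily the notebook's: it assumes zeros are replaced by the mean of the non-zero values in the same column, that cross-validation runs on the training portion only, and that a KNN classifier stands in for whichever model is being evaluated. The helper `replace_zeros_with_mean`, its `columns` parameter, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def replace_zeros_with_mean(X, columns=None):
    """Replace zeros in the chosen columns with that column's non-zero mean.

    Assumption: the mean is taken over non-zero entries only. `columns`
    defaults to every column; pass indices to skip features where zero
    is a legitimate value.
    """
    X = np.asarray(X, dtype=float).copy()
    if columns is None:
        columns = range(X.shape[1])
    for j in columns:
        col = X[:, j]  # a view, so the assignment below updates X
        nonzero = col != 0
        if nonzero.any():
            col[~nonzero] = col[nonzero].mean()
    return X


# Synthetic stand-in data: 300 rows, 8 features, about 10% of entries zeroed out.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=50.0, scale=10.0, size=(300, 8))
y = (X_raw[:, 1] > 50.0).astype(int)
X_raw[rng.random(X_raw.shape) < 0.1] = 0.0

X_clean = replace_zeros_with_mean(X_raw)

# 70% training, 30% testing, stratified so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y, test_size=0.3, stratify=y, random_state=0
)

# 10-fold cross-validation on the training portion.
model = KNeighborsClassifier(n_neighbors=5)
cv_scores = cross_val_score(model, X_train, y_train, cv=10)
print("mean CV accuracy:", round(cv_scores.mean(), 3))

# Final check on the held-out 30%.
test_accuracy = model.fit(X_train, y_train).score(X_test, y_test)
print("test accuracy:", round(test_accuracy, 3))
```

Computing the column means on the full dataset before splitting, as done here for brevity, lets a little information from the test rows reach the training data; fitting the imputation on the training portion alone avoids that.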
## Results
- **Random Forest** was excluded due to poor alignment with other models.
- **MLP (Multilayer Perceptron)** outperformed all other classifiers.
- **Stacking** showed promise by combining model strengths.

---
## Technologies
- Python
- Pandas
- Scikit-learn
- Matplotlib
- NumPy

---
## Developed By
**Vickshan Vicknakumaran**
University of Brighton
CI512: Intelligent Systems (2021)

---