Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/travelxml/pytorch-diabetes-prediction-ann-nlp

PyTorch-based diabetes prediction model using Artificial Neural Networks (ANN) and Natural Language Processing (NLP). Includes data preprocessing, model training, and evaluation scripts.
https://github.com/travelxml/pytorch-diabetes-prediction-ann-nlp

ann nlp nlp-library nlp-machine-learning pytorch pytorch-implementation

Last synced: about 5 hours ago
JSON representation

PyTorch-based diabetes prediction model using Artificial Neural Networks (ANN) and Natural Language Processing (NLP). Includes data preprocessing, model training, and evaluation scripts.

Host: GitHub
URL: https://github.com/travelxml/pytorch-diabetes-prediction-ann-nlp
Owner: TravelXML
Created: 2024-08-23T12:46:27.000Z (about 1 month ago)
Default Branch: master
Last Pushed: 2024-08-23T13:11:19.000Z (about 1 month ago)
Last Synced: 2024-09-26T20:03:34.796Z (about 5 hours ago)
Topics: ann, nlp, nlp-library, nlp-machine-learning, pytorch, pytorch-implementation
Language: Jupyter Notebook
Homepage:
Size: 1.43 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        # Creating ANN with PyTorch on Pima Diabetes Dataset

This repository showcases the creation of an Artificial Neural Network (ANN) using PyTorch to predict diabetes based on the Pima Indian Diabetes Dataset. The project covers all essential steps, from data preprocessing to model evaluation, and also provides a Docker setup for easy replication.

## Table of Contents

1. [Project Overview](#project-overview)

2. [Project Structure](#project-structure)

3. [Getting Started](#getting-started)

   - [Prerequisites](#prerequisites)

   - [Installation](#installation)

4. [Running with Docker](#running-with-docker)

   - [Build the Docker Image](#1-build-the-docker-image)

   - [Run the Docker Container](#2-run-the-docker-container)

   - [Access Jupyter Notebook](#3-access-jupyter-notebook)

5. [Key Steps in the Project](#key-steps-in-the-project)

   - [Data Preprocessing](#1-data-preprocessing)

   - [Feature and Target Separation](#2-feature-and-target-separation)

   - [Data Splitting](#3-data-splitting)

   - [Tensor Conversion](#4-tensor-conversion)

   - [Model Definition](#5-model-definition)

   - [Model Instantiation](#6-model-instantiation)

   - [Loss Function and Optimizer](#7-loss-function-and-optimizer)

   - [Model Training](#8-model-training)

   - [Model Evaluation Mode](#9-model-evaluation-mode)

   - [Predictions](#10-predictions)

   - [Confusion Matrix](#11-confusion-matrix)

   - [Confusion Matrix Visualization](#12-confusion-matrix-visualization)

   - [Accuracy Calculation](#13-accuracy-calculation)

6. [Results](#results)

## Project Overview

- **Data Preprocessing:** Loaded the dataset, handled missing values, and labeled the target variable.

- **Modeling:** Built a custom neural network with PyTorch, trained it, and evaluated its performance.

- **Visualization:** Visualized the results using a confusion matrix heatmap.

## Project Structure

- **data/**: Contains the dataset used in the project.

- **notebooks/**: Jupyter notebooks with the code and explanations.

- **requirements.txt**: Python dependencies required to run the project.

- **README.md**: Overview of the project (this file).

## Getting Started

### Prerequisites

Make sure you have Python 3.x and `pip` installed on your system.

### Installation

1. Clone the repository:

    ```bash

    git clone https://github.com/your-username/PYTORCH-DIABETES-PREDICTION.git

    cd PYTORCH-DIABETES-PREDICTION

    ```

2. Install the required Python packages:

    ```bash

    pip install -r requirements.txt

    ```

3. Run the Jupyter notebook:

    ```bash

    jupyter notebook notebooks/Creating_ANN_with_PyTorch.ipynb

    ```

## Running with Docker

### 1. Build the Docker Image

Build the Docker image using the Dockerfile:

```bash

docker build -t pytorch-jupyter .

```

### 2. Run the Docker Container

Run a container from the image you just built:

```bash

docker run -it --name pytorch-jupyter-container -p 8888:8888 pytorch-jupyter

```

### 3. Access Jupyter Notebook

After running the container, you will see a URL in the terminal with a token, something like this:

```bash

http://127.0.0.1:8888/?token=

```

## Key Steps in the Project

### 1. Data Preprocessing

Loaded the dataset, checked for missing values, and replaced the numeric target variable with descriptive labels ("Diabetic" and "No Diabetic").

```python

import pandas as pd

# Load the dataset

df = pd.read_csv('data/diabetes.csv')

# Check for missing values

print(df.isnull().sum())

# Replace numeric target with descriptive labels

df['Outcome'] = df['Outcome'].replace({1: "Diabetic", 0: "No Diabetic"})

```

### 2. Feature and Target Separation

Split the DataFrame into independent features (`X`) and the target variable (`y`).

```python

# Separate features and target variable

X = df.drop('Outcome', axis=1).values

y = df['Outcome'].values

```

### 3. Data Splitting

Divided the dataset into training and testing sets using an 80-20 split.

```python

from sklearn.model_selection import train_test_split

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

```

### 4. Tensor Conversion

Converted the training and testing data into PyTorch tensors for model training.

```python

import torch

# Convert data to PyTorch tensors

X_train = torch.FloatTensor(X_train)

X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)

y_test = torch.LongTensor(y_test)

```

### 5. Model Definition

Created a custom neural network model class (`ANN_Model`) with two hidden layers using PyTorch.

```python

import torch.nn as nn

import torch.nn.functional as F

# Define the neural network model

class ANN_Model(nn.Module):

    def __init__(self, input_features=8, hidden1=20, hidden2=20, out_features=2):

        super().__init__()

        self.f_connected1 = nn.Linear(input_features, hidden1)

        self.f_connected2 = nn.Linear(hidden1, hidden2)

        self.out = nn.Linear(hidden2, out_features)

    

    def forward(self, x):

        x = F.relu(self.f_connected1(x))

        x = F.relu(self.f_connected2(x))

        x = self.out(x)

        return x

```

### 6. Model Instantiation

Initialized the model and set the random seed for reproducibility.

```python

# Set the random seed for reproducibility

torch.manual_seed(20)

# Instantiate the model

model = ANN_Model()

```

### 7. Loss Function and Optimizer

Defined the loss function (`CrossEntropyLoss`) and the optimizer (`Adam`) to guide the model training.

```python

# Define the loss function and optimizer

loss_function = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

```

### 8. Model Training

Trained the model over 500 epochs, recording the loss at each epoch and updating the model's parameters using backpropagation.

```python

# Train the model

epochs = 500

final_losses = []

for i in range(epochs):

    i = i + 1

    y_pred = model.forward(X_train)

    loss = loss_function(y_pred, y_train)

    final_losses.append(loss)

    if i % 10 == 1:

        print(f"Epoch number: {i} and the loss: {loss.item()}")

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

```

### 9. Model Evaluation Mode

Set the model to evaluation mode to ensure correct behavior during testing.

```python

# Set the model to evaluation mode

model.eval()

```

### 10. Predictions

Made predictions on the test data and stored them.

```python

# Make predictions on the test data

predictions = []

with torch.no_grad():

    for i, data in enumerate(X_test):

        y_pred = model(data)

        predictions.append(y_pred.argmax().item())

```

### 11. Confusion Matrix

Calculated the confusion matrix to evaluate the performance of the model's predictions.

```python

from sklearn.metrics import confusion_matrix

# Calculate the confusion matrix

cm = confusion_matrix(y_test, predictions)

print(cm)

```

### 12. Confusion Matrix Visualization

Visualized the confusion matrix using a heatmap to better understand prediction accuracy.

```python

import seaborn as sns

import matplotlib.pyplot as plt

# Visualize the confusion matrix

plt.figure(figsize=(10,6))

sns.heatmap(cm, annot=True, fmt="d")

plt.xlabel('Actual Values')

plt.ylabel('Predicted Values')

plt.show()

```

### 13. Accuracy Calculation

Computed the accuracy of the model's predictions on the test data.

```python

from sklearn.metrics import accuracy_score

# Calculate the accuracy score

score = accuracy_score(y_test, predictions)

print(f'Accuracy: {score}')

```

## Results

![image](https://github.com/user-attachments/assets/1aee5981-f78a-47f3-b96b-496e06204f10)

![image](https://github.com/user-attachments/assets/c96fb101-4bc0-44ab-b8f9-c042b473b576)

![image](https://github.com/user-attachments/assets/f0c505fa-32b3-4dfd-9304-1b6e31736b4c)