Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/johnnixon6972/cirrhosis-outcomes-prediction
This leverages advanced machine learning techniques to predict patient outcomes for those suffering from cirrhosis. Utilizing a comprehensive dataset from a Mayo Clinic study, this project explores various data imputation methods and class balancing techniques to enhance prediction accuracy.
https://github.com/johnnixon6972/cirrhosis-outcomes-prediction
ai algorithms analytics artificial-intelligence machine-learning ml pandas python3 scikit-learn
Last synced: about 2 months ago
JSON representation
This leverages advanced machine learning techniques to predict patient outcomes for those suffering from cirrhosis. Utilizing a comprehensive dataset from a Mayo Clinic study, this project explores various data imputation methods and class balancing techniques to enhance prediction accuracy.
- Host: GitHub
- URL: https://github.com/johnnixon6972/cirrhosis-outcomes-prediction
- Owner: JohnNixon6972
- Created: 2024-05-09T20:18:50.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-06-13T10:34:42.000Z (7 months ago)
- Last Synced: 2024-11-17T05:13:11.423Z (about 2 months ago)
- Topics: ai, algorithms, analytics, artificial-intelligence, machine-learning, ml, pandas, python3, scikit-learn
- Language: HTML
- Homepage:
- Size: 2.49 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Cirrhosis Patient Outcomes Prediction
## Overview
This project employs machine learning methods to predict the outcomes of cirrhosis patients using publicly available data from a Mayo Clinic study. The outcomes are classified into three categories: survival, survival with a liver transplant, and death. Various imputation methods for handling missing data and techniques for balancing the classes are explored. The performance of different machine learning models is evaluated, with the Random Forest model emerging as the best performer.
## Table of Contents
- [Background](#background)
- [Data](#data)
- [Imputation Methods](#imputation-methods)
- [Handling Imbalanced Classes](#handling-imbalanced-classes)
- [Model Selection and Tuning](#model-selection-and-tuning)
- [Results](#results)
- [Usage](#usage)
- [Contributors](#contributors)
- [References](#references)## Background
The task is to predict the outcomes of cirrhosis patients from a study conducted by the Mayo Clinic between 1974 and 1984. The aim is to develop a machine learning model to assist medical professionals in directing care and understanding the associations between clinical features and patient outcomes.
## Data
The dataset comprises clinically relevant features recorded for each patient, including:
- Drug
- Age
- Sex
- Ascites
- Hepatomegaly
- Spiders
- Edema
- Bilirubin
- Cholesterol
- Copper
- Alkaline Phosphate (Alk Phos)
- SGOT
- Triglycerides
- Platelets
- Prothrombin
- StageThese features are used to predict patient outcomes classified as:
1. Survival (C)
2. Survival with a liver transplant (CL)
3. Death (D)## Imputation Methods
Given the likelihood of missing values in clinical settings, three imputation methods are compared:
1. **Simple Imputation:** Replaces missing values with the mean of the relevant feature.
2. **K-Nearest Neighbors (KNN) Imputation:** Replaces missing values based on the mean values of the nearest neighbors.
3. **Iterative Imputation:** Sequentially models each feature on all others and derives new values from model predictions.## Handling Imbalanced Classes
Due to the imbalance in the dataset (62.8% C, 3.5% CL, 33.7% D), over-sampling and under-sampling methods are used:
- **Random Over-Sampling:** Adds instances to the minority class.
- **Random Under-Sampling:** Removes instances from the majority class.
- **SMOTE (Synthetic Minority Over-sampling Technique):** Creates new instances based on nearest neighbors.
- **ENN (Edited Nearest Neighbors):** Removes observations with differing nearest neighbors.## Model Selection and Tuning
Several models were evaluated, including Support Vector Machine (SVM), Decision Tree, and Random Forest. The models were assessed using log loss as the primary performance metric. The Random Forest model, with specific tuning parameters, outperformed others, achieving the best results.
## Results
The Random Forest model achieved:
- Log Loss: 0.52
- Accuracy: 81%
- Precision, Recall, and F1-Score for each class are detailed in the report.Despite the overall good performance, challenges remain in correctly predicting the CL class.
## Usage
To use this project:
1. Clone the repository: `git clone https://github.com/your-username/cirrhosis-prediction.git`
2. Install the required libraries: `pip install -r requirements.txt`
3. Run the data preprocessing and model training scripts: `python preprocess.py` and `python train_model.py`## References
1. Mayo Clinic Study on Primary Biliary Cirrhosis
2. scikit-learn library for imputation methods
3. imbalanced-learn library for handling class imbalance
4. Optuna for hyperparameter tuningFor more detailed information, please refer to the project report.
---