https://github.com/mikekeith52/randomforest

Using Random Forest to predict the presence of heart disease

Host: GitHub
URL: https://github.com/mikekeith52/randomforest
Owner: mikekeith52
License: mit
Created: 2021-09-06T22:18:59.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2021-09-07T18:26:01.000Z (over 4 years ago)
Last Synced: 2025-02-09T08:17:13.933Z (10 months ago)
Language: Jupyter Notebook
Size: 552 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Random Forest
Using Random Forest to predict the presence of heart disease. See [RandomForestClassifier documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) from scikit-learn.

## About
With Random Forest, tuning the model's hyperparameters is important for avoiding overfitting and maximizing out-of-sample accuracy. The following hyperparameters are optimized in this example:
- `n_estimators`
- `max_depth`
- `max_features`

The following optimization techniques are used:
- Out-of-bag (OOB) error reduction
- Grid search with 10-fold cross-validation
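
The two techniques above might be sketched with scikit-learn as follows. This is an illustration under assumptions, not the repository's notebook: it uses `make_classification` as a synthetic stand-in for the heart disease data, sweeps a hypothetical range of `n_estimators` values by OOB error (computed as `1 - oob_score_`), and then runs `GridSearchCV` with `cv=10` over illustrative `max_depth` and `max_features` values, scored by F1. The helper names `oob_error_curve` and `grid_search_depth_features` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def oob_error_curve(X, y, n_estimators_grid, random_state=0):
    """Fit one forest per candidate n_estimators and record its OOB error."""
    curve = []
    for n in n_estimators_grid:
        # oob_score=True requires bootstrap=True, which is the default.
        model = RandomForestClassifier(
            n_estimators=n, oob_score=True, random_state=random_state
        )
        model.fit(X, y)
        # oob_score_ is accuracy on out-of-bag samples, so error = 1 - accuracy.
        curve.append((n, 1.0 - model.oob_score_))
    return curve


def grid_search_depth_features(X, y, n_estimators, random_state=0):
    """Grid search over max_depth and max_features with 10-fold CV, scored by F1."""
    param_grid = {
        "max_depth": [2, 4, None],         # illustrative values only
        "max_features": ["sqrt", "log2"],  # illustrative values only
    }
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=n_estimators, random_state=random_state),
        param_grid=param_grid,
        cv=10,  # for a classifier, an integer cv means stratified 10-fold splits
        scoring="f1",
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_


if __name__ == "__main__":
    # Synthetic stand-in; 303 x 13 is assumed to match the Kaggle file's shape.
    X, y = make_classification(n_samples=303, n_features=13, random_state=0)
    curve = oob_error_curve(X, y, range(25, 126, 25))
    best_n, best_err = min(curve, key=lambda pair: pair[1])
    print(f"lowest OOB error {best_err:.3f} at n_estimators={best_n}")
    params, score = grid_search_depth_features(X, y, best_n)
    print(f"best params {params}, mean 10-fold F1 {score:.3f}")
```

Choosing `n_estimators` by OOB error first keeps the grid search smaller, because OOB estimates come from a single fit per candidate rather than ten.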

The following feature importance techniques are used:
- Gini importance (mean decrease in impurity)
- Permutation feature importance
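
Both importance measures are available in scikit-learn: a fitted forest's `feature_importances_` attribute gives the impurity-based (Gini) importances, and `sklearn.inspection.permutation_importance` measures how much the score drops when one feature's values are shuffled. The sketch below assumes synthetic data, a 75/25 stratified split, and `n_repeats=10`, and reuses the parameters reported under Results; `gini_and_permutation_importance` is a hypothetical helper, not code from the repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def gini_and_permutation_importance(X, y, random_state=0):
    """Return (impurity-based, permutation) importances, one value per feature."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    model = RandomForestClassifier(
        n_estimators=90, max_depth=2, max_features="sqrt", random_state=random_state
    ).fit(X_train, y_train)

    # Mean decrease in Gini impurity across the trees, normalized to sum to 1.
    gini = model.feature_importances_

    # Mean drop in held-out accuracy (the default score) when each feature is shuffled.
    perm = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=random_state
    )
    return gini, perm.importances_mean


if __name__ == "__main__":
    X, y = make_classification(
        n_samples=303, n_features=13, n_informative=4, random_state=0
    )
    gini, perm = gini_and_permutation_importance(X, y)
    for i, (g, p) in enumerate(zip(gini, perm)):
        print(f"feature {i:2d}: gini={g:.3f}  permutation={p:.3f}")
```

Computing permutation importance on held-out data rather than the training set avoids rewarding features the model has merely memorized, which is one known weakness of impurity-based importances.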

## Installation
- Download [Anaconda](https://www.anaconda.com/)
- Download source code and install [requirements](requirements.txt)

## Data
This example uses the heart disease dataset from [Kaggle](https://www.kaggle.com/ronitf/heart-disease-uci).
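
A minimal loading sketch, assuming the Kaggle file has a header row of numeric columns and a 0/1 label column named `target` (check both against your download). It uses NumPy's `genfromtxt` rather than whatever the notebook uses, and `load_heart_csv` is a hypothetical helper.

```python
import numpy as np


def load_heart_csv(source, target="target"):
    """Split a heart-disease CSV into features X, labels y, and feature names.

    `source` may be a file path or an iterable of CSV lines.
    Assumes a header row, numeric columns, and a 0/1 label column `target`.
    """
    # utf-8-sig strips a leading byte-order mark when reading from a path.
    data = np.genfromtxt(source, delimiter=",", names=True, encoding="utf-8-sig")
    feature_names = [name for name in data.dtype.names if name != target]
    X = np.column_stack([data[name] for name in feature_names])
    y = data[target].astype(int)
    return X, y, feature_names
```

Usage, assuming the file is saved locally as `heart.csv`: `X, y, feature_names = load_heart_csv("heart.csv")`.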

## Results
Using the following parameters:
- `n_estimators = 90`
- `max_depth = 2`
- `max_features = 'sqrt'`

An in-sample F1 score of 88.81% and an out-of-sample F1 score of 88% are obtained. With any machine learning model, you ideally want in-sample performance to be only slightly better than out-of-sample performance: a small gap suggests the model is neither overfit nor underfit. By this criterion, the modeling project was a success.
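
A minimal sketch of how the in-sample and out-of-sample F1 scores could be computed with the reported parameters. The 75/25 stratified split, the `random_state` values, the synthetic `make_classification` data, and the helper name `in_and_out_of_sample_f1` are all assumptions, so the printed scores will not reproduce the 88.81% and 88% figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def in_and_out_of_sample_f1(X, y, random_state=0):
    """Fit the reported configuration and return (train F1, test F1)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    model = RandomForestClassifier(
        n_estimators=90, max_depth=2, max_features="sqrt", random_state=random_state
    )
    model.fit(X_train, y_train)
    f1_in = f1_score(y_train, model.predict(X_train))   # in-sample
    f1_out = f1_score(y_test, model.predict(X_test))    # out-of-sample
    return f1_in, f1_out


if __name__ == "__main__":
    X, y = make_classification(n_samples=303, n_features=13, random_state=0)
    f1_in, f1_out = in_and_out_of_sample_f1(X, y)
    print(f"in-sample F1 {f1_in:.4f}, out-of-sample F1 {f1_out:.4f}, "
          f"gap {f1_in - f1_out:.4f}")
```

A small positive gap between the two scores is the pattern the paragraph above describes; a large gap would point toward overfitting.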
