Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/andrew2077/titanic-julia

Deployment of AI model done with Julia on titanic dataset.
https://github.com/andrew2077/titanic-julia

Last synced: about 5 hours ago
JSON representation

Deployment of AI model done with Julia on titanic dataset.

Host: GitHub
URL: https://github.com/andrew2077/titanic-julia
Owner: Andrew2077
Created: 2023-08-09T13:51:38.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-08-09T13:56:40.000Z (over 1 year ago)
Last Synced: 2023-08-09T15:22:14.527Z (over 1 year ago)
Language: Jupyter Notebook
Size: 1.78 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        # Titanic - Machine Learning from Disaster

 Deployment of AI model done with Julia on titanic dataset.

## Table of Contents

- [Titanic - Machine Learning from Disaster](#titanic---machine-learning-from-disaster)

  - [Table of Contents](#table-of-contents)

  - [Introduction](#introduction)

    - [Librarries Used](#librarries-used)

  - [Dataset](#dataset)

  - [Preprocessing](#preprocessing)

  - [Insights](#insights)

  - [Modeling](#modeling)

  - [Deployment](#deployment)

  - [Results](#results)

## Introduction

Embarked on an educational project to explore Julia for ML, utilizing Titanic dataset for preprocessing, modeling, and survival prediction.

### Librarries Used 

- `DataFrames`: Used for handling and manipulating tabular data effectively.

- `CSV`: Employed for reading and writing CSV files, a common data format.

- `Plots`: Utilized for creating informative and insightful visualizations.

- `DecisionTree`: Employed for building Random Forests machine learning model.

## Dataset

Kaggle's Titanic ML competition introduces ML beginners to predict passenger survival 

[Titanic-dataset](https://www.kaggle.com/competitions/titanic)

## Preprocessing

- Managing missing data

- Removing unnecessary features

- Addressing categorical data

## Insights

- Death vs survival rate 

![](msc/survived_not_survived.png)

- Death by Class

![](msc/death_by_class.png)

- Suvival rate by gender

  

![](msc/survive_by_gender.png)

## Modeling

Subsequent to data cleansing, a RandomForestClassifer was employed to construct a survival prediction model,

RF hyper parameters

```julia

## set of classification parameters and respective default values

# n_subfeatures: #*number of features to consider at random per split (default: -1, sqrt(# features))

n_subfeatures = -1

# n_trees: #*number of trees to train (default: 10)

n_trees = 50

# partial_sampling: #* fraction of samples to train each tree on (default: 0.7)

partial_sampling = 0.7

# max_depth: #* maximum depth of the decision trees (default: no maximum)

max_depth = -1

# min_samples_leaf: #* the minimum number of samples each leaf needs to have (default: 5)

min_samples_leaf = 12

# min_samples_split: #* the minimum number of samples in needed for a split (default: 2)

min_samples_leaf = 7

# min_purity_increase: #* minimum purity needed for a split (default: 0.0)

min_samples_split = 3

min_purity_increase = 0.0

# keyword rng: #* the random number generator or seed to use (default Random.GLOBAL_RNG)

seed = 3

## multi-threaded forests must be seeded with an `Int`

```

- Model achieved accuracy of `86.5%` on training data

- model achieved accuracy of `89.7%` on test data

## Deployment

Deployed the Created model into a web application:

`JLD2` that saves and loades new model.

`HTTP` Connects Julia script with a frontend.

`JSON3` To parse the inputs

## Results 

Supporting the insights

- Death occured more among females

- Death occured more among 3d class passengers

- Males had more chance of Survival 

  

sample 1 - Female Death 

![](msc/female1.png)

sample 2 - Male Death

![](msc/male1.png)

sample 3 - male Survival 

![](msc/male2.png)