https://github.com/andrew2077/titanic-julia
Deployment of AI model done with Julia on titanic dataset.
- Host: GitHub
- URL: https://github.com/andrew2077/titanic-julia
- Owner: Andrew2077
- Created: 2023-08-09T13:51:38.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-09T13:56:40.000Z (over 1 year ago)
- Last Synced: 2023-08-09T15:22:14.527Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 1.78 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Titanic - Machine Learning from Disaster
Deployment of AI model done with Julia on titanic dataset.
## Table of Contents
- [Titanic - Machine Learning from Disaster](#titanic---machine-learning-from-disaster)
  - [Table of Contents](#table-of-contents)
  - [Introduction](#introduction)
    - [Libraries Used](#libraries-used)
  - [Dataset](#dataset)
  - [Preprocessing](#preprocessing)
  - [Insights](#insights)
  - [Modeling](#modeling)
  - [Deployment](#deployment)
  - [Results](#results)

## Introduction
An educational project exploring Julia for machine learning, using the Titanic dataset for preprocessing, modeling, and survival prediction.

### Libraries Used
- `DataFrames`: Used for handling and manipulating tabular data effectively.
- `CSV`: Employed for reading and writing CSV files, a common data format.
- `Plots`: Utilized for creating informative and insightful visualizations.
- `DecisionTree`: Used to build the Random Forest machine learning model.

## Dataset
Kaggle's Titanic ML competition introduces machine learning beginners to predicting passenger survival: [Titanic dataset](https://www.kaggle.com/competitions/titanic)
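
Loading the competition's training file with the `CSV` and `DataFrames` packages listed earlier might look like the sketch below. The column names follow Kaggle's `train.csv`, but the rows are made-up placeholders inlined so the example runs without a download, and the missing-value count is an assumed first check rather than something taken from the repository.

```julia
using CSV, DataFrames

# A few rows in the layout of Kaggle's train.csv, inlined so the sketch runs
# without downloading anything. The values are illustrative only.
sample_csv = """
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 0000,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,PC 0000,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,,0,0,STON 0000,7.92,,S
"""

# With the real file this would be: df = CSV.read("train.csv", DataFrame)
df = CSV.read(IOBuffer(sample_csv), DataFrame)

# Count missing values per column to see what preprocessing has to handle.
missing_counts = Dict(name => count(ismissing, df[!, name]) for name in names(df))

println(size(df))
println(missing_counts)
```

Empty CSV fields are read as `missing`, which is why `Age` and `Cabin` show up with non-zero counts here.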
## Preprocessing
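As a rough illustration of the three steps listed in this section, the sketch below works on a tiny made-up table. The choice of median imputation, the dropped columns (`Name`, `Cabin`), and the numeric encoding of `Sex` are assumptions for the example; the README does not say which strategy the notebook uses.

```julia
using DataFrames, Statistics

# Tiny stand-in for the Titanic table; the values are illustrative only.
df = DataFrame(
    Survived = [0, 1, 1, 0],
    Pclass   = [3, 1, 3, 2],
    Name     = ["A", "B", "C", "D"],
    Sex      = ["male", "female", "female", "male"],
    Age      = [22.0, 38.0, missing, 30.0],
    Cabin    = [missing, "C85", missing, missing],
)

# 1. Missing data: fill missing ages with the median of the observed ages (assumed strategy).
age_median = median(skipmissing(df.Age))
df.Age = coalesce.(df.Age, age_median)

# 2. Unnecessary features: drop columns assumed to carry little signal.
select!(df, Not([:Name, :Cabin]))

# 3. Categorical data: encode Sex as a number (assumed mapping: male = 0, female = 1).
df.Sex = [s == "female" ? 1.0 : 0.0 for s in df.Sex]

println(df)
```
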
- Managing missing data
- Removing unnecessary features
- Addressing categorical data

## Insights
- Death vs survival rate
![](msc/survived_not_survived.png)
- Death by Class
![](msc/death_by_class.png)
- Survival rate by gender
![](msc/survive_by_gender.png)

## Modeling
After data cleaning, a `RandomForestClassifier` was used to build the survival prediction model.

Random Forest hyperparameters:
```julia
# Classification parameters for the random forest, with the documented defaults
# n_subfeatures: number of features to consider at random per split (default: -1, sqrt(# features))
n_subfeatures = -1
# n_trees: number of trees to train (default: 10)
n_trees = 50
# partial_sampling: fraction of samples to train each tree on (default: 0.7)
partial_sampling = 0.7
# max_depth: maximum depth of the decision trees (default: no maximum)
max_depth = -1
# min_samples_leaf: minimum number of samples each leaf needs to have (default: 5)
min_samples_leaf = 7
# min_samples_split: minimum number of samples needed for a split (default: 2)
min_samples_split = 3
# min_purity_increase: minimum purity needed for a split (default: 0.0)
min_purity_increase = 0.0
# rng: random number generator or seed to use (default: Random.GLOBAL_RNG);
# multi-threaded forests must be seeded with an `Int`
seed = 3
```
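
A forest could be trained and scored with these settings roughly as follows, using `build_forest` and `apply_forest` from `DecisionTree`. This is a sketch on synthetic data: the feature matrix, the labels, and the 75/25 holdout split are assumptions for illustration, and the accuracy it prints is unrelated to the figures reported for the Titanic data.

```julia
using DecisionTree, Random, Statistics

# Synthetic stand-in data: 200 rows, 4 numeric features, and a label that
# depends on the first feature. The real project would use the cleaned Titanic table.
rng = MersenneTwister(3)
features = rand(rng, 200, 4)
labels = [x > 0.5 ? "survived" : "died" for x in features[:, 1]]

# Simple holdout split (assumed 75/25; the README does not state the split used).
train_idx, test_idx = 1:150, 151:200

# Positional arguments follow build_forest's order:
# n_subfeatures, n_trees, partial_sampling, max_depth,
# min_samples_leaf, min_samples_split, min_purity_increase
model = build_forest(labels[train_idx], features[train_idx, :],
                     -1, 50, 0.7, -1, 7, 3, 0.0; rng = 3)

# Predict the held-out rows and compare against the known labels.
predictions = apply_forest(model, features[test_idx, :])
accuracy = mean(predictions .== labels[test_idx])
println("holdout accuracy: ", round(accuracy, digits = 3))
```
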
- The model achieved an accuracy of `86.5%` on the training data
- The model achieved an accuracy of `89.7%` on the test data

## Deployment
The trained model was deployed as a web application using:
- `JLD2`: Saves and loads the trained model.
- `HTTP`: Connects the Julia script to a frontend.
- `JSON3`: Parses the inputs.

## Results
The samples below support the insights:
- Death occurred more among females
- Death occurred more among third-class passengers
- Males had a higher chance of survival

Sample 1 - Female death
![](msc/female1.png)

Sample 2 - Male death
![](msc/male1.png)

Sample 3 - Male survival
![](msc/male2.png)
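
Predictions like these presumably come from sending one passenger's fields to the deployed model. A minimal sketch of that request path, combining `JLD2`, `HTTP`, and `JSON3` as described under Deployment, is given here. The field names (`pclass`, `sex`, `age`, `fare`), the feature order, the `/predict` route, and the stand-in training rows are assumptions for illustration and are not taken from the repository.

```julia
using DecisionTree, JLD2, HTTP, JSON3

# Stand-in model trained on made-up rows with the assumed feature order
# [pclass, sex (0 = male, 1 = female), age, fare].
features = [3.0 0.0 22.0 7.25;
            1.0 1.0 38.0 71.28;
            3.0 1.0 26.0 7.92;
            1.0 0.0 54.0 51.86;
            2.0 0.0 35.0 13.0;
            2.0 1.0 27.0 21.0]
labels = ["died", "survived", "survived", "died", "died", "survived"]
forest = build_forest(labels, features, -1, 10, 1.0, -1, 1, 2, 0.0; rng = 3)

# Save the model once, then load it the way a server process would at startup.
model_path = joinpath(mktempdir(), "model.jld2")
save_object(model_path, forest)
model = load_object(model_path)

# Turn a JSON request body into a feature vector and return the prediction as JSON.
function handle_predict(req::HTTP.Request)
    input = JSON3.read(req.body)
    x = Float64[input.pclass, input.sex == "female" ? 1.0 : 0.0, input.age, input.fare]
    prediction = apply_forest(model, x)
    return HTTP.Response(200, JSON3.write((; prediction)))
end

# Calling the handler directly keeps the sketch runnable without opening a port.
payload = JSON3.write((pclass = 3, sex = "male", age = 30, fare = 8.05))
response = handle_predict(HTTP.Request("POST", "/predict", [], Vector{UInt8}(payload)))
println(JSON3.read(response.body).prediction)

# To serve it for a frontend (blocks the current task):
# HTTP.serve(handle_predict, "127.0.0.1", 8080)
```

Keeping the handler as a plain function of an `HTTP.Request` means it can be exercised in a notebook first and only later passed to `HTTP.serve`.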