https://github.com/abhi18av/nextflow-datascience-titanic-survival-analysis
- Host: GitHub
- URL: https://github.com/abhi18av/nextflow-datascience-titanic-survival-analysis
- Owner: abhi18av
- License: mit
- Created: 2020-11-20T09:51:58.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-27T13:11:18.000Z (about 4 years ago)
- Last Synced: 2023-03-05T19:04:59.112Z (almost 2 years ago)
- Topics: data-science, nextflow, titanic-survival-prediction
- Language: Jupyter Notebook
- Homepage:
- Size: 596 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Nextflow driven analysis of titanic_dataset
==============================

This project demonstrates how a data-science pipeline can be scaled gracefully from a single machine to thousands of machines. The only thing that needs to be done is to wrap the code from the Jupyter notebook in Nextflow processes.

### Project structure
```
main.nf
|
modules/
└── data
    ├── test_train_split
    └── visualization
        ├── gender_survival_plots
        └── survival_plots
|
└── features
    ├── derive_features
    ├── process_age
    ├── process_family
    ├── process_fare
    ├── process_nan
    └── replace_features
└── models
    ├── linear_svm
    └── grid_svm
workflows/
└── generate_plots
└── feature_engineering
└── train_models
```
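
As a rough sketch of what wrapping notebook code in a Nextflow process can look like, the block below shows how a module such as `test_train_split` might be written in DSL2. The process name, the `params.input_csv` parameter, the output file names, and the split settings are assumptions for illustration and are not taken from this repository. It also assumes pandas and scikit-learn are installed.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical parameter; in this project, parameters are passed with -params-file
params.input_csv = 'data/titanic.csv'

process TEST_TRAIN_SPLIT {
    input:
    path dataset

    output:
    path 'train.csv', emit: train
    path 'test.csv',  emit: test

    // The body of the notebook cell becomes the process script.
    script:
    """
    #!/usr/bin/env python3
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("${dataset}")
    train, test = train_test_split(df, test_size=0.3, random_state=0)
    train.to_csv("train.csv", index=False)
    test.to_csv("test.csv", index=False)
    """
}

workflow {
    TEST_TRAIN_SPLIT(Channel.fromPath(params.input_csv))
}
```

Because the process only declares its input and output files, the same definition can run locally or on a cluster or cloud executor by changing the Nextflow configuration rather than the Python code.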
## Getting Started
Let's execute the main analysis locally with local data.
```
nextflow run main.nf -entry MAIN -params-file test_params.yml
```

### Overall workflow
The following diagram represents the entire workflow.
![Complete workflow](./workflow.png)
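
One way the `MAIN` entry workflow could chain the three sub-workflows listed under `workflows/` is sketched below. The include paths, the upper-case workflow names, the `params.input_csv` parameter, and the inputs and `emit` names of each sub-workflow are assumptions. The actual definitions live in the repository's `.nf` files.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical include paths and workflow names, modeled on the project structure
include { GENERATE_PLOTS }      from './workflows/generate_plots'
include { FEATURE_ENGINEERING } from './workflows/feature_engineering'
include { TRAIN_MODELS }        from './workflows/train_models'

// Selected on the command line with `-entry MAIN`
workflow MAIN {
    raw_data = Channel.fromPath(params.input_csv)   // hypothetical parameter

    // Plots and feature engineering both consume the raw data
    GENERATE_PLOTS(raw_data)
    FEATURE_ENGINEERING(raw_data)

    // Assumes FEATURE_ENGINEERING declares `emit: features`
    TRAIN_MODELS(FEATURE_ENGINEERING.out.features)
}
```

Keeping each stage as a named sub-workflow lets any of them be run on its own by passing a different name to `-entry`.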
#### Visualization workflow
![Visualization workflow](./docs/visualization_workflow.png)
#### Feature engineering workflow
![Feature Engineering workflow](./docs/feature_engineering_workflow.png)
#### Model training workflow
![Model Training workflow](./docs/model_training_workflow.png)
This work builds on the [cookiecutter data science template](https://github.com/drivendata/cookiecutter-data-science) and the [Titanic dataset analysis](https://www.kaggle.com/ash316/eda-to-prediction-dietanic) notebook on Kaggle.