https://github.com/tatamiya/kedro_titanic
Create an ML workflow with Kedro, using Titanic data as an example.
- Host: GitHub
- URL: https://github.com/tatamiya/kedro_titanic
- Owner: tatamiya
- Created: 2020-02-22T02:28:45.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-26T20:49:05.000Z (almost 2 years ago)
- Last Synced: 2023-03-05T19:47:59.306Z (almost 2 years ago)
- Language: Python
- Size: 43 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Titanic with Kedro
## Overview
This is your new Kedro project, which was generated using `Kedro 0.15.5` by running:
```
kedro new
```
Take a look at the [documentation](https://kedro.readthedocs.io) to get started.
## ToDo
- Save preprocessing objects such as the label encoder.
- Avoid data leakage.
- Run inference on new data.
- Load the saved model and predict on new data sets.
## Data Download
Download the Titanic data from Kaggle using the Kaggle API:
```
kaggle competitions download -c titanic
unzip titanic.zip
```
## Rules and guidelines
In order to get the best out of the template:
* Please don't remove any lines from the `.gitignore` file provided
* Make sure your results can be reproduced by following a data engineering convention, e.g. the one we suggest [here](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#what-is-data-engineering-convention)
* Don't commit any data to your repository
* Don't commit any credentials or local configuration to your repository
* Keep all credentials or local configuration in `conf/local/`
## Installing dependencies
Dependencies should be declared in `src/requirements.txt` for pip installation and `src/environment.yml` for conda installation.
To install them, run:
```
kedro install
```
## Running Kedro
You can run your Kedro project with:
```
kedro run
```
## Testing Kedro
Have a look at the file `src/tests/test_run.py` for instructions on how to write your tests. You can run your tests with the following command:
```
kedro test
```
To configure the coverage threshold, please have a look at the file `.coveragerc`.
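
As a minimal sketch of what a test picked up by `kedro test` might look like, assuming pytest-style discovery of `test_*` functions: the `encode_sex` function below is a hypothetical node written for illustration (it is not taken from this repository) and is defined inline only to keep the example self-contained.

```python
# Hypothetical node function; in a real project it would live under src/
# and be imported into the test module instead of being defined here.
def encode_sex(rows):
    """Return copies of the rows with the Titanic `Sex` column mapped to integers."""
    mapping = {"male": 0, "female": 1}
    return [{**row, "Sex": mapping[row["Sex"]]} for row in rows]


def test_encode_sex_maps_labels_to_integers():
    rows = [
        {"PassengerId": 1, "Sex": "male"},
        {"PassengerId": 2, "Sex": "female"},
    ]
    encoded = encode_sex(rows)
    # Labels are replaced by their integer codes, in the original row order.
    assert [row["Sex"] for row in encoded] == [0, 1]
    # The input rows are left untouched, so the node has no side effects.
    assert rows[0]["Sex"] == "male"
```

Keeping node functions free of side effects, as in this sketch, tends to make them straightforward to test in isolation from the data catalog.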
### Working with Kedro from notebooks
In order to use notebooks in your Kedro project, you need to install Jupyter:
```
pip install jupyter
```
To use JupyterLab, install it as well:
```
pip install jupyterlab
```
After installing Jupyter, you can start a local notebook server:
```
kedro jupyter notebook
```
You can also start Jupyter Lab:
```
kedro jupyter lab
```
And if you want to run an IPython session:
```
kedro ipython
```
Running Jupyter or IPython this way provides the following variables in scope: `proj_dir`, `proj_name`, `conf`, `io`, `parameters` and `startup_error`.
#### Converting notebook cells to nodes in a Kedro project
Once you are happy with a notebook, you may want to move your code over into the Kedro project structure for the next stage in your development. This is done through a mixture of [cell tagging](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html#cell-tags) and Kedro CLI commands.
By adding the `node` tag to a cell and running the command below, the cell's source code will be copied over to a Python file within `src//nodes/`.
```
kedro jupyter convert
```
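
To make the cell-tagging step more concrete: Jupyter stores tags in each cell's `metadata.tags` list inside the `.ipynb` JSON file. The sketch below only illustrates how tagged cells can be located; it is not how `kedro jupyter convert` is implemented, and the helper name `cells_tagged` and the sample notebook are hypothetical.

```python
import json


def cells_tagged(notebook_json, tag="node"):
    """Return the source of every code cell that carries the given tag."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and tag in tags:
            source = cell.get("source", "")
            # The notebook format allows `source` to be a string or a list of lines.
            sources.append("".join(source) if isinstance(source, list) else source)
    return sources


# Hypothetical two-cell notebook: only the first cell is tagged `node`.
example_notebook = json.dumps({
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["node"]},
            "source": ["def clean(df):\n", "    return df.dropna()"],
        },
        {"cell_type": "code", "metadata": {}, "source": "df.head()"},
    ]
})

print(cells_tagged(example_notebook))
```

Only cells tagged `node` are selected here, so exploratory cells such as `df.head()` are left out.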
> *Note:* The name of the Python file matches the name of the original notebook.

Alternatively, you may want to transform all your notebooks in one go. To this end, you can run the following command to convert all notebook files found in the project root directory and under any of its sub-folders.
```
kedro jupyter convert --all
```
#### Ignoring notebook output cells in `git`
In order to automatically strip out all output cell contents before committing to `git`, you can run `kedro activate-nbstripout`. This will add a hook in `.git/config` which will run `nbstripout` before anything is committed to `git`.
> *Note:* Your output cells will be left intact locally.
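
For intuition, stripping output amounts to roughly the following transformation of the notebook JSON. This is a simplified sketch and not `nbstripout`'s actual implementation; the function name `strip_outputs` is hypothetical.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    stripped = copy.deepcopy(notebook)  # leave the caller's notebook intact
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Hypothetical notebook with one executed code cell and one markdown cell.
executed = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "891\n"}],
            "source": "print(len(df))",
        },
        {"cell_type": "markdown", "metadata": {}, "source": "# Notes"},
    ]
}

clean = strip_outputs(executed)
```

The source of each cell is preserved; only outputs and execution counts are reset, which keeps notebook diffs in `git` small.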
## Package the project
To package the project's Python code as an `.egg` and/or a wheel (`.whl`) file, you can run:
```
kedro package
```
After running that, you can find the two packages in `src/dist/`.
## Building API documentation
To build API docs for your code using Sphinx, run:
```
kedro build-docs
```
See your documentation by opening `docs/build/html/index.html`.
## Building the project requirements
To generate or update the dependency requirements for your project, run:
```
kedro build-reqs
```
This will copy the contents of `src/requirements.txt` into a new file `src/requirements.in` which will be used as the source for `pip-compile`. You can see the output of the resolution by opening `src/requirements.txt`.
After this, if you'd like to update your project requirements, please update `src/requirements.in` and re-run `kedro build-reqs`.