https://github.com/crmne/cookiecutter-modern-datascience

Start a data science project with modern tools

Host: GitHub
URL: https://github.com/crmne/cookiecutter-modern-datascience
Owner: crmne
License: bsd-3-clause
Created: 2020-07-06T12:46:30.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-08-10T08:03:15.000Z (about 2 years ago)
Last Synced: 2024-02-14T21:27:11.792Z (over 1 year ago)
Topics: cookiecutter, cookiecutter-data-science, cookiecutter-template, datascience, python
Language: Python
Homepage:
Size: 99.6 KB
Stars: 164
Watchers: 4
Forks: 33
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Cookiecutter Modern Data Science
[Cookiecutter] template for starting a Data Science project with modern, fast Python tools.

## Features

* [Pipenv] for managing packages and virtualenvs in a modern way.
* [Prefect] for modern pipelines and data workflows.
* [Weights and Biases] for experiment tracking.
* [FastAPI] for fast, self-documenting HTTP APIs (performance on par with NodeJS and Go), based on [asyncio], [ASGI], and [uvicorn].
* Modern CLI with [Typer].
* Batteries included: [Pandas], [numpy], [scipy], [seaborn], and [jupyterlab] already installed.
* Consistent code quality: [black], [isort], [autoflake], and [pylint] already installed.
* [Pytest] for testing.
* [GitHub Pages] for the public website.
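
The README lists the code-quality tools but does not say how to run them together. As a minimal sketch, a hypothetical `lint.py` could chain them in one plausible order: remove unused imports with autoflake, sort imports with isort (using its `black` profile so the two tools agree), reformat with black, then run pylint. The script name, the tool order, and the chosen flags are assumptions, not part of the template.

```python
"""lint.py: hypothetical helper that runs the template's code-quality tools in order."""
import subprocess
import sys
from typing import List, Sequence


def lint_commands(paths: Sequence[str]) -> List[List[str]]:
    """Build one command line per tool; the order and flags are an assumed convention."""
    targets = list(paths)
    return [
        # 1. Delete unused imports/variables so later tools see less noise.
        ["autoflake", "--in-place", "--recursive",
         "--remove-all-unused-imports", "--remove-unused-variables", *targets],
        # 2. Sort imports; the "black" profile keeps isort's output compatible with black.
        ["isort", "--profile", "black", *targets],
        # 3. Apply black's formatting.
        ["black", *targets],
        # 4. Static analysis on the already-formatted code.
        ["pylint", *targets],
    ]


def run_all(paths: Sequence[str]) -> int:
    """Run each tool in turn and stop at the first non-zero exit code."""
    for cmd in lint_commands(paths):
        print("$", " ".join(cmd))
        returncode = subprocess.run(cmd, check=False).returncode
        if returncode != 0:
            return returncode
    return 0


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: python lint.py PATH [PATH ...]")
    sys.exit(run_all(sys.argv[1:]))
```

Run it inside the Pipenv environment, e.g. `pipenv run python lint.py pipelines/pipelines.py serve/app.py`. Note that pylint exits non-zero whenever it reports any message, so a non-zero result from the last step usually means "findings", not a crash.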

## Quickstart

Install the latest Cookiecutter and Pipenv:

```bash
pip install -U pipenv cookiecutter
```

Generate the project:

```bash
cookiecutter gh:crmne/cookiecutter-modern-datascience
```

Get inside the project:

```bash
cd <project-directory>  # the folder Cookiecutter just generated
pipenv shell            # activates the virtualenv
```
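
After `pipenv shell`, it can be useful to confirm from a script or notebook which interpreter is actually running. A minimal stdlib-only sketch, assuming a standard venv/virtualenv layout: inside such an environment `sys.prefix` differs from `sys.base_prefix`, and Pipenv exports `PIPENV_ACTIVE=1` in the shells it spawns. The helper names `in_virtualenv` and `in_pipenv_shell` are hypothetical.

```python
import os
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv.

    Inside such an environment sys.prefix points at the environment directory,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def in_pipenv_shell() -> bool:
    """True when Pipenv's activation marker is set in the process environment."""
    return os.environ.get("PIPENV_ACTIVE") == "1"


if __name__ == "__main__":
    print("interpreter:    ", sys.executable)
    print("in virtualenv:  ", in_virtualenv())
    print("in pipenv shell:", in_pipenv_shell())
```

Because JupyterLab inherits the environment of the shell it was started from, running these checks in a notebook cell shows whether the kernel picked up the project's virtualenv.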

(Optional) Start Weights & Biases locally, if you don't want to use the cloud/on-premise version:

```bash
wandb local
```

Start working:

```bash
jupyter-lab
```

## Directory structure

This is how your new project will look:

```
├── .gitignore                <- GitHub's excellent Python .gitignore customized for this project
├── LICENSE                   <- Your project's license.
├── Pipfile                   <- The Pipfile for reproducing the analysis environment
├── README.md                 <- The top-level README for developers using this project.
│
├── data
│   ├── 0_raw                 <- The original, immutable data dump.
│   ├── 0_external            <- Data from third-party sources.
│   ├── 1_interim             <- Intermediate data that has been transformed.
│   └── 2_final               <- The final, canonical data sets for modeling.
│
├── docs                      <- GitHub Pages website
│   ├── data_dictionaries     <- Data dictionaries
│   └── references            <- Papers, manuals, and all other explanatory materials.
│
├── notebooks                 <- Jupyter notebooks. Naming convention is a number (for ordering),
│                                the creator's initials, and a short `_` delimited description,
│                                e.g. `01_cp_exploratory_data_analysis.ipynb`.
│
├── output
│   ├── features              <- Fitted and serialized features
│   ├── models                <- Trained and serialized models, model predictions, or model summaries
│   └── reports               <- Generated analyses as HTML, PDF, LaTeX, etc.
│       └── figures           <- Generated graphics and figures to be used in reporting
│
├── pipelines                 <- Pipelines and data workflows.
│   ├── Pipfile               <- The Pipfile for reproducing the pipelines environment
│   ├── pipelines.py          <- The CLI entry point for all the pipelines
│   ├──                       <- Code for the various steps of the pipelines
│   │   ├── __init__.py
│   │   ├── etl.py            <- Download, generate, and process data
│   │   ├── visualize.py      <- Create exploratory and results-oriented visualizations
│   │   ├── features.py       <- Turn raw data into features for modeling
│   │   └── train.py          <- Train and evaluate models
│   └── tests
│       ├── fixtures          <- Where to put example inputs and outputs
│       │   ├── input.json    <- Test input data
│       │   └── output.json   <- Test output data
│       └── test_pipelines.py <- Integration tests for the pipelines
│
└── serve                     <- HTTP API for serving predictions
    ├── Dockerfile            <- Dockerfile for HTTP API
    ├── Pipfile               <- The Pipfile for reproducing the serving environment
    ├── app.py                <- The entry point of the HTTP API
    └── tests
        ├── fixtures          <- Where to put example inputs and outputs
        │   ├── input.json    <- Test input data
        │   └── output.json   <- Test output data
        └── test_app.py       <- Integration tests for the HTTP API
```
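
The notebook naming convention (number, creator's initials, `_` delimited description) can be checked mechanically. The sketch below is a hypothetical helper, not part of the template; it assumes initials are 2 to 4 lowercase letters and that the description is lowercase `snake_case`, matching the `01_cp_exploratory_data_analysis.ipynb` example.

```python
import re
from pathlib import Path
from typing import List, NamedTuple, Optional

# Assumed pattern: <digits>_<2-4 lowercase initials>_<snake_case description>.ipynb
NOTEBOOK_NAME = re.compile(
    r"^(?P<number>\d+)_(?P<initials>[a-z]{2,4})_"
    r"(?P<description>[a-z0-9]+(?:_[a-z0-9]+)*)\.ipynb$"
)


class NotebookName(NamedTuple):
    number: int       # ordering prefix, e.g. 1 for "01"
    initials: str     # creator's initials, e.g. "cp"
    description: str  # short underscore-delimited description


def parse_notebook_name(filename: str) -> Optional[NotebookName]:
    """Split a conforming notebook filename into its parts, or return None."""
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        return None
    return NotebookName(int(match["number"]), match["initials"], match["description"])


def misnamed_notebooks(folder: Path) -> List[str]:
    """List .ipynb files in `folder` that do not follow the assumed convention."""
    return sorted(
        p.name for p in folder.glob("*.ipynb") if parse_notebook_name(p.name) is None
    )
```

For example, `misnamed_notebooks(Path("notebooks"))` would flag a stray `Untitled.ipynb` while accepting `01_cp_exploratory_data_analysis.ipynb`. A regex keeps the rule in one place, so tightening it (say, requiring exactly two digits) is a one-line change.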

[Cookiecutter]: https://github.com/audreyr/cookiecutter
[Pipenv]: https://pipenv.pypa.io/en/latest/
[Prefect]: https://docs.prefect.io/
[Weights and Biases]: https://www.wandb.com/
[MLFlow]: https://mlflow.org/
[FastAPI]: https://fastapi.tiangolo.com/
[asyncio]: https://docs.python.org/3/library/asyncio.html
[ASGI]: https://asgi.readthedocs.io/en/latest/
[uvicorn]: https://www.uvicorn.org/
[Typer]: https://typer.tiangolo.com/
[Pandas]: https://pandas.pydata.org/
[numpy]: https://numpy.org/
[scipy]: https://www.scipy.org/
[seaborn]: https://seaborn.pydata.org/
[jupyterlab]: https://jupyterlab.readthedocs.io/en/stable/
[black]: https://github.com/psf/black
[isort]: https://github.com/timothycrosley/isort
[autoflake]: https://github.com/myint/autoflake
[pylint]: https://www.pylint.org/
[Pytest]: https://docs.pytest.org/en/latest/
[GitHub Pages]: https://pages.github.com/
[Git LFS]: https://git-lfs.github.com/
