https://github.com/pyscaffold/pyscaffoldext-dsproject
💫 PyScaffold extension for data-science projects
- Host: GitHub
- URL: https://github.com/pyscaffold/pyscaffoldext-dsproject
- Owner: pyscaffold
- License: mit
- Created: 2019-07-01T15:53:30.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-02-03T17:14:40.000Z (15 days ago)
- Last Synced: 2025-02-09T09:05:56.392Z (10 days ago)
- Topics: data-science, pyscaffold, pyscaffold-extension, python
- Language: Python
- Homepage: https://pyscaffold.org/projects/dsproject
- Size: 138 KB
- Stars: 156
- Watchers: 7
- Forks: 23
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.rst
- Contributing: CONTRIBUTING.rst
- Funding: .github/FUNDING.yml
- License: LICENSE.txt
- Authors: AUTHORS.rst
# pyscaffoldext-dsproject

[PyScaffold] extension tailored for *Data Science* projects. This extension is inspired by
[cookiecutter-data-science] and enhanced in many ways. The main differences are that it

1. advocates a proper Python package structure that can be shipped and distributed,
2. uses a [conda] environment instead of something [virtualenv]-based and is thus more suitable
   for data science projects,
3. provides more default configurations for [Sphinx], [pytest], [pre-commit], etc. to foster
   clean coding and best practices.

Also consider using [dvc] to version-control and share your data within your team.
Read [this blogpost] to learn how to work with JupyterLab notebooks efficiently by using a
data science project structure like this.

The final directory structure looks like:

```
├── AUTHORS.md <- List of developers and maintainers.
├── CHANGELOG.md <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md <- Guidelines for contributing to this project.
├── Dockerfile <- Build a docker container with `docker build .`.
├── LICENSE.txt <- License as chosen on the command-line.
├── README.md <- The top-level README for developers.
├── configs <- Directory for configurations of model & application.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── docs <- Directory for Sphinx documentation in rst or md.
├── environment.yml <- The conda environment file for reproducibility.
├── models <- Trained and serialized models, model predictions,
│ or model summaries.
├── notebooks <- Jupyter notebooks. Naming convention is a number (for
│ ordering), the creator's initials and a description,
│ e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml <- Build configuration. Don't change! Use `pip install -e .`
│ to install for development or `tox -e build` to build.
├── references <- Data dictionaries, manuals, and all other materials.
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated plots and figures for reports.
├── scripts <- Analysis and production scripts which import the
│ actual PYTHON_PKG, e.g. train_model.
├── setup.cfg <- Declarative configuration of your project.
├── setup.py <- [DEPRECATED] Use `python setup.py develop` to install for
│ development or `python setup.py bdist_wheel` to build.
├── src
│ └── PYTHON_PKG <- Actual Python package where the main functionality goes.
├── tests <- Unit tests which can be run with `pytest`.
├── .coveragerc <- Configuration for coverage reports of unit tests.
├── .isort.cfg <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```

See a demonstration of the initial project structure under [dsproject-demo] and also check out
the documentation of [PyScaffold] for more information.

## Usage

Just install this package with `conda install -c conda-forge pyscaffoldext-dsproject`
and note that `putup -h` shows a new option `--dsproject`.
Creating a data science project is then as easy as:
```bash
putup --dsproject my_ds_project
```

The `--dsproject` flag additionally implies the flags `--markdown`, `--pre-commit` and `--no-skeleton`
for convenience.

## Making Changes & Contributing

This project uses [pre-commit]. Please make sure to install it before making any
changes:

```bash
conda install pre-commit
cd pyscaffoldext-dsproject
pre-commit install
```

It is a good idea to update the hooks to the latest version:

```bash
pre-commit autoupdate
```

Please also check PyScaffold's [contribution guidelines].

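As a follow-up to the hook setup above, it can be useful to run every hook once against the whole repository before committing, since hooks otherwise only see staged files. A minimal sketch, assuming `pre-commit` is on your `PATH` and you are inside the cloned repository; the guard and the messages are illustrative additions and not part of this project:

```bash
# Run all configured hooks against every file, not only staged changes.
# Assumption: executed from the root of the cloned repository.
if command -v pre-commit >/dev/null 2>&1; then
    # A non-zero exit usually means a hook failed or rewrote files.
    pre-commit run --all-files \
        || echo "Some hooks failed or modified files; review the changes and re-run." >&2
else
    echo "pre-commit is not installed; install it first as described above." >&2
fi
```

Hooks that reformat files typically report a failure on the first pass, so a second run after reviewing the changes is often needed.
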
[PyScaffold]: https://pyscaffold.org/
[cookiecutter-data-science]: https://github.com/drivendata/cookiecutter-data-science
[Miniconda]: https://docs.conda.io/en/latest/miniconda.html
[Jupyter]: https://jupyter.org/
[dsproject-demo]: https://github.com/pyscaffold/dsproject-demo
[Sphinx]: https://www.sphinx-doc.org/
[pytest]: https://docs.pytest.org/
[conda]: https://docs.conda.io/
[Conda-Forge]: https://anaconda.org/conda-forge/pyscaffoldext-dsproject
[virtualenv]: https://virtualenv.pypa.io/
[pre-commit]: https://pre-commit.com/
[dvc]: https://dvc.org/
[this blogpost]: https://florianwilhelm.info/2018/11/working_efficiently_with_jupyter_lab/
[contribution guidelines]: https://pyscaffold.org/en/latest/contributing.html