Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ploomber/projects
Sample projects using Ploomber.
Last synced: 29 days ago
- Host: GitHub
- URL: https://github.com/ploomber/projects
- Owner: ploomber
- License: apache-2.0
- Created: 2020-05-05T21:58:11.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-01-25T00:58:58.000Z (11 months ago)
- Last Synced: 2024-08-09T02:17:40.979Z (4 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 10.8 MB
- Stars: 81
- Watchers: 8
- Forks: 25
- Open Issues: 13
Metadata Files:
- Readme: README.ipynb
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - ploomber/projects - Sample projects using Ploomber. (Jupyter Notebook)
README
{
"cells": [
{
"cell_type": "markdown",
"id": "6d898d50",
"metadata": {
"papermill": {
"duration": 0.00677,
"end_time": "2022-11-16T14:39:07.742657",
"exception": false,
"start_time": "2022-11-16T14:39:07.735887",
"status": "completed"
},
"tags": []
},
"source": [
"# Ploomber sample projects\n",
"\n",
"![CI](https://github.com/ploomber/projects/workflows/ci/badge.svg)\n",
"\n",
"\n",
"\n",
"This repository contains sample pipelines developed using [Ploomber](https://github.com/ploomber/ploomber).\n",
"\n",
"**Note:** We recommend going through the [first tutorial](https://docs.ploomber.io/en/latest/get-started/first-pipeline.html) to learn the basics of Ploomber.\n",
"\n",
"## Running examples"
]
},
{
"cell_type": "markdown",
"id": "b7826667",
"metadata": {
"papermill": {
"duration": 0.002241,
"end_time": "2022-11-16T14:39:07.747334",
"exception": false,
"start_time": "2022-11-16T14:39:07.745093",
"status": "completed"
},
"tags": []
},
"source": [
"Use Colab:\n",
"\n",
"\n",
"\n",
"\n",
"Or run locally:\n",
"\n",
"~~~sh\n",
"pip install ploomber\n",
"\n",
"# list examples\n",
"ploomber examples\n",
"\n",
"# download example with name\n",
"ploomber examples --name {name}\n",
"\n",
"# example\n",
"ploomber examples --name templates/mlflow\n",
"~~~\n",
"\n",
"## How to read the examples\n",
"\n",
"Each example contains a `README.md` file that describes it; a `README.ipynb` is also available with the same contents but in Jupyter notebook format and with command outputs. In addition, files for `pip` (`requirements.txt`) and `conda` (`environment.yml`) are provided for local execution.\n",
"\n",
"## Index\n",
"\n",
"### Templates\n",
"\n",
"Starting points for common use cases. Use them to ramp up a project quickly."
]
},
{
"cell_type": "markdown",
"id": "98eed9ff",
"metadata": {
"papermill": {
"duration": 0.002723,
"end_time": "2022-11-16T14:39:07.751775",
"exception": false,
"start_time": "2022-11-16T14:39:07.749052",
"status": "completed"
},
"tags": []
},
"source": [
"1. [`templates/etl`](templates/etl/README.ipynb) Download a data file, upload it to a database, process it, and plot with Python and R.\n",
"\n",
"2. [`templates/exploratory-analysis`](templates/exploratory-analysis/README.ipynb) Sample pipeline that explores penguins data.\n",
"\n",
"3. [`templates/google-cloud`](templates/google-cloud/README.ipynb) Use Google Cloud and Ploomber to develop a scalable and production-ready pipeline.\n",
"\n",
"4. [`templates/ml-advanced`](templates/ml-advanced/README.ipynb) ML pipeline using the Python API. Shows how to create a Python package, test it with pytest, and train models in parallel.\n",
"\n",
"5. [`templates/ml-basic`](templates/ml-basic/README.ipynb) Download data, clean it, generate features and train a model.\n",
"\n",
"6. [`templates/ml-intermediate`](templates/ml-intermediate/README.ipynb) Training and serving ML pipelines with integration testing to evaluate training data quality.\n",
"\n",
"7. [`templates/ml-online`](templates/ml-online/README.ipynb) Load data, generate features, train a model, and deploy the model with Flask.\n",
"\n",
"8. [`templates/mlflow`](templates/mlflow/README.ipynb) Train a grid of models and log them to MLflow.\n",
"\n",
"9. [`templates/python-api`](templates/python-api/README.ipynb) Load, clean, and plot data using the Python API.\n",
"\n",
"10. [`templates/pytorch`](templates/pytorch/README.ipynb) Using GPUs to train models in Ploomber Cloud.\n",
"\n",
"11. [`templates/shell`](templates/shell/README.ipynb) Create a pipeline with shell scripts as tasks.\n",
"\n",
"12. [`templates/spec-api-directory`](templates/spec-api-directory/README.ipynb) Create a pipeline from a directory with scripts (without a pipeline.yaml file).\n",
"\n",
"13. [`templates/spec-api-r`](templates/spec-api-r/README.ipynb) Load, clean and plot data with R.\n",
"\n",
"14. [`templates/spec-api-sql`](templates/spec-api-sql/README.ipynb) Use SQL scripts to manipulate data in a database, dump a table, and plot it with Python."
]
},
{
"cell_type": "markdown",
"id": "3df2e459",
"metadata": {
"papermill": {
"duration": 0.001936,
"end_time": "2022-11-16T14:39:07.755901",
"exception": false,
"start_time": "2022-11-16T14:39:07.753965",
"status": "completed"
},
"tags": []
},
"source": [
"### Cookbook\n",
"\n",
"Short and to-the-point examples showing how to use a specific feature."
]
},
{
"cell_type": "markdown",
"id": "032ca243",
"metadata": {
"papermill": {
"duration": 0.001735,
"end_time": "2022-11-16T14:39:07.759411",
"exception": false,
"start_time": "2022-11-16T14:39:07.757676",
"status": "completed"
},
"tags": []
},
"source": [
"2. [`cookbook/dynamic-params`](cookbook/dynamic-params/README.ipynb) Pipeline parameters whose values are computed at runtime.\n",
"\n",
"3. [`cookbook/file-client`](cookbook/file-client/README.ipynb) Upload a task's products upon execution (local, S3, GCloud storage).\n",
"\n",
"4. [`cookbook/grid`](cookbook/grid/README.ipynb) An example showing how to create a grid of tasks to train models with different parameters.\n",
"\n",
"5. [`cookbook/hooks`](cookbook/hooks/README.ipynb) Task hooks\n",
"\n",
"6. [`cookbook/incremental`](cookbook/incremental/README.ipynb) A pipeline that processes new records from a database and uploads them.\n",
"\n",
"7. [`cookbook/nested-cv`](cookbook/nested-cv/README.ipynb) Nested cross-validation for model selection and hyperparameter tuning.\n",
"\n",
"8. [`cookbook/python-load`](cookbook/python-load/README.ipynb) Load pipeline.yaml file in a Python session to customize initialization.\n",
"\n",
"9. [`cookbook/report-generation`](cookbook/report-generation/README.ipynb) Generating HTML/PDF reports.\n",
"\n",
"10. [`cookbook/serialization`](cookbook/serialization/README.ipynb) Shows how to use the serializer and unserializer decorators.\n",
"\n",
"11. [`cookbook/sql-dump`](cookbook/sql-dump/README.ipynb) A minimal example showing how to dump a table from a SQL database.\n",
"\n",
"13. [`cookbook/variable-number-of-products`](cookbook/variable-number-of-products/README.ipynb) Shows how to create tasks whose number of products depends on runtime conditions."
]
},
{
"cell_type": "markdown",
"id": "e65493ce",
"metadata": {
"papermill": {
"duration": 0.002152,
"end_time": "2022-11-16T14:39:07.763435",
"exception": false,
"start_time": "2022-11-16T14:39:07.761283",
"status": "completed"
},
"tags": []
},
"source": [
"### Guides\n",
"\n",
"In-depth tutorials for learning. These are part of the [documentation](https://docs.ploomber.io/en/latest/user-guide/index.html)."
]
},
{
"cell_type": "markdown",
"id": "dd47377d",
"metadata": {
"papermill": {
"duration": 0.002412,
"end_time": "2022-11-16T14:39:07.768693",
"exception": false,
"start_time": "2022-11-16T14:39:07.766281",
"status": "completed"
},
"tags": []
},
"source": [
"5. [`guides/cron`](guides/cron/README.ipynb) This guide shows how to schedule Ploomber pipelines using cron.\n",
"\n",
"6. [`guides/debugging`](guides/debugging/README.ipynb) Tutorial showing techniques for debugging pipelines.\n",
"\n",
"7. [`guides/first-pipeline`](guides/first-pipeline/README.ipynb) Introductory tutorial to learn the basics of Ploomber.\n",
"\n",
"8. [`guides/intro-to-ploomber`](guides/intro-to-ploomber/README.ipynb) Introductory tutorial to learn the basics of Ploomber.\n",
"\n",
"9. [`guides/logging`](guides/logging/README.ipynb) Tutorial showing how to add logging to a pipeline.\n",
"\n",
"10. [`guides/parametrized`](guides/parametrized/README.ipynb) Tutorial showing how to parametrize pipelines and change parameters from the command-line.\n",
"\n",
"11. [`guides/refactor`](guides/refactor/README.ipynb) Using Soorgeon to convert a notebook into a Ploomber pipeline.\n",
"\n",
"12. [`guides/serialization`](guides/serialization/README.ipynb) Tutorial explaining how the serializer and unserializer fields in a pipeline.yaml file work.\n",
"\n",
"13. [`guides/sql-templating`](guides/sql-templating/README.ipynb) Introductory tutorial teaching how to develop modular SQL pipelines.\n",
"\n",
"14. [`guides/testing`](guides/testing/README.ipynb) Tutorial showing how to use a task's on_finish hook to test data quality.\n",
"\n",
"15. [`guides/versioning`](guides/versioning/README.ipynb) A tutorial showing how to version pipeline products.\n"
]
},
{
"cell_type": "markdown",
"id": "8b5601ce",
"metadata": {
"papermill": {
"duration": 0.001954,
"end_time": "2022-11-16T14:39:07.773003",
"exception": false,
"start_time": "2022-11-16T14:39:07.771049",
"status": "completed"
},
"tags": []
},
"source": [
"## Python API\n",
"\n",
"The simplest way to get started with Ploomber is via the Spec API, which allows you to describe pipelines using a `pipeline.yaml` file; most examples in this repository use the Spec API. However, if you want more flexibility, you may write pipelines with Python.\n",
"\n",
"The [`templates/python-api/`](templates/python-api) directory contains a project written using the Python API, and the [`python-api-examples/`](python-api-examples) directory includes some tutorials and more examples."
]
},
{
"cell_type": "markdown",
"id": "093cf2ff",
"metadata": {
"papermill": {
"duration": 0.002256,
"end_time": "2022-11-16T14:39:07.777053",
"exception": false,
"start_time": "2022-11-16T14:39:07.774797",
"status": "completed"
},
"tags": []
},
"source": [
"## Micro-pipelines\n",
"\n",
"In Ploomber `0.21`, we introduced a simplified API to write pipelines in a single Jupyter notebook (or `.py`) file. This is a great option for small projects.\n",
"\n",
"You can find the examples in the [`micro-pipelines/`](micro-pipelines) directory."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
},
"papermill": {
"default_parameters": {},
"duration": 2.004917,
"end_time": "2022-11-16T14:39:08.010236",
"environment_variables": {},
"exception": null,
"input_path": "README.ipynb",
"output_path": "README.ipynb",
"parameters": {},
"start_time": "2022-11-16T14:39:06.005319",
"version": "2.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}